I’ve been looking at Data Scientist roles and, more broadly, at the skills needed for Analytics roles. A frequent requirement is experience with R or other statistical analysis programmes, and I have none. I used Mathematica for my physics master’s research project back in 2010, but not for much statistical work. I have used Python, however, and I knew that O’Reilly publishes a book called “Python for Data Analysis”. I’m hoping to gain some skills and experience from this book, and I’ll summarise what I learn here.
The first job was setting up Python properly. I tried to install all of the required packages directly into the Python that ships with OS X (I’m running 10.11, El Capitan), but something went badly wrong with the installation of pandas (lots of reports of unused functions). While googling for answers, I found this blog post, which explained how to set up virtual environments and install the relevant packages inside them.
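The point of a virtual environment is that each project gets its own interpreter and its own set of packages, kept separate from the Python that ships with OS X. The setup in the blog post relies on virtualenvwrapper; as a swapped-in illustration, here is a minimal sketch using the standard-library `venv` module from Python 3 instead. The function names `make_env` and `describe_env` and the environment name `data-analysis` are hypothetical. The sketch creates an environment in a temporary directory and then asks that environment’s own interpreter whether it is isolated from the base installation.

```python
import os
import subprocess
import tempfile
import venv
from typing import Tuple


def make_env(env_dir: str) -> str:
    """Create a bare virtual environment and return the path to its interpreter."""
    # with_pip=False keeps this sketch fast and offline; a real setup would
    # leave pip enabled so that packages such as pandas can be installed.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)


def describe_env(python_path: str) -> Tuple[str, bool]:
    """Ask the environment's interpreter for its prefix and whether it is isolated."""
    # Inside a virtual environment, sys.prefix points at the environment
    # while sys.base_prefix still points at the base Python installation.
    code = "import sys; print(sys.prefix); print(sys.prefix != sys.base_prefix)"
    result = subprocess.run(
        [python_path, "-c", code], capture_output=True, text=True, check=True
    )
    prefix, isolated = result.stdout.splitlines()
    return prefix, isolated == "True"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = os.path.join(tmp, "data-analysis")
        python_path = make_env(env_dir)
        prefix, isolated = describe_env(python_path)
        print("environment prefix:", prefix)
        print("isolated from the base Python:", isolated)
```

With pip left enabled, the environment’s own pip would then install pandas into that environment only, leaving the system Python untouched.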
The steps laid out in the blog post were all correct, except that I had to paste the following lines into a newly created .bash_profile, rather than into an already existing .bashrc (there was no .bashrc on my machine):

export WORKON_HOME=$HOME/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh

and then run the command source .bash_profile so that the current shell picked up the changes.
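The two lines above tell virtualenvwrapper where to keep environments (`WORKON_HOME`) and load its shell functions. To make the role of `WORKON_HOME` concrete, here is a minimal sketch of a hypothetical helper, `list_virtualenvs`, that lists the environments stored there. It is not part of virtualenvwrapper (whose own `lsvirtualenv` command does this job), and it assumes the OS X/Linux layout in which each environment has its interpreter at `bin/python`.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional


def list_virtualenvs(workon_home: Optional[str] = None) -> List[str]:
    """Return the sorted names of environments kept under WORKON_HOME.

    Hypothetical helper, not part of virtualenvwrapper. The directory is taken
    from the argument, else the WORKON_HOME environment variable, else
    ~/.virtualenvs (the value exported in .bash_profile).
    """
    root = Path(
        workon_home or os.environ.get("WORKON_HOME") or Path.home() / ".virtualenvs"
    )
    if not root.is_dir():
        return []
    # Assumed POSIX layout: each environment has its own interpreter at <env>/bin/python.
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and (entry / "bin" / "python").exists()
    )


if __name__ == "__main__":
    # Demonstrate on a throwaway directory so nothing in ~/.virtualenvs is touched.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "pandas-env" / "bin").mkdir(parents=True)
        (Path(tmp) / "pandas-env" / "bin" / "python").touch()
        (Path(tmp) / "scratch").mkdir()  # no interpreter, so not listed
        print(list_virtualenvs(tmp))  # prints ['pandas-env']
```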
A second difference was in checking which version of pandas was installed. Rather than pandas.version.version, as in the blog post, I had to use