
Introduction

In the lab we use three languages:

  1. Matlab

  2. Python

  3. R


You should use the one that you feel most comfortable with, but keep the following points in mind:

  • If you want to be "serious" about programming, you should be comfortable with all of them!

  • Matlab is proprietary, so it is harder to share your work with others, e.g. they may not have a licence.

  • R is most suitable for data analysis and visualization.

  • Python is better for general computing, i.e. tasks that are not specific to scientific computing.

  • If you want to use a specific library, you should learn the corresponding language.
    For example, CPLEX has Python and Matlab interfaces, while Gurobi has Python, Matlab and R interfaces (check the library's documentation for the current list).


All three languages have reporting capabilities. Use them! You should strive to write readable and reproducible code.
When working on a project, no matter how small, write a report or at least a script that can reproduce all of your steps from start to finish.
You will thank me later for this advice!
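
As an illustration of the "one script from start to finish" idea, here is a minimal sketch in Python using only the standard library. The data are simulated, and the function names, the seed and the `results` folder are hypothetical choices made for this example, not a lab convention.

```python
import csv
import random
import statistics
from pathlib import Path

SEED = 42  # fixed seed, so every run produces the same numbers


def make_data(n=100, seed=SEED):
    """Step 1: get the raw data (simulated here as a stand-in for a real file)."""
    rng = random.Random(seed)
    return [rng.gauss(10.0, 2.0) for _ in range(n)]


def analyse(values):
    """Step 2: compute the summary statistics."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def write_report(summary, path):
    """Step 3: save the results as a small CSV file."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["statistic", "value"])
        for name, value in summary.items():
            writer.writerow([name, value])


def main(out_dir="results"):
    """Run every step in order, so one command reproduces the whole analysis."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    summary = analyse(make_data())
    write_report(summary, out / "summary.csv")
    return summary


if __name__ == "__main__":
    print(main())
```

Running the script twice should give identical output files, which is a quick way to check that nothing depends on manual steps.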

Finally, nobody I know or have heard of learned a language just by reading books or following tutorials.
These are good for learning the basics and getting a feel for the language, but you can only really learn through practice, so don't overdo it with all the resources here.
Try to start your project as soon as possible and you will pick things up along the way.

Matlab


Not much to say! Everything comes in one package and the documentation is amazing! Plus, you should have taken a course in your first year. Good luck!

Python


If your OS is Linux or OS X, then Python is preinstalled. However, make sure that you have Python 3, preferably the latest version (this is written in 2016, so Python 2 is still available).
You can install Python from the official website, but if you are new to programming (which is a reasonable assumption if you are reading this...), then you should consider installing Anaconda.
Anaconda is a "scientific" distribution of Python that comes with many libraries preinstalled and is easy to install on any OS.

At the very least you will need the following libraries/packages (all of them included in the Anaconda distribution):

  1. NumPy - for numerical analysis

  2. pandas - for data analysis

  3. Matplotlib - for visualization (many other options are available)

  4. IPython - powerful shell for interactive use

  5. Jupyter - to write reports
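
To check whether these packages are available in your installation, something like the following sketch can help. It only uses the standard library; the helper name `missing_packages` is hypothetical, and the import names in the list (in particular `jupyter_core` for Jupyter) are my assumption of how the packages above are imported.

```python
import importlib.util

# Import names of the packages listed above (assumed; note the capital letters in IPython).
REQUIRED = ["numpy", "pandas", "matplotlib", "IPython", "jupyter_core"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current Python installation."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```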

Python Basics


There are many resources. The official list is here.
You can use the DataCamp website for a guided tutorial with an emphasis on data science. Ask Leo for premium access!
 

Python Advanced


If you want to learn something specific, a good place to start is here.
To learn more in-depth Python:

  1. A lot of nice video tutorials are provided by

    1. Enthought (SciPy Conference)

    2. Continuum

    3. PyData as well as

    4. PyCon, for more general topics (the link is for 2016, but other years are available)

  2. Books
    For data science I would recommend "Python Data Science Handbook" (I have not read it yet, but I know and follow the author).

 

For users with Matlab background:

 

Unlike Matlab, which comes with everything included, Python has a modular structure, so you have to install extra libraries yourself.
That's why Anaconda is a nice start. However, keep in mind that you have to import ("load") these libraries in every script or session (in Matlab everything is loaded by default).
The NumPy library provides all the basic functionality of Matlab (matrices etc.). A short dictionary between the two can be found here.
If you liked Matlab's IDE (GUI), then you should consider using Spyder, which follows the same principles.
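
To give a feeling for the differences, here is a small sketch of a few common Matlab operations written with NumPy. The matrices are made-up examples, and the Matlab equivalents in the comments are approximate.

```python
import numpy as np  # libraries must be imported explicitly, unlike Matlab's built-ins

A = np.array([[1.0, 2.0], [3.0, 4.0]])  # Matlab: A = [1 2; 3 4]
b = np.array([5.0, 6.0])                # Matlab: b = [5; 6]

x = np.linalg.solve(A, b)  # Matlab: x = A \ b
At = A.T                   # Matlab: A.'  (transpose)
prod = A @ A               # Matlab: A * A   (matrix product)
elem = A * A               # Matlab: A .* A  (element-wise product)
first = A[0, 0]            # Matlab: A(1, 1); Python indices start at 0

print(x)
print(prod)
```

The two points that usually trip up Matlab users are visible here: `*` is element-wise in NumPy (use `@` for the matrix product), and indexing starts at 0 and uses square brackets.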

R


R is very domain-specific, so it's certainly not installed by default.
You can download the official version from CRAN. Alternatively, if you are looking for speed, you can download Microsoft's version from MRAN.
I have not used Microsoft's version, but I have read that it is exactly the same; it just takes advantage of parallel processing capabilities when necessary.
(If you are into "compiling", then google "r mkl" or "r openblas"...)

In any case, I highly recommend installing RStudio. It's the IDE for R.

R's bioinformatics community also uses Bioconductor a lot, so you should keep it in mind when looking for packages.
Finally, it is advisable to learn the "tidyverse" API for writing simple scripts and visualizations, in order to keep the lab's presentations uniform.
 

R basics


A quick overview is given here.
For interactive tutorials with an emphasis on data analysis you can try DataCamp. Leo can provide you with premium access if you wish!
Alternatively, you can try swirl for a local tutorial.
 

R Advanced


Again, to learn something specific, you can look here.
The R-bloggers blog has very nice tutorials and is worth following (via newsletter/Twitter/RSS).
To learn R in more depth, you can read one of the many free books (partial list here).
I would recommend the following books:

  1. R for Data Science - intermediate level; explains the tidyverse very well, so highly recommended

  2. An Introduction to Statistical Learning with Applications in R - theory of machine learning + applications

  3. Advanced R - very in-depth

 

For users with Matlab background:

Unlike Matlab, which comes with everything included, R has a modular structure, so you have to install extra libraries yourself.
A dictionary between the two languages can be found here and here.
With RStudio you should feel at home with respect to the IDE.

General Purpose Resources

 
