Data Science Programming Resources
These days, being a data scientist means being a programmer too. There are numerous languages out there, and everyone has an opinion on which are worth learning, but two are most common for data science purposes: Python and R. SQL, MATLAB, and Stata are also popular. In this section, you’ll find resources for learning these languages, ranging from beginner’s guides to advanced graduate-level courses. If you are aware of any free resources not listed here, please share them in a comment below or in an email to hellomenti@gmail.com!
Python
General-purpose programming language, useful for data science, web development, and software development.
MIT Intro to Computer Science and Programming in Python course
Collection of Python guides for those coming from R by Joscelin Rocha Hidalgo
PySlackers - an open community for Python programming enthusiasts
PyLadies - mentorship group with a focus on helping more women become active participants and leaders in the Python open-source community
R/RStudio
Specialized language that is highly useful for statistical analysis and data visualization. Usually run through the RStudio interface.
STAT 545 - UBC course on R and data science
Intro to R by Hans H. Sievertsen - how to load, process, and visualize data in R
SQL
A language for manipulating structured data, effective with large datasets. One of the most common languages for actually gathering the data needed to perform data science.
MATLAB
A mathematical programming language meant for numerical and statistical computing, which also has tools for data visualization.
Stata
A highly specialized tool best suited for running econometric and regression analysis.
Development Research in Practice (Stata Style Guide) by DIME Analytics
Tips for managing large-scale datasets efficiently in Stata by Pere A. Taberner