Some skills, such as knowledge of a programming language, are needed throughout the data science process. The most popular languages for data science are Python and R. In some disciplines, knowledge of both is needed in order to access libraries or packages created for a specific task.
| DATA SCIENCE PROCESS | SKILLS / KNOWLEDGE | 
|---|---|
| 1. Framing the problem | Domain knowledge | 
| 2. Data collection | Database management (MySQL, PostgreSQL, MongoDB); distributed processing (Apache Hadoop, Spark, Flink); web scraping and using APIs | 
| 3. Data cleaning | pandas (Python); R | 
| 4. Exploratory analysis | Statistics; data visualization (Python libraries: NumPy, Matplotlib, pandas, SciPy. R packages: ggplot2, dplyr) | 
| 5. Modeling and analysis | Statistical inference; machine learning (scikit-learn for Python) | 
| 6. Interpretation and communication of results | Domain knowledge; data visualization (Matplotlib, ggplot2, seaborn, Tableau, D3.js); dashboards (Shiny for R, Dash for Python); sharing and documenting code (Jupyter notebooks, R Markdown, creating R or Python packages) |
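
To make steps 3 to 5 of the table concrete, here is a minimal sketch in Python. It uses only the standard library so that it runs anywhere; in practice, pandas and scikit-learn would typically take over the cleaning and modeling. The data set (hours studied versus exam score) and the function names `clean`, `summarize`, and `fit_line` are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

# Hypothetical raw records: (hours_studied, exam_score).
# None marks a missing value, as often happens after data collection.
raw = [(1.0, 52.0), (2.0, None), (3.0, 61.0), (4.0, 66.0), (None, 70.0), (5.0, 70.0)]


def clean(rows):
    """Step 3 (data cleaning): drop rows that contain a missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def summarize(values):
    """Step 4 (exploratory analysis): basic descriptive statistics."""
    return {"n": len(values), "mean": mean(values), "stdev": stdev(values)}


def fit_line(xs, ys):
    """Step 5 (modeling): ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


rows = clean(raw)
xs = [x for x, _ in rows]
ys = [y for _, y in rows]
slope, intercept = fit_line(xs, ys)

print(summarize(ys))
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
```

Dropping incomplete rows is only one possible cleaning strategy; whether to drop or impute missing values depends on the problem framed in step 1.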