It has been a long time since I first saw the Data Science Venn Diagram. The original article dates back to 2010.
At that time, Data Science was considered the intersection of Domain Expertise, Math&Stats Knowledge and Hacking Skills. I agreed with that view. Most of the big data tools had only just been released. You had to be a hacker to produce extensive scatter plots, write Python code to train models in Orange (famous at the time), export those models, and reimplement them in Java for MapReduce jobs. And don’t forget that being an advanced Linux user was really useful for getting your jobs scheduled ahead of others in the queue! Back then, most of the people doing Data Science were academic researchers, i.e., people with domain expertise and knowledge of math&stats.
Things have changed. The Data Science development stack is well established nowadays, and Python and Apache Spark have become mature, widely adopted tools. Moreover, we now understand what Data Science entails, and we have a fairly clear idea of the questions it can answer, both in research and in business.
That’s why the Data Science Venn Diagram needs an update. The new diagram should look like this:
There is still a need for substantial Domain Expertise. After all, you need to know “your stuff” before you can provide the insights that will change your business. When Domain Expertise and Computer Science join, you are a Software Developer: you are able to solve real-world problems with algorithms implemented in code. One of the most important skills you need here is creativity.
When Domain Expertise joins Math&Stats, you are a Data Analyst. You are able to understand why things happen by looking at dashboards. There is still a danger zone here: “Correlation doesn’t imply Causation”. I’m confident that most data analysts are aware of this!
Then there is the upper part, where Computer Science joins Math&Stats. In the old diagram, this area (Hacking Skills plus Math&Stats) was labeled “Machine Learning”, while “Traditional Research” sat at the overlap of Math&Stats and domain expertise. Here, computer scientists keep devising new algorithms and improving existing ones in order to cope with ever-growing amounts of data. They also need to understand the limitations of these algorithms and guarantee that the algorithms do the right thing. Let me add one consideration: AI and machine learning scientists are still computer scientists :)
What do we have at the center, where all three circles intersect? The Data Scientist, i.e., the one capable of merging the three worlds and still holding the title of sexiest job of the 21st century!