I am asked very frequently: “What actually is Data Science?” Even senior practitioners seem to be confused about this.
Have you ever tried to cook? To cook a good meal you need three components:
- Ingredients
- Cooking tools
- A recipe
The quality of the ingredients matters. You could eat them raw, perhaps combined in a salad, or you can cook them. The taste of the meal will then depend on the cooking tool: a pot of water, a frying pan or an oven. Either way, you still need a recipe to guide you through the steps of the cooking process.
Analysing data is much like cooking.
Data are the ingredients. They are raw numbers collected by sensing the world. Big Data is raw data of such volume, variety and velocity that it cannot be processed on a personal computer. We can consume data raw, for example via a dashboard. Analytics are the tools for processing data. Nowadays, Machine Learning and AI seem to be the best processing tools. They enable what is known as advanced analytics: they are like a high-class oven! Finally, instead of a tasty meal, the outcome of data analysis is greater business value.
Data Science is the recipe. We can define Data Science as the set of processes that enable the extraction of non-trivial information from raw data via advanced analytics (see Data Science, Kelleher and Tierney, MIT Press, April 2018).
CRISP-DM is an example of a Data Science process. The story of Walmart illustrates Data Science well. In 2004, Linda M. Dillman, Walmart’s Chief Information Officer, wanted to know which item sold best during hurricanes. The first answer the analysts gave was: “It’s water!” Linda already knew that water was in high demand during hurricane emergencies, so she asked for a sharper answer. Only after a few iterations did the analysts find that strawberry Pop Tarts are the best-selling item during hurricanes, and beer the top seller before them.