Data Science is a thoroughly multidisciplinary field. Professionals from every business segment and corporate department can, and should, know its three pillars. Doing so contributes to decision-making and is one of the most important skills for the professional of the future.
Before listing the three pillars of Data Science and explaining the importance of each one, how about understanding a little of the context: who are the professionals seeking the USP/Esalq MBA in Data Science and Analytics, and what does the job market expect from them?
Professor Luiz Paulo Fávero, who is also the vice coordinator of the USP course, explains all of this. “It is very common to find in the DSA graduate course students from areas such as computer science, engineering, statistics, applied social sciences, economics, administration, marketing, finance, accounting, among others.”
“However, professionals who, apparently, have no relationship with the area, such as journalists, doctors, dentists, veterinarians, and lawyers, for example, also make direct or indirect use of Data Science concepts for decision-making”, he explains.
Work without data?
With that in mind, the professor is emphatic: today it is impossible to work without taking Data Science into account. “Regardless of the sector, be it industry, retail, services, agribusiness or finance, today we are overwhelmed with data, structured or not, from different sources”, highlights Fávero.
These sources are classified as:
- Primary: such as consumer surveys and satisfaction surveys
- Secondary: such as data from associations, from the Central Bank, etc.
“Today, the volume of data is so large, and comes from so many sources, that it makes no sense to look at one area and say that it is more or less related to Data Science than another.”
Continuous and specialized study
Before presenting the three pillars of Data Science, Fávero adds one more piece of context: keeping due proportion, he compares the Data Science profession with medicine.
“Study in this area is continuous, and becoming a data scientist is a very big step forward. By analogy, doctors spend a lifetime studying a particular area of expertise. In Data Science, we also have specialists.”
The professor lists some examples:
- Data engineer, responsible for the structure or collection of data
- Machine Learning engineer, who treats the data and implements algorithms
- Software engineer, who looks at the background to structure and analyze data
- Data architect, who designs the overall relationships among the data
“Besides, we still have information technology and statistics professionals. But that doesn’t mean that professionals from other areas cannot enter the field of Data Science, because none of this would make sense if we didn’t look at this part of structuring and analyzing data with a focus on generating outputs for decision-making”, he adds.
According to Fávero, deep and strategic knowledge of the business, even without much knowledge about the implementation of a certain algorithm or statistical foundation, for example, is essential.
Data Science pillars
“The issue behind Data Science involves a pervasive analysis across all spheres of the organization. This means that all areas, departments, and boards need to understand the importance of Data Science, Analytics, data analysis and pattern recognition, which is in fact Machine Learning, for the purpose of determining predictive models or diagnostic models”, says the professor.
“It is no longer acceptable to go, for example, to the legal department of a company, ask for a piece of information, and find that the professionals in that area do not have time or are too focused on other activities. They need to understand that Data Science generates payback for the entire organization. This cultural issue is fundamental.”
Now, let’s check out the three pillars of Data Science:
- Fundamentals: they are essential for us to know exactly what we are implementing in terms of code. This means knowing the concepts and fundamentals of statistics, algebra, econometrics, calculus, and operations research, among others.
- Correct implementation of algorithms: this pillar derives from the first. After all, it is only possible to implement the code efficiently if it is based on the fundamentals described above, regardless of the software used (the USP/Esalq MBA in Data Science mostly uses R, the MBA in Digital Business uses Gretl, and there are also paid options, such as Stata).
- Interpretation of outputs: the third pillar is related to obtaining results and the ability to interpret them for decision-making purposes. For this, it is important to know the business to extract information that will guide the allocation of resources, for example.
Balance is everything
Do not assume that the Data Science pillars form a hierarchy just because they were presented in a particular order. “One pillar is not more important than the other. It is necessary to watch out for any imbalance of importance between them”, warns Fávero.
“The fundamentals pillar, for example, is very important, but by itself, it does not ensure that the data and information that support decision-making are correctly translated”, he says.
The second pillar concerns implementation and tools. Without knowledge of the fundamentals, however, professionals can implement the wrong algorithms, ones that ignore the nature of a given variable.
“Here at the MBA USP/Esalq, we don’t train people to merely push code. Many people are eager to get to know the software right away, but it is necessary to start with the fundamentals of each type of Machine Learning and Analytics technique. Only then is it possible to implement the code in the chosen software”, emphasizes the professor.
And on the outputs pillar, Fávero is clear: “If you are unable to assess, for example, the nature of the data, the measurement scale of each variable and the implementation, you become an empty decision-maker, completely without foundation.”
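To make the point about measurement scales concrete, here is a minimal Python sketch. The region names and numeric codes are hypothetical, invented only for this illustration. It suggests why an average of arbitrarily coded categories is hard to interpret, while a frequency table remains meaningful for a qualitative variable.

```python
from collections import Counter
from statistics import mean

# Hypothetical qualitative (nominal) variable: sales region of six customers.
regions = ["north", "south", "south", "east", "north", "south"]

# Arbitrary numeric codes: nothing in the data says south is "twice" north.
codes = {"north": 1, "south": 2, "east": 3}

# The software will happily compute this average, but it changes
# whenever the codes are reassigned, so it carries no business meaning.
coded_mean = mean(codes[r] for r in regions)

# Frequencies and shares do not depend on any coding of the categories.
frequencies = Counter(regions)
shares = {region: count / len(regions) for region, count in frequencies.items()}

print(f"mean of arbitrary codes: {coded_mean:.2f}")
for region, share in shares.items():
    print(f"{region}: {share:.0%}")
```

Swapping the codes (say, north as 3 and east as 1) would change the average without any change in the underlying data, whereas the shares would stay the same.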
The main Analytics errors
Now that you know what the Data Science pillars are and how they are interconnected, note that the main Analytics errors stem from imbalance between the pillars and from a lack of good practices within each of them.
Fávero gives an example involving Machine Learning techniques that are designed for only one type of variable, such as quantitative variables.
“Such a technique, cluster analysis for example, extracts a score from the interrelationships among quantitative variables. However, we continue to see professionals in academia and in the job market who implement ‘clustering’ algorithms on qualitative variables by assigning arbitrary weights”, he details.
The professor continues: “This arbitrary weighting of variables that represent, for example, only a semantic differential leads to more and more biased models. And then the decision-making follows. Some techniques are designed to use only qualitative variables, relying solely on frequency measures, such as percentages. There is no need to assign arbitrary weights to the categories.”
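The effect of arbitrary weighting on distance-based clustering can be sketched in a few lines of Python. The customers, the spending values and both codings below are hypothetical. This is only an illustration of the general problem, not the professor's own example. The same three observations end up with different nearest neighbors depending on which numbers are assigned to the categories.

```python
from math import dist

# Hypothetical customers: (monthly spend in thousands, preferred channel).
customers = {
    "A": (1.0, "store"),
    "B": (1.2, "phone"),
    "C": (1.1, "web"),
}

def nearest_to_a(coding):
    """Return the customer closest to A by Euclidean distance under a coding."""
    points = {
        name: (spend, coding[channel])
        for name, (spend, channel) in customers.items()
    }
    others = [name for name in points if name != "A"]
    return min(others, key=lambda name: dist(points["A"], points[name]))

# Two equally arbitrary ways to turn the categories into numbers.
coding_1 = {"store": 1, "web": 2, "phone": 3}
coding_2 = {"store": 1, "phone": 2, "web": 3}

print(nearest_to_a(coding_1))  # C
print(nearest_to_a(coding_2))  # B
```

Since a clustering algorithm based on Euclidean distance groups observations by exactly these distances, the resulting clusters would follow the analyst's choice of codes rather than any structure in the data.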
The unicorn professional
It is very rare to find a data scientist who has mastered all three pillars of Data Science. Such professionals are therefore called unicorns.
“What we see are the ‘unicorn’ teams, with professionals distributed in functions related to these pillars. Thus, it is essential that companies recognize these attributes, these careers, so that decisions can be effectively supported”, he concludes.
Did you enjoy learning about the pillars of Data Science from someone who understands the subject? Share with us! 😊