These slides are taken from the Lynda / LinkedIn Learning course titled "Big Data Foundations: Techniques and Concepts". They are self-explanatory and, in fact, come from an O'Reilly survey of data scientists and their work: "Analyzing the Analyzers". It's a free ebook, a little old but quite interesting. Here are a few diagrams, just as a reminder.

Despite the excitement around "data science," "big data," and "analytics," the ambiguity of these terms has led to poor communication between data scientists and the organizations seeking their help. In this report, authors Harlan Harris, Sean Murphy, and Marck Vaisman examine their mid-2012 survey of several hundred data science practitioners, in which they asked respondents how they viewed their skills, careers, and experiences with prospective employers. The results are striking.

Clustering Data Scientists

The authors identified four clusters in the responses.

Data Businesspeople are those who are most focused on the organization and on how data projects yield profit. They were the most likely to rate themselves highly as leaders and entrepreneurs, and the most likely to have reported managing an employee (about 80% have). They were also quite likely to have done contract or consulting work, and a substantial proportion have started a business.

You can think of Data Creatives as the broadest of data scientists: those who excel at applying a wide range of tools and technologies to a problem, or at creating innovative prototypes at hackathons. The Data Creative respondents latched onto the term "Artist" like no other group.

Data Developers are people focused on the technical problem of managing data: how to get it, store it, and learn from it. Many of them are closely tied to the machine learning and related academic communities.

One of the interesting career paths that leads to a title like “data scientist” starts with academic research in the physical or social sciences, or in statistics. Many organizations have realized the value of deep academic training in the use of data to understand complex processes, even if their business domains may be quite different from classic scientific fields. Data Researchers tend to be from these backgrounds.



Here is the set of 22 generic skills the authors used to span the range of useful things that data scientists might do in their work.


Combining Skills and Self-ID


Several reasonable observations fall out of this initial categorization. First, Data Businesspeople are the most likely to have primarily Business-related skills, which is a reassuring result. Also of note is that half of Data Businesspeople have their strongest skill rankings in other areas, such as Statistics and ML/Big Data. Second, the largest group of respondents, Data Researchers, were also the most likely to have expertise in Statistics or, perhaps, Math. Third, both Data Businesspeople and Data Researchers were quite unlikely to rate Programming as their strongest skill. Fourth, Data Creatives and Data Developers showed greater variability in how they ranked their skills than the others (see also Figure 4-2). Data Creatives and Data Developers are also the two groups most likely to excel in ML/Big Data and Programming skills, but there are substantial differences between the experiences of these two types of data scientists. When trying to describe subtypes of data scientists, the self-identification groups were more evocative and made a better primary label for practitioners, with the skill groups serving as a correlated but secondary label. One may find it valuable to describe someone as a "Data Researcher with depth in Machine Learning/Big Data," or as a "Data Businessperson with depth in Statistics."