Advanced Computing in the Age of AI | Tuesday, March 19, 2024

‘Data Scientist’ Title Evolving Into New Thing 


Who is a data scientist, and what do they do? It’s a worthy question, and one that’s been asked an untold number of times over the years. But now it appears the job definition of the data scientist is changing into something new. The big question, then, is what replaces it?

The traditional definition of a data scientist is somebody who has the requisite skills to build applications that can manipulate large amounts of data to effect some positive change in an organization. The Venn diagram of the data scientist typically shows strength in three overlapping areas: mathematics/statistics, computer science, and business expertise (although there’s often less emphasis on the third category, owing to the rarity of these triple-threat “unicorns”).

In response to the steep demand for data scientists and the kingly salaries they command, universities revamped their curricula and degree programs, and began churning out freshly minted data scientists ready to spin big data into gold for an eager industry. Fortune 1000 firms and investment banks competed with Web giants to land the smartest “quants” who could leverage big data into a competitive advantage. In 2012, Harvard Business Review declared data scientist the “sexiest job of the 21st century.”

But that appears to be old news. Some of the folks who are doing the toughest “data science” work now are no longer calling themselves data scientists, said Ben Lorica, who still held the title of “chief data scientist” for O’Reilly Media as this story went to press.

“I have to say the term data scientist has become muddled a little bit,” Lorica said during his keynote address at the recent Strata Data Conference in San Francisco. “There has been confusion about who to call data scientists.”

Lorica elaborated on his comments later during an interview with Datanami. “I was talking to a technology person in the Bay Area and this person said in this company we have two kinds of data scientists: the data scientist data scientists, and what used to be called the business analyst who did SQL.

“I asked ‘Why are you doing that, because that’s confusing for us on the outside,’” Lorica continued. “If someone says they’re a data scientist, but they’re not, there’s nothing we can do because that’s a career path and people want that title!”

Nothing stays put forever, and in some ways it’s not surprising that the title of data scientist has begun to morph. Addison Snell, the principal analyst at Intersect360 Research, pointed this out during a panel discussion yesterday at Tabor Communications’ annual Advanced Scale Forum, which is being held this week in Ponte Vedra Beach, Florida.

Data scientists are not data scientists anymore (Artisticco/Shutterstock)

“When big data got going, all of a sudden everyone was trying to hire data scientists,” Snell said. “And you watch people’s LinkedIn profiles say ‘data scientist’ with 25 years of experience. That’s a title we invented last week! But suddenly we have decades of experience.”

Clearly, the market could not stand having an unmet need for data scientists for so long. While the universities worked to churn out PhD data scientists (a process that takes years), organizations decided to rebrand SQL-loving data analysts into “lite data scientists.”

Simultaneously, the rise of automated data science platforms, what Lorica and others call “auto-ML tools,” provided another way for organizations to obtain the necessary data science skills. The vendors behind these auto-ML platforms (DataRobot, H2O.ai, Alteryx, Domino Data Lab, Anaconda, and Determined AI, among others) don’t claim to fully replace data scientists, but instead say they can help existing data scientists get more work done with assistance from less-skilled folks around them, who may be referred to as “citizen data scientists.”

So if SQL-loving business analysts are the new “data scientists,” and citizen data scientists are scaling the data science ladder via auto-ML tools and data science platforms, what, then, are the “actual” data scientists calling themselves? Lorica has some ideas.

Engineers Vs. Scientists

Lorica has noticed a subtle change in the titles people use, mostly among tech companies in the San Francisco-Silicon Valley area.

“I try to be nice to the title of data scientist, because that’s my title too, but it’s been somewhat diluted,” Lorica admits. “It seems like there’s some rebranding. Some data scientists are calling themselves machine learning engineers.”

Now, machine learning engineers aren’t a one-for-one replacement for data scientists. It doesn’t work that way. Rather, machine learning engineers are folks who have a stronger engineering background than traditional data scientists, Lorica says.

Machine learning engineers “know enough machine learning to build the initial model and deploy to production and maybe tweak the model,” he says. “They’re generally perceived as being more impactful because they can typically take it to the end. They’re not just building toy models. They’re touching production system.”

A variety of data-oriented titles are in use (Image courtesy “The State of Machine Learning Adoption in the Enterprise” from O’Reilly Media)

Just as data engineers have become some of the most impactful folks on a data team (and also some of the most in-demand and difficult-to-fill positions), Lorica sees machine learning engineers carrying the ball forward in a meaningful way. “I think it’s a title that’s going to be harder to dilute because of the word ‘engineer,’” he says.

According to O’Reilly’s 2018 survey, “The State of Machine Learning Adoption in the Enterprise,” the title “data scientist” is used by 81% of organizations with extensive machine learning expertise. That was followed by “machine learning engineer,” used at 39% of organizations, and “deep learning engineer,” used at 20%.

One more title is in play to possibly replace the traditional, and perhaps already outdated, role of the data scientist: research scientist.

“The other title they’re using is research scientist, which is a title conferred on people who are really attuned to modeling,” Lorica says. “They’re more sophisticated in their understanding of models…. Most of the time, [machine learning engineers] can probably build simple models, and then if you need really sophisticated, advanced models, you bring in this sophisticated research scientist person.”

Unicorns will never be the same.

About the author: Alex Woodie

Alex Woodie has written about IT as a technology journalist for more than a decade. He brings extensive experience from the IBM midrange marketplace, including topics such as servers, ERP applications, programming, databases, security, high availability, storage, business intelligence, cloud, and mobile enablement. He resides in the San Diego area.

EnterpriseAI