At The Data Incubator, I’ve spoken to hundreds of employers looking to hire data scientists, particularly those with advanced degrees. With all the hype surrounding big data these days, it’s unsurprising that there is as much misinformation floating around as fact. Unfortunately, hiring managers often take common misconceptions at face value. Here are three facts about data science that hiring managers may not understand:
Data scientists and software engineers are not the same.
Believing the two are synonymous is a common mistake. While engineers with software development backgrounds do sometimes call themselves data scientists to capitalize on the associated salary premium, the results tend to be mediocre. Engineers are trained to fix bugs in code, but without a deeper understanding of probability and statistics, they often struggle to find and fix statistical bugs. Their code may run flawlessly, yet their predictions will be off if that code is built on flawed statistics. Creating truly scalable predictive models requires a deeper and more nuanced understanding of statistics, which data scientists have and many software engineers lack.
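As a minimal illustration of a statistical bug, consider the sketch below. The channel figures and function names are hypothetical, invented for this example. Both functions run without error, but only one of them answers the question "what fraction of all visitors converted?"

```python
# Hypothetical data: (visitors, conversions) for two marketing channels.
channels = [(10_000, 200), (100, 20)]


def naive_rate(groups):
    """Average the per-group rates, ignoring how large each group is."""
    rates = [conversions / visitors for visitors, conversions in groups]
    return sum(rates) / len(rates)


def pooled_rate(groups):
    """Divide total conversions by total visitors."""
    total_visitors = sum(visitors for visitors, _ in groups)
    total_conversions = sum(conversions for _, conversions in groups)
    return total_conversions / total_visitors


print(f"naive:  {naive_rate(channels):.3f}")
print(f"pooled: {pooled_rate(channels):.3f}")
```

The naive version reports a rate of about 11 percent, while the pooled rate is about 2.2 percent. Nothing in the code is broken in the usual sense; the error lies in treating a tiny channel as if it carried the same weight as a large one.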
Big data is more than statistics and business intelligence.
Those with little to no experience with software development, many hiring managers among them, often fail to recognize this. Keeping a plant alive in your office window is quite different from running a farm, right? When you scale up, you have to change the way you do things in order to make them work. The same applies when you add more data. Big data strains the classic models of computation and eventually renders them ineffective. When you’re dealing with big data, the data simply can’t all fit into RAM. Traditional business intelligence calculations become unwieldy and can’t be completed in a reasonable time frame. Distributed computation and parallelization may be the obvious answers to scaling, but they are not always simple to apply, and they often call for more involved solutions. Traditional business analytics are as different from distributed statistical computing as your window plant is from a farm. A real data scientist will understand this and know how to handle it.
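To make the RAM constraint concrete, here is a small single-machine sketch of one way a calculation has to change when the data cannot be loaded all at once. It uses Welford's online algorithm to compute a mean and variance in one pass, holding only three numbers in memory. The `read_in_chunks` generator is a hypothetical stand-in for reading a large file piece by piece, and this is only an illustration of the idea, not a distributed solution.

```python
import statistics


def streaming_mean_variance(chunks):
    """Compute mean and sample variance in one pass (Welford's algorithm)."""
    count, mean, m2 = 0, 0.0, 0.0
    for chunk in chunks:
        for x in chunk:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)  # uses the already-updated mean
    variance = m2 / (count - 1) if count > 1 else 0.0
    return mean, variance


def read_in_chunks(n_chunks, chunk_size):
    """Hypothetical stand-in for reading a large file one chunk at a time."""
    for i in range(n_chunks):
        start = i * chunk_size
        yield [float(v % 7) for v in range(start, start + chunk_size)]


# Only one chunk is ever held in memory at a time.
mean, variance = streaming_mean_variance(read_in_chunks(100, 1_000))
print(f"streaming mean={mean:.4f}, variance={variance:.4f}")

# Sanity check on a sample small enough to load in full.
small = [x for chunk in read_in_chunks(5, 10) for x in chunk]
small_mean, small_variance = streaming_mean_variance(read_in_chunks(5, 10))
print(abs(small_mean - statistics.mean(small)) < 1e-9)
print(abs(small_variance - statistics.variance(small)) < 1e-9)
```

The in-memory approach (`statistics.mean` on a full list) stops working once the list no longer fits in RAM, while the streaming version keeps working at any size. Spreading the same calculation across many machines adds further complications, such as merging partial results, which this sketch does not attempt.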
Data scientists need to understand the business.
Many machine learning practitioners don’t recognize this. While you can do a lot with machine learning, it’s not all-powerful and it can’t tell you everything. Business intuition guides data scientists, allowing them to separate meaningful correlations from spurious ones. Mistaking correlation for causation can have costly consequences. If a data scientist lacks the necessary domain expertise, simply following what they think the data says can lead to poor policy recommendations based on ill-founded conclusions.
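The difference between correlation and causation can be sketched with synthetic data. The example below is a textbook-style illustration, not real data: the variable names (`temperature`, `ice_cream`, `drownings`) and the way the numbers are generated are assumptions made for this sketch. Two quantities that never influence each other end up strongly correlated because both depend on a third factor, and the correlation largely disappears once that factor is accounted for.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """Remove the linear effect of xs from ys using a least-squares fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5_000

# Synthetic confounder: both series depend on temperature, not on each other.
temperature = [rng.gauss(0, 1) for _ in range(n)]
ice_cream = [t + rng.gauss(0, 1) for t in temperature]
drownings = [t + rng.gauss(0, 1) for t in temperature]

raw = pearson(ice_cream, drownings)
adjusted = pearson(
    residuals(ice_cream, temperature), residuals(drownings, temperature)
)
print(f"raw correlation:            {raw:.2f}")
print(f"after removing temperature: {adjusted:.2f}")
```

The code can only adjust for temperature because someone knew to measure it and include it. That knowledge comes from understanding the domain, not from the algorithm.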
Business intuition is also essential when a data scientist needs to convince key stakeholders of the validity and importance of their conclusions. Often, those stakeholders will be domain experts, not data scientists. Being able to explain findings in terms that make sense to them is a crucial part of achieving the institutional buy-in needed for data science to have a real impact on the business.