In the past few years, we've seen the dramatic rise of data science as a career that spans all industries and geographies. In roughly 12 years, businesses have realized they need employees skilled in advanced analytics who can help leverage and monetize all of the data they have access to. As this rise has occurred, media outlets have had a field day reporting on how data science and A.I. will change the world, increasing interest in the field even further. However, while much has been made of this new career path, only a few sources have attempted to quantify this growth and the shortage of data scientists.
That's why today we're happy to publish our report on Data Science Career Trends In 2020. This report, which is provided as a public workspace hosted on our own data science platform, provides a high-level overview of the state of the world in data science careers today. The report attempts to quantitatively measure the current size of the advanced analytics labor market, growth and compensation in this market, and the shortage of data scientists today and in the past.
While the report contains a brief explanation of our data sources and our methodology, we wanted to expand a bit more on how we approached this task, as it required a mix of art and science. We believe it's important for everyone to understand any shortcomings in our methodology so they can account for them when using this data for themselves.
Discovering and aggregating data on the state of the job market today and in the past proved to be the most challenging and time-consuming part of the process. As previously mentioned, despite the popularity of data science in the media and in business, there are few primary sources that have published data on the number of advanced analytics workers in industry today or in the past. To gather the data we needed to measure the size of the labor market, we researched a variety of historical reports and analyses to understand past estimates of the labor market. This gave us the context we needed to evaluate present-day sources on the state of the labor market today.
This historical data, while sparse, allowed us to determine a range for the size of the labor market in previous years. After determining this range, we attempted to quantify the size of the market today. While we did find an estimate for the total number of data scientists from the Bureau of Labor Statistics, it appeared to be quite low given the context that we had gathered from previous articles. Ultimately, we opted to gather primary data by using one of the best sources of career data today, LinkedIn.
Unfortunately, LinkedIn does not provide an exact count of the total number of employees with a specific job title, nor does it provide an exact count of the number of available jobs on its platform with a given job title. However, it does allow users to see an approximation of how many of its members within any given user's extended network match particular search criteria, and it also allows individual members to see the number of relevant jobs within a particular geographical area that match a given search term.
Given these constraints, we used my LinkedIn account to run a variety of search terms to estimate the number of data scientists, machine learning engineers, and A.I. researchers within my extended network. While I'll be the first to decry the many shortfalls of this method, it does provide a starting point for quantifying labor market data in the advanced analytics industry. Using this strategy, we conducted a variety of searches on LinkedIn that provided an approximation of the total number of advanced analytics employees within a variety of regions, and we combined results from these searches with the historical statistics we discovered in our research. This allowed us to fit an exponential growth curve to the time series data we collected, which we then used to interpolate values for any missing years in our dataset. This curve also served to strengthen our belief that the estimate from the BLS may have been too low.
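For readers who want to try something similar, here is a minimal sketch of fitting an exponential growth curve to a sparse time series and interpolating the missing years. It assumes a least-squares fit on the logarithm of the counts, which is one common way to fit an exponential and not necessarily the exact procedure behind the report. The function names and the numbers in `observed` are made-up placeholders for illustration, not figures from our data.

```python
import math

def fit_exponential(years, counts):
    """Fit counts ~ a * exp(b * (year - years[0])) via least squares on log(counts)."""
    base_year = years[0]
    x = [y - base_year for y in years]
    z = [math.log(c) for c in counts]  # taking logs turns the curve into a line
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z))
    b = sxz / sxx                      # growth rate per year
    a = math.exp(mean_z - b * mean_x)  # fitted value at the base year
    return a, b, base_year

def estimate(a, b, base_year, year):
    """Evaluate the fitted curve for any year, including years with no data."""
    return a * math.exp(b * (year - base_year))

# Hypothetical placeholder observations (NOT numbers from the report).
observed = {2010: 1000, 2013: 8000, 2016: 64000}

years = sorted(observed)
a, b, base_year = fit_exponential(years, [observed[y] for y in years])

# Fill in every year in the range, keeping observed values where we have them.
filled = {
    year: observed.get(year, estimate(a, b, base_year, year))
    for year in range(years[0], years[-1] + 1)
}
```

Fitting on the log scale keeps the example dependency-free, but it weights relative errors rather than absolute ones, so a direct nonlinear fit can give somewhat different parameters on noisy data.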
Once we gathered employee counts, we used the same strategy to gather counts of open data science jobs across various geographies. Given these two sets of data, we were able to conduct a basic analysis of the shortage of data scientists today and in the past.
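One simple way to compare the two datasets is to look at open jobs relative to the number of people already employed in each region. The sketch below assumes that ratio as the shortage measure, which is our illustration here and only one of several reasonable choices. The region names and counts are hypothetical placeholders, not numbers from the report.

```python
def openings_per_worker(open_jobs, employed):
    """Open positions per employed worker; higher values suggest a tighter market."""
    if employed <= 0:
        raise ValueError("employed must be positive")
    return open_jobs / employed

# Hypothetical placeholder counts (NOT numbers from the report).
regions = {
    "Region A": {"employed": 20000, "open_jobs": 5000},
    "Region B": {"employed": 8000, "open_jobs": 1000},
}

ratios = {
    name: openings_per_worker(data["open_jobs"], data["employed"])
    for name, data in regions.items()
}

# Regions ordered from most to least constrained under this measure.
ranked = sorted(ratios, key=ratios.get, reverse=True)
```

Because both counts are approximations taken from one person's extended network, a ratio like this is better read as a rough comparison between regions than as an absolute measure of unfilled demand.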
After gathering job market data, we aggregated both historical and present-day data on salaries, broken down by geography. We used a variety of published statistics from job boards, consulting reports, surveys, and articles to aggregate multiple statistics on mean and median salaries. Using this data, we then constructed an analysis on the average salary for advanced analytics workers, both nationally and within several large urban centers.
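To make the aggregation step concrete, here is a minimal sketch that groups published salary figures by geography and summarizes them. It assumes each source contributes one figure per region and uses the median across sources as the summary, which is a choice made for this example. The sources, regions, and salaries are hypothetical placeholders, not data from the report.

```python
from collections import defaultdict
from statistics import median

# (region, source type, reported annual salary) -- hypothetical placeholders,
# NOT numbers from the report.
reported = [
    ("National", "job board", 100000),
    ("National", "survey", 110000),
    ("National", "consulting report", 120000),
    ("City A", "job board", 130000),
    ("City A", "survey", 150000),
]

def summarize_by_region(rows):
    """Group reported salaries by region and summarize the spread across sources."""
    by_region = defaultdict(list)
    for region, _source, salary in rows:
        by_region[region].append(salary)
    return {
        region: {
            "sources": len(values),
            "median": median(values),
            "low": min(values),
            "high": max(values),
        }
        for region, values in by_region.items()
    }

summary = summarize_by_region(reported)
```

Keeping the low and high values next to the median shows how much the sources disagree, which matters when some of them report means and others report medians.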
Finally, we aggregated data on educational programs that specialize in data science tools and skills. In this instance, it proved too difficult to find historical data, so we limited our analysis to educational programs today. In our analysis, we only included programs that we judged to have a data science-first approach, focusing on the core statistical, algorithmic, computational, and analytical skills relevant to a data science career.
Once we aggregated all of our data, we organized it into a public Google Sheet and then connected that sheet to the workspace using our Google Sheets data connector. We understand that our analysis falls short in precision; however, we believe it may still prove valuable to those who seek to understand the state of data science careers today.
If you're interested in staying in the loop with what we're up to at Apteo, you can subscribe to our newsletter or create a free account for yourself on our platform where you can visualize, analyze, and predict your own data. Finally, if you're interested in helping us build a data science platform that lets anyone analyze their data, especially if you're a full-stack engineer or a growth marketer, please get in touch!
Shanif Dhanani is the co-founder & CEO of Apteo. Prior to Apteo, Shanif was a data scientist and software engineer at Twitter, and prior to that he was the lead engineer and head of analytics at TapCommerce, a NYC-based ad tech startup acquired by Twitter. He has a passion for all things data and analytics, loves adventure traveling, and generally loves living in New York City.