An Overview of Data Science and How to Break Into the Field
In the introduction to this map, I provided a broad definition of data science. I’d now like to lay out what type of work is done by data scientists and what the job market looks like for those in the field.
1. What does a data scientist do and how do I become one?
To become qualified as a data scientist in the eyes of employers, the typical starting point is an undergraduate degree. Without formal training and a college education, most companies won’t even consider hiring you for a data scientist role. The good news is that you have some flexibility with your choice of major. Getting your degree in data science itself is the straightforward option (if your school offers such a major), but it is not the only one and not necessarily the best. Among data scientists in 2018, 20% had a computer science degree, 20% had a natural sciences degree, 18% had an engineering degree, and 17% had a degree in a social sciences or business field. Because working with data is a relevant skill in every field, you may wish to choose your field of interest and then pursue data science within the context of that field. For example, if you are interested in lab research you may go into biology, or if you’re interested in consulting you may major in business or economics. You can then choose to specialize, either through a data science-oriented major within those fields (e.g., bioinformatics) or by taking classes focused on data science topics and learning data science skills outside the classroom. This route has the advantage of giving you a specific field in which to apply the data skills you acquire. Getting a data science degree, or similarly, a statistics or mathematics degree, is a great option too - you’ll just need to decide what you want to apply those skills to later on.
So you’ve gotten your college degree, and now you’re looking to get hired as a data scientist for some company or institution. What kind of work can you expect to do? Work with data, of course! While the application of your work will depend on the field you’re in, you can expect to spend time gathering data from public and private sources, cleaning and transforming that data, and possibly even conducting analytics on that data. Analytics can involve anything from creating models and algorithms that explore patterns in the data or make predictions, to creating charts and tables that summarize results. Much of your time will likely be spent just finding the relevant data for your project and “cleaning” it - that is, structuring the dataset into a standard and easily usable form for analysis. As you become more advanced, you may spend more time designing predictive models or machine learning processes. For more detailed information on how a workday may look, check out these threads where actual data scientists share their work schedules. Also, check out these interviews with data scientists on Menti’s Career Exploration page!
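To make “cleaning” a little more concrete, here is a minimal sketch in Python using only the standard library. The dataset, column names, and helper functions are all hypothetical and invented for illustration; a real project would more likely use a library such as pandas and would need rules specific to its own data. The sketch trims stray whitespace, standardizes capitalization, marks a missing value as missing instead of guessing, drops a duplicate row, and then produces a small summary.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data: inconsistent capitalization, stray spaces,
# one missing salary, and one row that duplicates another once cleaned.
RAW_CSV = """name, city ,salary
 Alice ,new york,85000
BOB,Chicago,
alice,New York,85000
Carla, chicago ,92000
"""


def clean_rows(raw_text):
    """Return a list of tidy records: trimmed, title-cased, deduplicated."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        # Strip whitespace from both the column names and the values.
        row = {key.strip(): (value or "").strip() for key, value in row.items()}
        name = row["name"].title()
        city = row["city"].title()
        # Keep an empty salary as None rather than inventing a number.
        salary = int(row["salary"]) if row["salary"] else None
        record = (name, city, salary)
        if record in seen:
            continue  # skip rows that are identical after normalization
        seen.add(record)
        cleaned.append({"name": name, "city": city, "salary": salary})
    return cleaned


def average_salary_by_city(rows):
    """Average the known salaries per city, ignoring missing values."""
    by_city = {}
    for row in rows:
        if row["salary"] is not None:
            by_city.setdefault(row["city"], []).append(row["salary"])
    return {city: mean(values) for city, values in by_city.items()}


if __name__ == "__main__":
    tidy = clean_rows(RAW_CSV)
    for record in tidy:
        print(record)
    print(average_salary_by_city(tidy))
```

Note the choice to leave the missing salary as missing: how to handle gaps (drop the row, fill in an estimate, or leave it empty) is a judgment call that depends on the project.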
One other thing to note is that it’s becoming increasingly common for data scientists to hold advanced degrees - master’s degrees or even PhDs. While there is a lot to consider when deciding whether an advanced degree is right for you (too much to go over in this map), keep in mind that it’s where the field may be headed.
2. What kind of skills does a data scientist have?
Much of the work done with data involves coding using a programming language, Python and R being two prominent examples. These languages are powerful tools that make the job of collecting, cleaning, and analyzing large or unstructured datasets much easier. There are many other options out there besides Python and R: Java, C++, MATLAB, SQL, Stata, and Julia are all used by data scientists. Deciding which languages you should learn is a matter of personal preference and field of application. For example, among economists and economic researchers, Stata and MATLAB are very common, while among computer scientists they are rarely used. At the same time, there are also many economists who heavily use Python or SQL, so even within a field there is a lot of variation in the language of choice. To decide which languages are right for you, ask professors, professionals, and older students in your field of interest what they use. It can also be helpful to learn a handful of these languages rather than just one, since each language has its own strengths and weaknesses depending on the task. Fortunately, it tends to be easier to learn new languages once you’ve already learned one, since many of them share similar concepts and syntax. For resources on learning how to program, I’ve compiled a list later on in this map!
Besides programming, the other major skill for data scientists is knowledge of statistical methods. Probability and statistics are the backing of most analysis you would do, and a math background can also help you understand what commands in your code are actually doing. So a quantitative background is a must, especially if you wish to get involved with the frontier of the field utilizing machine learning and AI methods. Much of the work you’ll do also involves exploring datasets and searching for patterns or answers. Having a mindset of curiosity, always asking questions about your data, and thinking of creative ways to find answers are all must-haves. Those who can combine critical thinking with creative approaches will be highly sought after, and as the next question reveals, rewarded well for their skills.
3. What’s the job market like for a data scientist?
In short: it’s very good. As previously mentioned, data science is a rapidly growing field that is relevant to just about every other field. Data scientists and related positions (data analyst, data engineer, etc.) are among the fastest-growing roles for new job openings. You would be hard-pressed to find a major company or institution that doesn’t employ at least a couple of data scientists. Since your skills are applicable in any situation involving data, it’s not hard to find a job opening in any industry, company, or region you may be interested in.
Compensation is above average for the typical data scientist too: the average salary in 2020 was $100,560. An entry-level data scientist can expect a salary of around $95,000, while an experienced one at the management level can make as much as $250,000 per year. At the “big tech” companies like Google and Apple that employ hundreds of data scientists, salaries average around $150,000. There is a wide range depending on the industry you go into, but what’s certain is there are many options available.
4. How do I prepare for interviews for a data science position?
The most obvious answer here is to be well-trained in several programming languages and have a solid foundation in math, especially probability and statistics. Of course, this is no easy task. Later sections of this map will provide resources for how to improve your skills in these areas. Rose Tan has compiled an incredibly useful collection of interview prep guides at the bottom of her article here (and she also writes great articles on data science!). Andrei Lyskov has also written an excellent article on preparing for data science interviews here. A unique feature of data science interviews is that they may involve a “coding challenge”, which is an opportunity to showcase your programming knowledge. You’ll be presented with a small problem to solve, perhaps involving a dataset, and asked to write a program that produces the answers. Sometimes these will be emailed to you with a 24-hour window to complete; other times you may be expected to solve the problem on the spot at the in-person interview.
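To give a feel for the scale of a coding challenge, here is a hypothetical example (not taken from any actual company’s interview): given a list of purchase records, report how much each customer spent in total, ranked from highest to lowest. The data and the function name below are made up for illustration, and this is just one reasonable way to solve it in Python.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer, amount spent).
PURCHASES = [
    ("ana", 20.0),
    ("ben", 5.0),
    ("ana", 15.0),
    ("cy", 30.0),
    ("ben", 10.0),
]


def total_spend_ranked(purchases):
    """Return (customer, total) pairs sorted from highest to lowest total."""
    totals = defaultdict(float)
    for customer, amount in purchases:
        totals[customer] += amount
    # Sort by total descending, then by name so ties have a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for customer, total in total_spend_ranked(PURCHASES):
        print(f"{customer}: {total:.2f}")
```

In an interview setting, explaining choices like the tie-breaking rule out loud can matter as much as the code itself.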
Besides that, interviews follow much the same process as in any other field. Preparing for interviews, in general, is a topic Menti has discussed here. My last piece of advice would be to thoroughly research the company you’re interviewing with, search around the Internet for potential interview questions on websites like Glassdoor, and reach out to data scientists both at that company and elsewhere for help.