Everybody and their mother wants to learn data science. And there’s no reason not to – the job you do is interesting 95% of the time, the salaries are excellent, and most likely you can get the work done from the comfort of your bed.
Today we’ll go over seven essential skills every data scientist should have. Here’s the complete list:
Tasks in a day-to-day data science job are often vaguely defined – at least at the beginning of the project. More often than not, a data science solution delivers value only if the data scientist has substantial domain knowledge.
For example, how could you possibly develop credit risk models if you don’t know anything about the subject? Sure, you could do your best and follow well-established data science principles, but that can only get you so far. As a result, your models won’t work optimally, and you won’t know what to do about it.
That’s where creativity and critical thinking come into play. Data scientists have to distill a lot of information in a short time frame. Having a team of highly creative people might expose solutions that no one thought of before.
Critical thinking will help you dig deeper, ask the right questions, and spot potential biases in the responses.
How much math you’ll use daily depends on your role. These four areas come up most often when looking at data science prerequisites:
It’s definitely not something you can pick up in a week, as every listed subject is college-level math.
It doesn’t mean you should spend the next year or so learning these subjects in depth, but you should know the basics. If you’re after a junior-level position, basic intuition and understanding of the applicability in data science should do. If you’re after a lead researcher position, it’s expected these topics are second nature to you.
There’s a gap of at least several years between junior and senior data science positions, so you’ll always have the time to learn and explore further. The best part is – you can learn everything entirely for free! Here’s a complete reference for beginners:
All of the math, stats, and critical thinking skills in the world won’t help if you don’t know how to express your knowledge through code. Let’s take a look at the most widely used languages in data science:
In a nutshell – Python and R are industry leaders. SQL is supposedly used more than R, but that’s likely for another reason, covered later in this article.
If you’re entirely new to programming, there’s some great news – both Python and R are easy to learn. And if you’re coming from languages such as C or Java, these two shouldn’t be a problem to pick up.
After all, Python’s design was influenced by ABC, a language created for teaching programming, so how complicated can it be for well-educated professionals?
As for R, here’s what you can do with it (assuming a basic knowledge of programming concepts):
To become an efficient data scientist, your data analysis and visualization skills have to be top-notch. Your results are here to tell a story, and nobody wants to read an incomplete and poorly presented one.
There’s a whole suite of data analysis and visualization packages available for both R and Python. R’s most popular analysis package is dplyr, and for Python, that’s pandas.
Want a complete beginner guide on data analysis with R? Check out our detailed guide to R’s dplyr.
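As a rough Python counterpart to a dplyr pipeline, here is a minimal sketch of a group-and-summarise analysis with pandas. The `sales` table and its column names are invented purely for illustration and don’t come from any real dataset.

```python
import pandas as pd

# Hypothetical sales records, made up for this example
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "units": [10, 15, 7, 3, 20],
    "price": [2.5, 2.5, 4.0, 4.0, 4.0],
})

# Derive a new column from existing ones
sales["revenue"] = sales["units"] * sales["price"]

# Group by region, aggregate, and sort by total revenue (highest first)
summary = (
    sales.groupby("region", as_index=False)
    .agg(total_units=("units", "sum"), total_revenue=("revenue", "sum"))
    .sort_values("total_revenue", ascending=False)
    .reset_index(drop=True)
)

print(summary)
```

The chained style reads top to bottom, much like a dplyr pipeline: derive a column, group, summarise, sort.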
When it comes to data visualization, many will argue that R has the edge. The visualizations look better, especially with the default styling. Its most popular visualization library is ggplot2, and we have an entire series to get you started:
To conclude – proper analysis and visualization skills are a must. It’s not enough to know how to write code; you also have to ask the right questions. That’s why creativity and critical thinking are so important.
This is where all the hype is. Machine learning has been a trending topic over the last couple of years. It’s not that new of a concept – it was introduced back in the 1950s – but the improvements in computing power made it accessible to almost anyone.
As a result, many companies have made machine learning part of their core service. Applications range from something as basic as flower species classification to autonomous vehicles.
The applications of machine learning are endless, so the learning path shouldn’t be the same for a business user and an aspiring computer vision engineer. Still, starting from the basics can’t hurt.
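To make “the basics” concrete, here is a minimal sketch of flower species classification on the classic Iris dataset. The choice of scikit-learn and of a k-nearest neighbours model is an assumption made for illustration; the article doesn’t prescribe a particular library or algorithm.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Iris: 150 flowers, 4 measurements each, 3 species
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the flowers to evaluate on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# k-nearest neighbours: predict the majority species among
# the 5 closest training flowers
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Fraction of held-out flowers classified correctly
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

The same load, split, fit, evaluate loop carries over to most supervised learning problems, whatever the model.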
Here’s a couple of basic machine learning articles to get you started:
These two articles by no means capture the essence of “basic machine learning”. It’s a broad and fast-evolving field, so a single book or course won’t be enough to cover everything.
It’s likely you won’t work with CSV and Excel files most of the time. Instead, datasets will be stored in databases. There are many database vendors out there, such as Microsoft, IBM, and Oracle, and all of them have a single thing in common – SQL.
It’s a language for storing, extracting, and manipulating data within the database. SQL syntax varies a bit from vendor to vendor, but the differences are subtle, so it won’t take you much time to feel comfortable again if you decide to change vendors.
You can go as simple or as complex as you want with SQL. “Simple” means you’re just using it to pull the dataset into memory (e.g., with Python or R), and “complex” means you’re doing most of the computations and aggregations in the database.
The second approach is the way to go if speed is critical. Pulling data you don’t need is bad practice in any case.
Learning the basics of SQL shouldn’t take you much time. From a Python/R perspective, there are pre-made packages for establishing connections with virtually any database, both on-premises and cloud-based. These packages are also well documented (usually), so establishing connections shouldn’t be an issue.
To recap – learn the basics of SQL so you can do the “heavy lifting” within the database, and pull only the data you need to Python/R.
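The difference between the two approaches can be sketched with Python’s built-in `sqlite3` module. The in-memory database, the `orders` table, and its rows are hypothetical stand-ins for a real production database; with another vendor, the connection package would differ, but the idea stays the same.

```python
import sqlite3

# In-memory SQLite database standing in for a real one
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 120.0), ("bob", 40.0), ("alice", 80.0),
     ("carol", 15.0), ("bob", 60.0)],
)

# "Simple" use: pull every row into memory and aggregate in Python
totals_in_python = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    totals_in_python[customer] = totals_in_python.get(customer, 0.0) + amount

# "Complex" use: the database aggregates and filters,
# and only the small result set is transferred
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) >= ?
    ORDER BY total DESC
"""
big_spenders = conn.execute(query, (100.0,)).fetchall()
conn.close()

print(big_spenders)
```

With five rows the difference is invisible, but on a table with millions of rows the second query moves only a handful of aggregated records over the wire.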
Fewer than 30% of data scientists hold a bachelor’s degree or less, and around 20% hold a PhD, according to a 2018 study by Indeed. In a nutshell – a master’s degree is the expected baseline.
Here’s a complete overview per profession and education level:
You can still get hired as a data scientist without a college degree, but only under two conditions:
Yes, a degree is a useful thing for data science jobs, but a degree in what? Let’s take a look at the following chart:
As you can see, most data scientists have a background in computer science, business, or math/stats. The number of data scientists with an official degree in data science is expected to rise, as more and more universities offer this specialization.
And there you have it – 7 essential skills you should have for a job in data science. The take-home point is: knowing the basics of each should be enough to get you an entry-level position. Only years of experience will help you climb the corporate ladder, and there’s always time to dive further into specific areas.
If you want to implement machine learning in your organization, you can always reach out to Appsilon for help.