How to approach data science (DS) in 2025.

Jun 25, 2025

When it comes to learning something new, how you approach it really depends on your goal. There are usually two:

Find a job
Upskill yourself

Each of these requires a different strategy.

But let’s be honest—the most popular goal right now is to get a job.

You're new to the data science world. There’s a ton to learn, so many courses to take, and endless rabbit holes to fall into. It’s overwhelming.

So let me cut through the noise.

Here’s how I would learn if I wanted to get a job as fast as possible:

SQL

Yes, you’ll need this both for interviews and for the job itself. There is no way around learning it.

Start with the basics:

The structure of a SQL query
SELECT, FROM, WHERE, GROUP BY, ORDER BY
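To make the query structure concrete, here is a minimal sketch that runs one query through Python's built-in sqlite3 module. The orders table and its rows are invented for illustration; only the clause order is the point.

```python
import sqlite3

# Hypothetical toy table, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "ana", 30.0), (2, "bob", 15.0), (3, "ana", 50.0), (4, "cy", 5.0)],
)

query = """
SELECT customer, SUM(amount) AS total   -- columns to return
FROM orders                             -- source table
WHERE amount >= 10                      -- filter rows before grouping
GROUP BY customer                       -- one output row per customer
ORDER BY total DESC                     -- sort the final result
"""
basic_rows = conn.execute(query).fetchall()
print(basic_rows)  # [('ana', 80.0), ('bob', 15.0)]
```

Note that the 5.0 order from "cy" is dropped by WHERE before any grouping happens, which is why that customer does not appear at all.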

Joins:

INNER, LEFT, FULL OUTER
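A small sketch of how INNER and LEFT joins differ, again using sqlite3 with made-up customers and orders tables. FULL OUTER JOIN is left out of the code because older SQLite builds do not support it; conceptually it also keeps unmatched rows from the right-hand table.

```python
import sqlite3

# Hypothetical tables: customer 3 ("cy") has no orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'ana'), (2, 'bob'), (3, 'cy');
INSERT INTO orders VALUES (10, 1, 30.0), (11, 1, 50.0), (12, 2, 15.0);
""")

# INNER JOIN keeps only customers that have at least one matching order.
inner_rows = conn.execute("""
SELECT c.name, o.amount
FROM customers AS c
INNER JOIN orders AS o ON o.customer_id = c.id
ORDER BY c.name, o.amount
""").fetchall()

# LEFT JOIN keeps every customer; missing order columns come back as NULL.
left_rows = conn.execute("""
SELECT c.name, o.amount
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
ORDER BY c.name, o.amount
""").fetchall()

print(len(inner_rows), len(left_rows))  # 3 4
print(left_rows[-1])                    # ('cy', None)
```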

Aggregations:

COUNT vs COUNT(DISTINCT)
Aggregations with CASE statements
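The aggregation ideas above can be sketched in one query. The events table is hypothetical; the interesting part is how COUNT(*), COUNT(DISTINCT ...) and a CASE expression inside an aggregate each answer a different question.

```python
import sqlite3

# Hypothetical event log, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (user_id INTEGER, event TEXT);
INSERT INTO events VALUES
  (1, 'view'), (1, 'view'), (1, 'purchase'),
  (2, 'view'), (3, 'purchase'), (3, 'purchase');
""")

agg_row = conn.execute("""
SELECT
  COUNT(*)                AS n_events,     -- every row
  COUNT(DISTINCT user_id) AS n_users,      -- unique users only
  SUM(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS n_purchases,
  -- CASE without ELSE yields NULL, and COUNT ignores NULLs
  COUNT(DISTINCT CASE WHEN event = 'purchase' THEN user_id END) AS n_buyers
FROM events
""").fetchone()

print(agg_row)  # (6, 3, 3, 2)
```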

Window functions:

RANK(), DENSE_RANK(), ROW_NUMBER(), SUM()
How to use PARTITION BY, ORDER BY and ROWS BETWEEN
Rolling calculations
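Here is a hedged sketch of the window functions listed above, run through sqlite3. It assumes your Python ships SQLite 3.25 or newer (the release that added window functions), and the sales table is made up. The tie on day 2 and day 3 in the east region is there on purpose, to show how the three ranking functions differ.

```python
import sqlite3

# Hypothetical daily sales; east has a tie (20 and 20) to show ranking behavior.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, day INTEGER, amount INTEGER);
INSERT INTO sales VALUES
  ('east', 1, 10), ('east', 2, 20), ('east', 3, 20), ('east', 4, 5),
  ('west', 1, 7),  ('west', 2, 3);
""")

window_rows = conn.execute("""
SELECT
  region, day, amount,
  RANK()       OVER (PARTITION BY region ORDER BY amount DESC)      AS rnk,
  DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC)      AS dense_rnk,
  ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC, day) AS row_num,
  SUM(amount)  OVER (
    PARTITION BY region ORDER BY day
    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW   -- rolling 2-day sum
  ) AS rolling_2day
FROM sales
ORDER BY region, day
""").fetchall()

for row in window_rows:
    print(row)
# ('east', 1, 10, 3, 2, 3, 10)
# ('east', 2, 20, 1, 1, 1, 30)
# ('east', 3, 20, 1, 1, 2, 40)
# ('east', 4, 5, 4, 3, 4, 25)
# ('west', 1, 7, 1, 1, 1, 7)
# ('west', 2, 3, 2, 2, 2, 10)
```

RANK() skips a position after a tie (1, 1, 3), DENSE_RANK() does not (1, 1, 2), and ROW_NUMBER() always produces unique numbers, using day here as the tie-breaker.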

CTEs vs. subqueries vs. temp tables vs. views
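One way to get a feel for the difference is to write the same intermediate result four ways. This is a sketch on a made-up orders table: the subquery and the CTE are two spellings of the same query, the view is a saved query that is re-evaluated every time you read it, and the temp table is a snapshot taken when it is created.

```python
import sqlite3

# Hypothetical orders table, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount REAL);
INSERT INTO orders VALUES ('ana', 30), ('ana', 50), ('bob', 15), ('cy', 5);
""")

totals_sql = "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"

# Subquery: the intermediate result is nested inside FROM.
subquery_rows = conn.execute(f"""
SELECT customer, total FROM ({totals_sql}) AS t
WHERE total > 10 ORDER BY customer
""").fetchall()

# CTE: the same intermediate result, named up front with WITH.
cte_rows = conn.execute(f"""
WITH totals AS ({totals_sql})
SELECT customer, total FROM totals
WHERE total > 10 ORDER BY customer
""").fetchall()

# View: stores the query. Temp table: stores the rows as of right now.
conn.execute(f"CREATE VIEW totals_view AS {totals_sql}")
conn.execute(f"CREATE TEMP TABLE totals_tmp AS {totals_sql}")

# New data arrives after both objects were created.
conn.execute("INSERT INTO orders VALUES ('cy', 20)")

cy_sql = "SELECT total FROM {} WHERE customer = 'cy'"
view_total = conn.execute(cy_sql.format("totals_view")).fetchone()[0]
tmp_total = conn.execute(cy_sql.format("totals_tmp")).fetchone()[0]

print(subquery_rows == cte_rows)  # True
print(view_total, tmp_total)      # 25.0 5.0
```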

And practice here makes you unstoppable.

Python

You don’t need to be a genius software engineer. But you do need to know enough to get the job done.

Here’s what you should focus on:

Data Types & Structures

Start with the basics:

Strings, integers, floats (decimal numbers), booleans
Lists (arrays), dictionaries
(Optional: stacks, queues)
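A quick sketch of these building blocks in plain Python. All names and values are illustrative. The stack uses a regular list and the queue uses collections.deque from the standard library, which is one common way to do it, not the only one.

```python
from collections import deque

name = "Ada"        # string
visits = 3          # integer
price = 19.99       # float (a decimal number)
is_active = True    # boolean

scores = [88, 92, 75]                     # list: ordered and mutable
user = {"name": name, "visits": visits}   # dictionary: key -> value lookup

scores.append(60)        # lists grow in place
user["visits"] += 1      # dictionary values are updated by key

# Stack: last in, first out. A plain list works well.
stack = []
stack.append("a")
stack.append("b")
last_in = stack.pop()        # "b"

# Queue: first in, first out. deque pops from the left efficiently.
queue = deque()
queue.append("a")
queue.append("b")
first_in = queue.popleft()   # "a"

print(scores, user, last_in, first_in)
```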

You’ll probably see terms like heaps, trees, and graphs in interviews, but unless you’re going for an ML-heavy or CS-focused role, don’t sweat them.

Algorithms

Just the essentials:

Loops, recursion
Linear & binary search
Understand Big O notation (space and time complexity)
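To see why Big O matters, compare the two searches on the same sorted list. This is a minimal sketch, and the function names are my own: linear search may look at every element, O(n) time, while binary search halves the remaining range at each step, O(log n) time, but only works on sorted input. Both use O(1) extra space.

```python
def linear_search(items, target):
    """Check each element in turn. O(n) time, works on unsorted data."""
    for index, value in enumerate(items):
        if value == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Halve the search range each step. O(log n) time, needs sorted data."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1    # target can only be in the right half
        else:
            high = mid - 1   # target can only be in the left half
    return -1


even_numbers = list(range(0, 100, 2))    # 0, 2, 4, ..., 98
print(linear_search(even_numbers, 42))   # 21
print(binary_search(even_numbers, 42))   # 21
print(binary_search(even_numbers, 43))   # -1 (not present)
```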

Skip the deep stuff (unless you’re curious).

Data Science Packages

pandas
numpy
scikit-learn
statsmodels

And again: practice, practice, practice.

Statistics and experimentation

If you’ve got a math background, great. If not, don’t worry—this is still doable.

Here are the core concepts you need to know:

Descriptive Stats

Types of distributions
Mean, median, mode, quantiles, variance
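These summary statistics are all available in Python's built-in statistics module. The sketch below uses a small invented dataset with one outlier (30) to show how the mean gets pulled upward while the median stays put.

```python
import statistics

# Made-up data with a single large outlier.
data = [2, 4, 4, 5, 7, 9, 30]

mean = statistics.mean(data)                 # 61 / 7, about 8.71
median = statistics.median(data)             # 5
mode = statistics.mode(data)                 # 4
quartiles = statistics.quantiles(data, n=4)  # three cut points (default method)
variance = statistics.variance(data)         # sample variance (divides by n - 1)

print(round(mean, 2), median, mode)  # 8.71 5 4
print(quartiles)                     # [4.0, 5.0, 9.0]
print(round(variance, 2))
```

Note that statistics.variance is the sample variance; statistics.pvariance is the population version.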

Data Visualization

When and how to use histograms, box plots, scatter plots, etc.

Probability

Independent vs. dependent events
Bayes’ Theorem
Probability distributions (PDF, CDF)
Central Limit Theorem
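Two of these ideas are easy to check numerically. The sketch below first applies Bayes' Theorem to a hypothetical diagnostic test (the 1% prevalence, 95% sensitivity and 5% false positive rate are invented numbers), then simulates the Central Limit Theorem by averaging draws from a skewed exponential distribution. The function name and the simulation settings are my own choices.

```python
import random
import statistics


def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' Theorem."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
p_condition_given_positive = posterior(0.01, 0.95, 0.05)
print(round(p_condition_given_positive, 3))  # 0.161

# Central Limit Theorem: means of samples from a skewed distribution
# (exponential with mean 1 and standard deviation 1) look roughly normal.
rng = random.Random(0)   # fixed seed so the run is repeatable
sample_size = 50
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
    for _ in range(2000)
]

# Theory says: centered near 1.0, spread near 1 / sqrt(50), about 0.141.
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))
```

The Bayes result is the classic surprise: even with a fairly accurate test, a positive result only implies about a 16% chance of having the condition, because the condition is rare to begin with.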

Experimentation

Sampling methods
Point estimation & confidence intervals
A/B testing: setup, sanity checks, p-values, bootstrap methods, parametric vs. non-parametric tests
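As a rough sketch of what analyzing an A/B test can look like, here is a two-proportion z-test (a parametric test) next to a percentile bootstrap confidence interval (a resampling approach) for the same conversion data. The counts are invented, the function names are mine, and a real analysis would also include the setup and sanity checks mentioned above.

```python
import math
import random
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def bootstrap_diff_ci(a, b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(b) - mean(a)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(a, k=len(a))   # sample with replacement
        resample_b = rng.choices(b, k=len(b))
        diffs.append(sum(resample_b) / len(b) - sum(resample_a) / len(a))
    diffs.sort()
    lower = diffs[int(n_boot * alpha / 2)]
    upper = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical experiment: 100/1000 conversions in control, 130/1000 in treatment.
control = [1] * 100 + [0] * 900
treatment = [1] * 130 + [0] * 870

z_stat, p_value = two_proportion_z_test(sum(control), len(control),
                                        sum(treatment), len(treatment))
ci_low, ci_high = bootstrap_diff_ci(control, treatment)

print(round(z_stat, 2), round(p_value, 3))
print(round(ci_low, 3), round(ci_high, 3))
```

If both approaches tell the same story (a small p-value and an interval that excludes zero), that is reassuring; if they disagree, it is worth checking the assumptions behind each.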

Causal Inference

Why it matters
Difference-in-Differences
Matching methods
DAGs (Directed Acyclic Graphs)
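Difference-in-Differences comes down to simple arithmetic: the change in the treated group minus the change in the control group. The sketch below uses invented before/after numbers and a helper name of my own. The estimate is only meaningful under the parallel trends assumption, meaning both groups would have moved the same way without the treatment.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """(change in treated group) - (change in control group)."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical outcome measurements per group and period.
treated_before = [98, 102]    # mean 100
treated_after = [128, 132]    # mean 130
control_before = [88, 92]     # mean 90
control_after = [103, 107]    # mean 105

effect = diff_in_diff(treated_before, treated_after,
                      control_before, control_after)
print(effect)  # 15: treated rose by 30, control by 15
```

A naive before/after comparison on the treated group alone would report 30, because it would attribute the background trend of 15 to the treatment as well.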

Machine learning foundation

Honestly? I’d skip the deep dive at the start. What you need is already brilliantly covered by Andrew Ng.

His Stanford Machine Learning lectures are gold. Watch them, up to Lecture 16.

And don’t just watch. Implement everything. Practice it. Build things.

There is a lot to learn, no doubt about it. But you don’t need to know everything to get a job.

Stick to what matters.

Practice.

Data Marks
