AI DS Workflow
If you are a data scientist using AI tools, you might have noticed that sometimes the AI feels like a junior analyst who needs constant hand-holding, and other times it feels like a superpower.
The difference usually comes down to how you interact with it.
I have collected the main practices that can make your “AI data science” workflow run significantly more smoothly.
The Core Workflow Patterns
1. Your Context File is Everything
The single most impactful move you can make is to maintain a small, always-updated context file that the AI can read at the start of a session. You can call it AI_CONTEXT.md or PROJECT.md.
A good context file ensures the AI knows your role, domain, and constraints. It stops you from repeating yourself each session and ensures outputs match your standards for SQL dialects, naming conventions, tone, and statistical assumptions.
How to start:
Include your role and domain, your data stack at a high level (warehouse, notebook, BI), your default assumptions (time zones, definitions, experiment conventions), and a few things you repeatedly ask for.
Example bullets to include:
I am a data scientist working on growth questions.
Default SQL dialect: Snowflake. Prefer CTEs; avoid vendor-specific functions unless asked.
Lead with the business implication, then show the analysis.
Challenge my assumptions before agreeing.
When I say ‘experiment’, assume: primary metric, guardrails, segmentation, power/significance checks.
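Put together, bullets like those might live in a file such as the one below. This is a minimal sketch, not a template to copy verbatim: the team name, stack entries, and defaults are all illustrative and should be replaced with your own.

```markdown
# AI_CONTEXT.md

## Role
Data scientist on the growth team; audience is mostly non-technical stakeholders.

## Stack
- Warehouse: Snowflake (default SQL dialect; prefer CTEs, avoid vendor-specific functions unless asked)
- Notebooks: Jupyter + Python
- BI: dashboards consumed by PMs

## Defaults
- Time zone: UTC; weeks start on Monday
- "Experiment" implies: primary metric, guardrails, segmentation, power/significance checks

## Standing requests
- Lead with the business implication, then show the analysis
- Challenge my assumptions before agreeing
```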
2. Be Painfully Specific About What You Want
Brief the AI the way you would brief a new teammate: 30 seconds of context upfront saves 15 minutes of back-and-forth.
❌ Bad:
“Help me with this experiment.”
✅ Better:
“Evaluate experiment X. Focus on segment Y. Show metric Z by country and platform. Include significance, effect sizes, and a sanity-check section (SRM, sample ratio, missingness). Return a 6-bullet executive summary plus a table.”
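One of the checks that prompt asks for, SRM (sample ratio mismatch), is easy to verify yourself before trusting any downstream numbers. Below is a minimal stdlib-only sketch using a chi-square goodness-of-fit test against an expected 50/50 split; the counts and the alpha=0.001 threshold are illustrative assumptions, not values from the text.

```python
def srm_chi_square(control_n, test_n, expected_ratio=0.5):
    """Chi-square statistic for the observed control/test split
    against the expected assignment ratio (df = 1)."""
    total = control_n + test_n
    expected_control = total * expected_ratio
    expected_test = total * (1 - expected_ratio)
    return ((control_n - expected_control) ** 2 / expected_control
            + (test_n - expected_test) ** 2 / expected_test)

# Critical value for df=1 at alpha=0.001, a common SRM alarm threshold.
SRM_CRITICAL = 10.828

chi2 = srm_chi_square(50_000, 50_400)  # illustrative counts
print(f"chi2 = {chi2:.2f}, SRM suspected: {chi2 > SRM_CRITICAL}")
```

If the statistic exceeds the threshold, stop and debug the assignment pipeline before reading any metric deltas.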
3. Ask for the Approach Before the Answer
This catches wrong assumptions before the AI writes any code.
Example:
Before you write any SQL or code, propose:
• The dataset(s) I likely need
• The join keys and grain you will assume
• The edge cases to verify
• 2–3 validation queries you would run first
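Two of those validation steps, confirming the assumed grain is actually unique and measuring missingness in a key field, can be sketched in a few lines. The table and field names (`user_id`, `day`, `orders`) are illustrative stand-ins, not anything from your warehouse.

```python
from collections import Counter

# A tiny in-memory stand-in for a query result at the assumed
# (user_id, day) grain; values are illustrative.
rows = [
    {"user_id": 1, "day": "2024-01-01", "orders": 2},
    {"user_id": 1, "day": "2024-01-02", "orders": 1},
    {"user_id": 2, "day": "2024-01-01", "orders": None},
]

# Check 1: is the assumed grain actually unique?
grain_counts = Counter((r["user_id"], r["day"]) for r in rows)
duplicates = {k: v for k, v in grain_counts.items() if v > 1}
print("duplicate grains:", duplicates)

# Check 2: how much missingness is there in a key metric?
missing = sum(1 for r in rows if r["orders"] is None)
print(f"missing orders: {missing}/{len(rows)}")
```

In practice you would run the equivalent queries in the warehouse, but asking the AI to propose them first is exactly the point of this pattern.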
4. Prefer Multi-Turn Collaboration Over One-Shot Prompts
Do not restart from scratch for every question. Build a thread where each turn carries context you do not have to restate. The compounding effect is real.
Step 1: “Help me identify candidate tables and grains.”
Step 2: “Now draft the query and sanity checks.”
Step 3: “Add segment breakdowns and moving averages.”
Step 4: “Turn results into a readout and caveats.”
5. Specify Output Format
If you need something pasteable, say so explicitly.
Return:
A 5-bullet executive summary
A markdown table with: metric, control, test, delta, delta_pct, p_value
A “Checks” section (SRM, missing data, outliers)
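If you want that table to land as pasteable markdown regardless of where the numbers come from, a tiny formatter is enough. This is a sketch; the helper name and the single row of metric values are made up for illustration.

```python
def to_markdown_table(headers, rows):
    """Render a list of header names and row tuples as a markdown table."""
    lines = ["| " + " | ".join(headers) + " |",
             "| " + " | ".join("---" for _ in headers) + " |"]
    for row in rows:
        lines.append("| " + " | ".join(str(v) for v in row) + " |")
    return "\n".join(lines)

headers = ["metric", "control", "test", "delta", "delta_pct", "p_value"]
rows = [("conversion", 0.102, 0.108, 0.006, "5.9%", 0.03)]  # illustrative
print(to_markdown_table(headers, rows))
```

Specifying the exact column names in your prompt, as above, means the AI's output drops straight into a doc or PR without reformatting.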