Data Marks

7 reasons why your experiments fail

Eltsefon Mark
Aug 27, 2025
∙ Paid

When you first start working with experiments, it feels simple:
pick a metric → randomize users → apply a statistical test → get a result → make a decision.

But we all know it’s a lot harder than it looks at first glance.

Here are 7 of the most common reasons why your experimentation might not be working properly:

1. Decentralized experiments = chaos

If every team runs tests on their own, you’ll quickly face:

  • Conflicting experiments affecting the same users.

  • Overlapping metrics and cannibalization between experiments.

  • Teams cherry-picking results to look good.

Solutions:

  • Centralization: a single A/B platform, shared methodology, validated test designs, and a knowledge base of past experiments. This ensures transparency, consistency, and less wasted effort.


2. Skipping experiment design

Some people underestimate the importance of experiment design. But without it, you don’t know:

  • How long to run your test.

  • Whether you’ll detect meaningful effects.

  • The actual power of your experiment.

Good design means calculating MDE (minimum detectable effect), sample size, defining the randomization unit, understanding the metrics, and much more. Without this, you risk stopping too early or wasting weeks on inconclusive data.
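The sample-size part of that design work can be sketched with the standard two-proportion z-test power formula. This is a minimal illustration (the function name and defaults are mine, not from the post), using only the standard library:

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline_rate, mde_abs, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided two-proportion z-test.

    baseline_rate: control conversion rate (e.g. 0.10)
    mde_abs: absolute minimum detectable effect (e.g. 0.01 = +1 percentage point)
    """
    p1, p2 = baseline_rate, baseline_rate + mde_abs
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_b = NormalDist().inv_cdf(power)          # quantile for the desired power
    p_bar = (p1 + p2) / 2                      # pooled rate under H0
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / mde_abs ** 2)
```

For example, detecting a 1-point lift on a 10% baseline at 80% power needs close to 15,000 users per arm; dividing by your daily eligible traffic gives a realistic test duration up front.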



3. Peeking too early

A classic mistake. You see significance, get excited, and want to stop right away. But stopping a test the moment you “see a win” can inflate your false-positive rate several times over the nominal level.

Solutions:

  • General rule: don’t stop before the pre-determined duration.

  • Only stop early for strong negative signals (e.g. revenue tanks).

  • If your experimentation system is mature, use sequential testing methods (like mSPRT) to control error rates.
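To see why peeking is so dangerous, you can simulate A/A tests (no true effect) and stop at the first "significant" look. This toy Monte Carlo is my own sketch, not the post's method, and it is not mSPRT itself, which requires more machinery; it only demonstrates the error inflation:

```python
import random
from statistics import NormalDist

def peeking_false_positive_rate(n_sims=1000, n_total=1000, n_peeks=10,
                                alpha=0.05, seed=42):
    """Simulate A/A tests on a binary outcome, 'peeking' at evenly spaced
    points and stopping at the first significant two-proportion z-test.
    Returns the fraction of simulations that falsely declare a winner."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha=0.05
    step = n_total // n_peeks
    false_positives = 0
    for _ in range(n_sims):
        a = b = 0
        for i in range(1, n_total + 1):
            a += rng.random() < 0.5   # both arms share the same 50% rate
            b += rng.random() < 0.5   # so any "win" is a false positive
            if i % step == 0:         # a peek
                p_pool = (a + b) / (2 * i)
                se = (2 * p_pool * (1 - p_pool) / i) ** 0.5
                if se > 0 and abs(a / i - b / i) / se > z_crit:
                    false_positives += 1
                    break
    return false_positives / n_sims
```

With ten peeks, the realized false-positive rate typically lands well above the nominal 5%, even though no variant is actually better.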


4. Ignoring multiple testing

Many data scientists forget that multiple testing isn't only about having multiple variants: it applies to metrics too.

We rarely measure just one metric. But if you track 10+ KPIs without corrections, you’ll almost always find false “wins.”
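To make the problem concrete, here is a minimal sketch (my own illustration, using hypothetical function names) of two standard corrections: Bonferroni, which controls the family-wise error rate, and Benjamini–Hochberg, which controls the false discovery rate:

```python
def bonferroni(p_values, alpha=0.05):
    """Indices of hypotheses rejected under the Bonferroni correction."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]

def benjamini_hochberg(p_values, alpha=0.05):
    """Indices of hypotheses rejected under Benjamini-Hochberg FDR control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # reject the k smallest p-values, where k is the largest rank
        # satisfying p_(k) <= (k / m) * alpha
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])
```

On ten KPIs with p-values like `[0.001, 0.008, 0.039, ...]`, Bonferroni keeps only the strongest result, while BH admits a few more at the cost of a controlled share of false discoveries; either is far safer than reading all ten at the raw 0.05 threshold.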

Solutions:
