When does intuition win over an A/B test?
When making choices that affect products, policies, or strategies, it’s important to understand not just what the data says, but how much trust we can place in it. Not all evidence is created equal—some approaches offer more reliable insights about what will happen if we take a particular action. This guide outlines a spectrum of evidence types, from the least to the most trustworthy, and offers practical questions to help you assess each one.
1. Intuition and Experience
What it is: Relying on personal judgment, gut feelings, or past experiences to make a decision. In organizations it often shows up as the HiPPO (highest-paid person’s opinion).
When it’s used: Often the default when time is short or data is unavailable.
Risks: Highly subjective and prone to biases and overconfidence. Not suitable for high-impact or complex decisions.
Consider: Is the potential downside of being wrong significant? If so, move up one step to stronger evidence.
2. Correlations
What it is: Noticing that two things tend to happen together, i.e., that they are correlated (e.g., users who see more notifications also engage more).
When it’s used: Early exploration or when data is limited.
Risks: Associations can be misleading: just because two things move together doesn’t mean one causes the other. Hidden factors may be at play. Collections of spurious correlations show how absurd such patterns can get; now imagine making business decisions based on them!
Consider: What other explanations could there be for this pattern? Could a third factor be influencing both?
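To make the "third factor" question concrete, here is a minimal simulated sketch. Everything in it is an assumption made for illustration: the variable names, the data-generating process, and the idea that a hidden "activity level" drives both notifications and engagement. By construction, notifications have no effect on engagement at all, yet the two are clearly correlated until activity is accounted for.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [y - mean_y - slope * (x - mean_x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n_users = 5000

# Hypothetical hidden factor: how active each user is overall.
activity = [rng.gauss(0, 1) for _ in range(n_users)]
# Both observed metrics depend on activity, but NOT on each other.
notifications = [a + rng.gauss(0, 1) for a in activity]
engagement = [a + rng.gauss(0, 1) for a in activity]

raw_corr = pearson(notifications, engagement)
adjusted_corr = pearson(residuals(notifications, activity),
                        residuals(engagement, activity))

print(f"raw correlation:               {raw_corr:.2f}")
print(f"after accounting for activity: {adjusted_corr:.2f}")
```

The raw correlation should land near 0.5, while the adjusted one should sit close to zero. In real data the hidden factor is usually not measured, which is exactly why a correlation alone is weak evidence.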
3. Statistical Adjustments
What it is: Using statistical models to account for differences between groups (e.g., regression, matching, classification).
When it’s used: When you have data on relevant characteristics and want to estimate the effect of an action.
Risks: Only as good as the variables included; unmeasured factors can still bias results. This kind of design calls for extensive quality checks and diagnostics.
Consider: Have you included all important variables? What might you be missing? The main question here: how could two users with the same features end up with different treatment statuses, and why?
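As a rough illustration of what adjustment buys you, the sketch below simulates users where a single measured feature (a hypothetical "heavy user" flag) affects both who gets treated and the outcome. The numbers, names, and the true effect of 2.0 are all assumptions chosen for the example. Stratifying on the flag is one simple form of adjustment; regression or matching follow the same logic.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_EFFECT = 2.0       # assumed effect of the treatment on the outcome

users = []
for _ in range(20000):
    heavy = rng.random() < 0.5
    # Heavy users are far more likely to receive the treatment.
    treated = rng.random() < (0.8 if heavy else 0.2)
    # Heavy users also have higher outcomes regardless of treatment.
    outcome = 5.0 * heavy + TRUE_EFFECT * treated + rng.gauss(0, 1)
    users.append((heavy, treated, outcome))

def mean(values):
    return sum(values) / len(values)

def effect(rows):
    """Difference in mean outcome between treated and untreated rows."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)

# Naive comparison: ignores that treated users are mostly heavy users.
naive_estimate = effect(users)

# Adjusted comparison: estimate the effect within each stratum,
# then average the strata weighted by their share of users.
adjusted_estimate = 0.0
for stratum in (True, False):
    rows = [u for u in users if u[0] == stratum]
    adjusted_estimate += effect(rows) * len(rows) / len(users)

print(f"naive estimate:    {naive_estimate:.2f}")
print(f"adjusted estimate: {adjusted_estimate:.2f} (true effect: {TRUE_EFFECT})")
```

The naive estimate is inflated well above the true effect, while the stratified one should land close to it. This works only because, inside each stratum, treatment here is a coin flip. That is the answer to the question above in this toy setup; with real data you have to argue for it, and an unmeasured feature would leave the adjusted estimate biased too.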
4. Quasi-experiments
What it is: Taking advantage of real-world changes that mimic random assignment (sometimes called “natural experiments”). These changes allow you to compare groups that are similar except for the action or change you’re interested in, helping to estimate cause-and-effect relationships.
Examples of methods:
Difference-in-differences
Synthetic control methods
Regression discontinuity
Instrumental variables
When it’s used: When true randomization isn’t possible, but some external process creates comparable groups.
Risks: These methods depend on specific assumptions (e.g., that nothing else changed at the same time, or, for difference-in-differences, that the groups would have followed parallel trends without the change).
Consider: What must be true for this approach to give a fair answer? Are those conditions likely met?
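For a feel of how the simplest of these methods works, here is a minimal difference-in-differences sketch. The function name and the four group averages are hypothetical numbers invented for the example, not results from any real study.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the comparison group."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change

# Hypothetical average engagement before and after a feature launch.
# The feature launched in one region only; the other serves as comparison.
estimate = diff_in_diff(
    treated_pre=10.0, treated_post=13.0,   # region with the launch
    control_pre=9.0, control_post=10.5,    # region without it
)

print(estimate)  # 1.5
```

The treated region rose by 3.0, but the comparison region rose by 1.5 on its own, so only 1.5 is attributed to the launch. That attribution is fair only if both regions would have moved in parallel without the launch, which is the assumption to check before trusting the number.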
5. Meta-analysis of quasi-experiments
