A/B testing is how data-driven teams improve their products.
But running valid A/B tests requires understanding the metrics and statistics behind them.
Many A/B tests produce misleading results not because the idea was bad, but because of avoidable mistakes in experiment design.
Core Concepts
What is A/B Testing?
You have two versions of something (a page, a flow, a pricing structure). You show version A to half your users, version B to the other half, and measure the difference.
Example:
- Control (A): Green "Subscribe" button
- Treatment (B): Red "Subscribe" button
- Metric: Click-through rate
Statistical Significance
Just because B gets 15% more clicks than A doesn't mean B is better. Maybe it's random variation.
Statistical significance tells you how unlikely the observed difference would be if there were actually no difference between A and B.
p-value: The probability of seeing a difference at least this large if A and B truly performed the same
- p-value < 0.05 = statistically significant (unlikely to be chance alone)
- p-value > 0.05 = not significant (could plausibly be random variation)
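The danger of reading too much into a raw lift is easy to demonstrate: simulate "A/A tests" where both arms are identical, and count how often one side still appears 15% better. A minimal sketch (the 5% baseline rate and arm size are assumptions for illustration):

```python
import random

random.seed(0)

def simulated_lift(true_rate=0.05, n_per_arm=1000):
    """One A/A 'experiment': both arms share the SAME true conversion rate.
    Returns treatment's measured lift over control."""
    a = sum(random.random() < true_rate for _ in range(n_per_arm))
    b = sum(random.random() < true_rate for _ in range(n_per_arm))
    return (b - a) / a if a else 0.0

lifts = [simulated_lift() for _ in range(1000)]
big_wins = sum(lift >= 0.15 for lift in lifts)
print(f"{big_wins} of 1000 A/A tests showed B 'winning' by 15%+ purely by chance")
```

Even with identical variants, a substantial fraction of runs show a "15% lift", which is exactly why a raw lift alone proves nothing.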
Sample Size
The more users you test on, the more confident you can be.
Example:
- 100 users: A gets 10 clicks, B gets 12 clicks. "B is 20% better!" (But sample size is too small to trust)
- 10,000 users: A gets 1,000 clicks, B gets 1,200 clicks. "B is 20% better!" (Large sample, we can trust this)
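The sample-size intuition above can be checked by simulation: measure the same true 10% rate on small and large samples and see how much the observed rate bounces around (the 10% rate is an assumption for illustration):

```python
import random

random.seed(42)

def observed_rate(true_rate, n_users):
    """Simulate n_users with a fixed true click probability; return observed rate."""
    clicks = sum(random.random() < true_rate for _ in range(n_users))
    return clicks / n_users

for n in (100, 10_000):
    rates = [observed_rate(0.10, n) for _ in range(500)]
    print(f"n={n:>6}: observed rate ranged {min(rates):.3f} to {max(rates):.3f}")
```

With 100 users the measured rate swings wildly around 10%; with 10,000 it barely moves, which is why large samples earn more trust.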
Conversion Rate
The % of users who take a desired action.
Examples:
- Click-through rate: % of users who click a button
- Sign-up rate: % of users who complete signup
- Purchase rate: % of users who make a purchase
- Retention rate: % of users who return after N days
How to Set Up a Valid A/B Test
Step 1: Define Your Hypothesis
Start with a clear prediction:
"Users will be more likely to sign up if we use a red button instead of green because red is associated with urgency."
Good hypotheses:
- Specific (red button, not "improve the page")
- Testable (we can measure it)
- Grounded in reasoning (why do you think this will work?)
Step 2: Choose Your Metric
Which metric tells you if the test worked?
Primary metric: The metric you're optimizing for
- Example: Click-through rate on the "Subscribe" button
Secondary metrics: Other metrics you care about
- Example: Did sign-ups increase? Did those users actually stick around?
Be careful: Optimizing for one metric might hurt another:
- Red button → more clicks (+15%)
- But red button → lower satisfaction (-5%)
- Did the test work? Depends on what you care about
Step 3: Calculate Sample Size
How many users do you need to test on?
Rule of thumb:
- If your baseline conversion rate is 5% and you want to detect a 10% relative improvement (5% → 5.5%)
- You need roughly 31,000 users per variation, about 62,000 users total (at the conventional 80% power and 5% significance level)
Use a sample size calculator (Google: "A/B test sample size calculator")
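The rule of thumb above comes from a standard power calculation. A sketch using only the standard library (the 80% power and 5% significance defaults are the conventional assumptions):

```python
import math
from statistics import NormalDist

def sample_size_per_variation(base_rate, relative_lift, alpha=0.05, power=0.80):
    """Users needed per variation to detect a relative lift in a conversion
    rate, via the standard two-proportion normal approximation."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Detect a 10% relative lift on a 5% baseline (5% -> 5.5%):
print(sample_size_per_variation(0.05, 0.10))  # roughly 31,000 per variation
```

Note how sensitive this is: smaller baseline rates or smaller lifts blow the required sample up quickly, which is why tiny tweaks on low-traffic pages rarely reach significance.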
Step 4: Run the Test
Randomly split your users:
- 50% see the control (green button)
- 50% see the treatment (red button)
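In practice the split should also be deterministic per user, so a returning visitor always sees the same variant. One common approach is hashing the user id; a sketch (the experiment name and salt scheme are illustrative):

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "button-color") -> str:
    """Deterministically assign a user to control or treatment (50/50 split).
    Hashing (experiment name + user id) keeps assignment stable across visits
    and independent across experiments."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "control" if bucket < 50 else "treatment"

print(assign_variant("user-123"))  # stable: same user, same variant, every call
```

Keying the hash on the experiment name means a user's bucket in one experiment doesn't leak into the next one.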
Key rule: Don't stop the test early if B is winning. You'll get biased results. Run until you hit your sample size target.
Step 5: Analyze Results
After reaching your sample size:
Conversion rates:
- Control: 5.0% (600 signups out of 12,000)
- Treatment: 5.5% (660 signups out of 12,000)
- Improvement: +10% relative
Statistical significance:
- p-value: 0.08
- Significant? No (p > 0.05)
- Conclusion: The 10% improvement might be random. We can't trust it.
Decision: Don't implement. The improvement wasn't statistically significant.
If p-value was 0.03:
- Significant? Yes (p < 0.05)
- Conclusion: A difference this large would occur less than 5% of the time if the change truly had no effect. The improvement looks real.
- Decision: Implement the red button.
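The significance check itself can be reproduced with a pooled two-proportion z-test, a common choice for comparing conversion rates. A sketch (counts are illustrative, matching 5.0% vs. 5.5%):

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates
    (pooled two-proportion z-test, normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Illustrative counts: control 5.0%, treatment 5.5%
p = two_proportion_p_value(600, 12_000, 660, 12_000)
print(f"p-value: {p:.2f}")  # about 0.08: not significant at the 0.05 level
```

The same 10% relative lift becomes significant only with more users per arm, which ties back to the sample-size calculation in Step 3.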
Common A/B Testing Mistakes
❌ Mistake 1: Not Using Control Groups
You change the button to red and conversions go up 15%.
But what if conversions go up 15% for everyone (new marketing campaign, seasonal effect)?
Without a control group, you can't tell.
Always split: 50% control, 50% treatment.
❌ Mistake 2: Stopping Early If Winning
You're testing red vs. green. After 1,000 users, red is winning 6.2% vs. 5.1%.
"Let's stop and implement red!"
But if you wait for your full sample size, it might even out to 5.3% vs. 5.2% (much smaller difference, not significant).
Rule: Run until you hit your target sample size. Don't peek.
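How much does peeking actually hurt? That can be simulated: run A/A tests where neither arm is truly better, and compare "analyze once at the end" against "check after every batch and stop at the first significant result". A sketch (batch sizes and the 5% baseline are illustrative):

```python
import math
import random

random.seed(1)

def is_significant(conv_a, conv_b, n):
    """Pooled two-proportion z-test at the 5% level (equal arm sizes n)."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return se > 0 and abs(conv_b - conv_a) / n / se > 1.96

def aa_experiment(peek, batches=10, batch_size=200, rate=0.05):
    """One A/A test (no real difference). Returns True if it is
    (falsely) declared significant."""
    a = b = n = 0
    for _ in range(batches):
        a += sum(random.random() < rate for _ in range(batch_size))
        b += sum(random.random() < rate for _ in range(batch_size))
        n += batch_size
        if peek and is_significant(a, b, n):
            return True  # stopped early on a "winner"
    return is_significant(a, b, n)

trials = 1000
peeking = sum(aa_experiment(peek=True) for _ in range(trials)) / trials
patient = sum(aa_experiment(peek=False) for _ in range(trials)) / trials
print(f"false positive rate with peeking: {peeking:.1%}, without: {patient:.1%}")
```

Checking once at the end keeps false positives near the promised 5%; stopping at the first significant peek inflates them severalfold, even though nothing differs between the arms.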
❌ Mistake 3: Testing Too Many Things
You test: button color, button text, button size, background color, CTA copy.
With 5 simultaneous tests at a 5% false-positive rate each, the chance that at least one shows a spurious "success" is about 23% (1 - 0.95^5), even if none of the changes does anything.
Rule: Test one thing at a time.
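The arithmetic behind this mistake is worth seeing directly, along with one standard fix (the Bonferroni correction, shown here as one option among several):

```python
def familywise_false_positive_rate(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests shows a
    'significant' result purely by chance."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20):
    rate = familywise_false_positive_rate(k)
    print(f"{k:>2} tests: {rate:.0%} chance of at least one false positive")

# One simple fix (Bonferroni): demand p < alpha / num_tests from each test.
print(f"Bonferroni threshold for 5 simultaneous tests: {0.05 / 5}")
```

By 20 simultaneous tests the family-wise false-positive rate is well over half, which is why "we ran 20 experiments and one worked" is not evidence on its own.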
❌ Mistake 4: Confusing Correlation with Causation
You run a test over the holidays. Red button wins. You assume red is better.
But maybe people are more likely to buy during the holidays regardless of button color.
Rule: Control for confounding variables. Don't test during unusual periods.
❌ Mistake 5: Ignoring Secondary Metrics
Your red button increases clicks (primary metric) but users who click it have a 5% lower signup rate (secondary metric).
The test might "win" on clicks but fail on the outcome that matters (signups).
Rule: Always monitor secondary metrics. If primary wins but secondary loses, be suspicious.
Types of A/B Tests
Conversion Rate Tests
Optimize: Click-through rate, sign-up rate, purchase rate
Example: "Does a shorter form increase sign-ups?"
Retention Tests
Optimize: Churn rate, return rate
Example: "Does highlighting the benefits increase retention?"
Monetization Tests
Optimize: Revenue, average contract value
Example: "Does higher pricing increase revenue without hurting conversion?"
Time-to-Event Tests
Optimize: How fast people complete actions
Example: "Does simplified onboarding reduce time-to-first-purchase?"
The Bottom Line
A/B testing is how you prove that changes actually help your business.
To run valid tests:
- Clear hypothesis
- Correct sample size
- Random split (control + treatment)
- Run to completion
- Check statistical significance
- Implement if significant
Small wins compound. If every test improves your metric by 5%, after 10 tests you're about 63% better (1.05^10 ≈ 1.63).
Start testing.