A/B testing is how data-driven teams improve their products.
But running valid A/B tests requires understanding the metrics and statistics behind them.
Many A/B tests produce misleading results not because the idea was bad, but because of avoidable mistakes in experiment design.
Core Concepts
What is A/B Testing?
You have two versions of something (a page, a flow, a pricing structure). You show version A to half your users, version B to the other half, and measure the difference.
Example:
- Control (A): Green "Subscribe" button
- Treatment (B): Red "Subscribe" button
- Metric: Click-through rate
Statistical Significance
Just because B gets 15% more clicks than A doesn't mean B is better. Maybe it's random variation.
Statistical significance tells you how unlikely the observed difference would be if there were actually no difference between A and B.
p-value: The probability of seeing a difference at least this large if A and B truly performed the same
- p-value < 0.05 = statistically significant (unlikely to be chance alone)
- p-value > 0.05 = not significant (could plausibly be random variation)
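The danger of reading too much into a raw lift is easy to demonstrate: simulate "A/A tests" where both arms are identical, and count how often one side still appears 15% better. A minimal sketch (the 5% baseline rate and arm size are assumptions for illustration):

```python
import random

random.seed(0)

def simulated_lift(true_rate=0.05, n_per_arm=1000):
    """One A/A 'experiment': both arms share the SAME true conversion rate.
    Returns treatment's measured lift over control."""
    a = sum(random.random() < true_rate for _ in range(n_per_arm))
    b = sum(random.random() < true_rate for _ in range(n_per_arm))
    return (b - a) / a if a else 0.0

lifts = [simulated_lift() for _ in range(1000)]
big_wins = sum(lift >= 0.15 for lift in lifts)
print(f"{big_wins} of 1000 A/A tests showed B 'winning' by 15%+ purely by chance")
```

Even with identical variants, a substantial fraction of runs show a "15% lift", which is exactly why a raw lift alone proves nothing.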
Sample Size
The more users you test on, the more confident you can be.
Example:
- 100 users: A gets 10 clicks, B gets 12 clicks. "B is 20% better!" (But sample size is too small to trust)
- 10,000 users: A gets 1,000 clicks, B gets 1,200 clicks. "B is 20% better!" (Large sample, we can trust this)
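The sample-size intuition above can be checked by simulation: measure the same true 10% rate on small and large samples and see how much the observed rate bounces around (the 10% rate is an assumption for illustration):

```python
import random

random.seed(42)

def observed_rate(true_rate, n_users):
    """Simulate n_users with a fixed true click probability; return observed rate."""
    clicks = sum(random.random() < true_rate for _ in range(n_users))
    return clicks / n_users

for n in (100, 10_000):
    rates = [observed_rate(0.10, n) for _ in range(500)]
    print(f"n={n:>6}: observed rate ranged {min(rates):.3f} to {max(rates):.3f}")
```

With 100 users the measured rate swings wildly around 10%; with 10,000 it barely moves, which is why large samples earn more trust.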
Conversion Rate
The % of users who take a desired action.
Examples:
- Click-through rate: % of users who click a button
- Sign-up rate: % of users who complete signup
- Purchase rate: % of users who make a purchase
- Retention rate: % of users who return after N days
How to Set Up a Valid A/B Test
Step 1: Define Your Hypothesis
Start with a clear prediction:
"Users will be more likely to sign up if we use a red button instead of green because red is associated with urgency."
Good hypotheses:
- Specific (red button, not "improve the page")
- Testable (we can measure it)
- Grounded in reasoning (why do you think this will work?)
Step 2: Choose Your Metric
Which metric tells you if the test worked?
Primary metric: The metric you're optimizing for
- Example: Click-through rate on the "Subscribe" button
Secondary metrics: Other metrics you care about
- Example: Did sign-ups increase? Did those users actually stick around?
Be careful: Optimizing for one metric might hurt another:
- Red button → more clicks (+15%)
- But red button → lower satisfaction (-5%)
- Did the test work? Depends on what you care about
Step 3: Calculate Sample Size
How many users do you need to test on?
Rule of thumb:
- If your baseline conversion rate is 5% and you want to detect a 10% relative improvement (5% → 5.5%)
- You need roughly 31,000 users per variation, about 62,000 users total (at the conventional 80% power and 5% significance level)
Use a sample size calculator (Google: "A/B test sample size calculator")
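The rule of thumb above comes from a standard power calculation. A sketch using only the standard library (the 80% power and 5% significance defaults are the conventional assumptions):

```python
import math
from statistics import NormalDist

def sample_size_per_variation(base_rate, relative_lift, alpha=0.05, power=0.80):
    """Users needed per variation to detect a relative lift in a conversion
    rate, via the standard two-proportion normal approximation."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Detect a 10% relative lift on a 5% baseline (5% -> 5.5%):
print(sample_size_per_variation(0.05, 0.10))  # roughly 31,000 per variation
```

Note how sensitive this is: smaller baseline rates or smaller lifts blow the required sample up quickly, which is why tiny tweaks on low-traffic pages rarely reach significance.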
Step 4: Run the Test
Randomly split your users:
- 50% see the control (green button)
- 50% see the treatment (red button)
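In practice the split should also be deterministic per user, so a returning visitor always sees the same variant. One common approach is hashing the user id; a sketch (the experiment name and salt scheme are illustrative):

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "button-color") -> str:
    """Deterministically assign a user to control or treatment (50/50 split).
    Hashing (experiment name + user id) keeps assignment stable across visits
    and independent across experiments."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "control" if bucket < 50 else "treatment"

print(assign_variant("user-123"))  # stable: same user, same variant, every call
```

Keying the hash on the experiment name means a user's bucket in one experiment doesn't leak into the next one.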
Key rule: Don't stop the test early if B is winning. You'll get biased results. Run until you hit your sample size target.
Step 5: Analyze Results
After reaching your sample size:
Conversion rates:
- Control: 5.0% (600 signups out of 12,000)
- Treatment: 5.5% (660 signups out of 12,000)
- Improvement: +10% relative
Statistical significance:
- p-value: 0.08
- Significant? No (p > 0.05)
- Conclusion: The 10% improvement might be random. We can't trust it.
Decision: Don't implement. The improvement wasn't statistically significant.
If p-value was 0.03:
- Significant? Yes (p < 0.05)
- Conclusion: A difference this large would occur less than 5% of the time if the change truly had no effect. The improvement looks real.
- Decision: Implement the red button.
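The significance check itself can be reproduced with a pooled two-proportion z-test, a common choice for comparing conversion rates. A sketch (counts are illustrative, matching 5.0% vs. 5.5%):

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates
    (pooled two-proportion z-test, normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Illustrative counts: control 5.0%, treatment 5.5%
p = two_proportion_p_value(600, 12_000, 660, 12_000)
print(f"p-value: {p:.2f}")  # about 0.08: not significant at the 0.05 level
```

The same 10% relative lift becomes significant only with more users per arm, which ties back to the sample-size calculation in Step 3.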
Common A/B Testing Mistakes
❌ Mistake 1: Not Using Control Groups
You change the button to red and conversions go up 15%.
But what if conversions go up 15% for everyone (new marketing campaign, seasonal effect)?
Without a control group, you can't tell.
Always split: 50% control, 50% treatment.
❌ Mistake 2: Stopping Early If Winning
You're testing red vs. green. After 1,000 users, red is winning 6.2% vs. 5.1%.
"Let's stop and implement red!"
But if you wait for your full sample size, it might even out to 5.3% vs. 5.2% (much smaller difference, not significant).
Rule: Run until you hit your target sample size. Don't peek.
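How much does peeking actually hurt? That can be simulated: run A/A tests where neither arm is truly better, and compare "analyze once at the end" against "check after every batch and stop at the first significant result". A sketch (batch sizes and the 5% baseline are illustrative):

```python
import math
import random

random.seed(1)

def is_significant(conv_a, conv_b, n):
    """Pooled two-proportion z-test at the 5% level (equal arm sizes n)."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    return se > 0 and abs(conv_b - conv_a) / n / se > 1.96

def aa_experiment(peek, batches=10, batch_size=200, rate=0.05):
    """One A/A test (no real difference). Returns True if it is
    (falsely) declared significant."""
    a = b = n = 0
    for _ in range(batches):
        a += sum(random.random() < rate for _ in range(batch_size))
        b += sum(random.random() < rate for _ in range(batch_size))
        n += batch_size
        if peek and is_significant(a, b, n):
            return True  # stopped early on a "winner"
    return is_significant(a, b, n)

trials = 1000
peeking = sum(aa_experiment(peek=True) for _ in range(trials)) / trials
patient = sum(aa_experiment(peek=False) for _ in range(trials)) / trials
print(f"false positive rate with peeking: {peeking:.1%}, without: {patient:.1%}")
```

Checking once at the end keeps false positives near the promised 5%; stopping at the first significant peek inflates them severalfold, even though nothing differs between the arms.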
❌ Mistake 3: Testing Too Many Things
You test: button color, button text, button size, background color, CTA copy.
With 5 simultaneous tests at a 5% false-positive rate each, the chance that at least one shows a spurious "success" is about 23% (1 - 0.95^5), even if none of the changes does anything.
Rule: Test one thing at a time.
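The arithmetic behind this mistake is worth seeing directly, along with one standard fix (the Bonferroni correction, shown here as one option among several):

```python
def familywise_false_positive_rate(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests shows a
    'significant' result purely by chance."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20):
    rate = familywise_false_positive_rate(k)
    print(f"{k:>2} tests: {rate:.0%} chance of at least one false positive")

# One simple fix (Bonferroni): demand p < alpha / num_tests from each test.
print(f"Bonferroni threshold for 5 simultaneous tests: {0.05 / 5}")
```

By 20 simultaneous tests the family-wise false-positive rate is well over half, which is why "we ran 20 experiments and one worked" is not evidence on its own.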
❌ Mistake 4: Confusing Correlation with Causation
You run a test over the holidays. Red button wins. You assume red is better.
But maybe people are more likely to buy during the holidays regardless of button color.
Rule: Control for confounding variables. Don't test during unusual periods.
❌ Mistake 5: Ignoring Secondary Metrics
Your red button increases clicks (primary metric) but users who click it have a 5% lower signup rate (secondary metric).
The test might "win" on clicks but fail on the outcome that matters (signups).
Rule: Always monitor secondary metrics. If primary wins but secondary loses, be suspicious.
Types of A/B Tests
Conversion Rate Tests
Optimize: Click-through rate, sign-up rate, purchase rate
Example: "Does a shorter form increase sign-ups?"
Retention Tests
Optimize: Churn rate, return rate
Example: "Does highlighting the benefits increase retention?"
Monetization Tests
Optimize: Revenue, average contract value
Example: "Does higher pricing increase revenue without hurting conversion?"
Time-to-Event Tests
Optimize: How fast people complete actions
Example: "Does simplified onboarding reduce time-to-first-purchase?"
The Bottom Line
A/B testing is how you prove that changes actually help your business.
To run valid tests:
- Clear hypothesis
- Correct sample size
- Random split (control + treatment)
- Run to completion
- Check statistical significance
- Implement if significant
Small wins compound. If every test improves your metric by 5%, after 10 tests you're about 63% better (1.05^10 ≈ 1.63).
Start testing.