
The Analyst's Guide to A/B Testing: From Hypothesis to Decision

Published on: October 22, 2025


A/B testing is the gold standard for causal inference in product and marketing analytics. Yet most teams run it wrong — peeking at results early, under-sizing samples, or calling statistical significance a business win without checking effect size. This guide covers how to do it right, from framing the hypothesis to making the final call.


Step 1: Write a Falsifiable Hypothesis

A good hypothesis has three components: a change, a metric, and a direction. For example:

"Showing a progress bar during checkout will increase the checkout completion rate by at least 3% relative to the current baseline."

Vague hypotheses like "the new design will perform better" don't tell you what to measure, what threshold defines success, or when to stop. Be precise upfront — it prevents HARKing (Hypothesising After Results are Known) later.
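
One way to enforce this discipline is to write the hypothesis down as a small pre-registration record before launch. This is only a sketch — the field names and the `is_falsifiable` helper are illustrative, not part of any particular framework:

```python
# Hypothetical pre-registration record; field names are illustrative.
test_plan = {
    "change": "Show a progress bar during checkout",
    "metric": "checkout_completion_rate",
    "direction": "increase",
    "mde_relative": 0.03,  # success threshold: >= 3% relative lift
    "alpha": 0.05,
    "power": 0.80,
}

def is_falsifiable(plan: dict) -> bool:
    """A hypothesis is testable only if it names a change, a metric,
    and a direction of effect."""
    return all(plan.get(key) for key in ("change", "metric", "direction"))

print(is_falsifiable(test_plan))  # True
```

Committing a record like this to version control before launch makes HARKing much harder to rationalise later.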


Step 2: Calculate Your Required Sample Size

The two biggest mistakes in A/B testing are running tests too short and stopping them the moment results look good. Both are prevented by fixing the sample size before the test begins. You need four inputs: the baseline conversion rate, the minimum detectable effect (MDE), the significance level (alpha), and the desired statistical power:

from statsmodels.stats.power import zt_ind_solve_power
import numpy as np

baseline    = 0.12   # 12% conversion rate
mde         = 0.015  # Detect a 1.5pp lift or more
alpha       = 0.05
power       = 0.80

# Standardised effect size: the absolute lift divided by the
# baseline standard deviation (a normal approximation)
effect_size = mde / np.sqrt(baseline * (1 - baseline))

n_per_variant = zt_ind_solve_power(
    effect_size=effect_size,
    alpha=alpha,
    power=power,
    alternative='two-sided'
)

print(f"Required sample per variant: {int(np.ceil(n_per_variant))}")
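
Sample size alone doesn't tell you how long the test must run; dividing by eligible traffic does. A rough sketch, where `n_per_variant` is approximately what the calculation above yields for these inputs and `daily_visitors` is a hypothetical traffic figure:

```python
import math

n_per_variant = 7368    # roughly the output of the power calculation above
daily_visitors = 4000   # hypothetical eligible traffic per day
n_variants = 2

total_needed = n_per_variant * n_variants
days = math.ceil(total_needed / daily_visitors)

# Round up to whole weeks so each weekday is equally represented,
# avoiding day-of-week bias in the sample.
weeks = math.ceil(days / 7)
print(f"Run for at least {weeks * 7} days ({weeks} full week(s))")
```

If the resulting duration is longer than the business can tolerate, the honest options are a larger MDE or more traffic — not a shorter test.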

Step 3: Run the Test Correctly

A few non-negotiables once the test is live:

- Don't peek. The sample size was fixed upfront; evaluate only once it is reached, because repeated interim checks inflate the false-positive rate.
- Don't change the variants mid-test. Any modification invalidates the data collected before it.
- Randomise at the user level, not the session level, so the same person always sees the same variant.
- Run for full weekly cycles so that day-of-week effects hit both variants equally.
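
One check worth automating while the test runs is a sample ratio mismatch (SRM) test, which flags broken randomisation by comparing observed assignment counts against the intended split. A sketch using a chi-square goodness-of-fit test, with hypothetical counts for a test designed as a 50/50 split:

```python
from scipy.stats import chisquare

# Hypothetical assignment counts for an intended 50/50 split.
observed = [10240, 9760]
total = sum(observed)
expected = [total / 2, total / 2]

stat, p = chisquare(observed, f_exp=expected)

# A tiny p-value here (a common threshold is p < 0.001) means the
# split itself is off: the randomisation is likely broken and the
# test's results should not be trusted until the cause is found.
if p < 0.001:
    print(f"SRM detected (p = {p:.2e}); investigate before trusting results")
else:
    print(f"No SRM detected (p = {p:.3f})")
```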


Step 4: Interpret Results Correctly

A p-value below 0.05 tells you the result is unlikely under the null hypothesis. It does not tell you the effect is practically meaningful. Always report the absolute and relative effect size alongside the p-value, not the p-value alone:

from statsmodels.stats.proportion import proportions_ztest

# Variant results
ctrl_conv,  ctrl_n  = 1180, 10000   # 11.8% conversion
test_conv,  test_n  = 1340, 10000   # 13.4% conversion

p_ctrl = ctrl_conv / ctrl_n
p_test = test_conv / test_n

# Two-proportion z-test (proportions_ztest lives in statsmodels, not scipy)
count  = [test_conv, ctrl_conv]
nobs   = [test_n, ctrl_n]
z, p   = proportions_ztest(count, nobs)

lift   = (p_test - p_ctrl) / p_ctrl * 100
print(f"Relative lift: {lift:.1f}%")
print(f"p-value: {p:.4f}")
print(f"Significant: {p < 0.05}")
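
A confidence interval on the absolute difference communicates practical significance far better than the p-value alone. A sketch using the normal-approximation (Wald) interval for the same numbers:

```python
import numpy as np
from scipy import stats

ctrl_conv, ctrl_n = 1180, 10000
test_conv, test_n = 1340, 10000

p_ctrl = ctrl_conv / ctrl_n
p_test = test_conv / test_n
diff = p_test - p_ctrl

# Wald 95% CI for the difference between two proportions
se = np.sqrt(p_ctrl * (1 - p_ctrl) / ctrl_n
             + p_test * (1 - p_test) / test_n)
z_crit = stats.norm.ppf(0.975)
lo, hi = diff - z_crit * se, diff + z_crit * se

print(f"Absolute lift: {diff:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```

If the entire interval sits above the minimum effect the business cares about, the result is both statistically and practically significant; an interval that straddles that threshold calls for a more cautious read.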

Common Mistakes to Avoid

- Peeking at results and stopping the moment they look significant.
- Under-sizing the sample, then declaring "no effect" from a test that never had the power to find one.
- HARKing: rewriting the hypothesis after seeing which metric happened to move.
- Treating statistical significance as business significance without checking the effect size.
- Running many variants or tracking many metrics without correcting for multiple comparisons.

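The cost of peeking in particular is easy to demonstrate. The simulation below runs A/A tests — both arms identical, so every "significant" result is a false positive — and checks for significance at ten interim looks, stopping at the first p < 0.05. The numbers and the use of a t-test on binary outcomes are simplifications for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def aa_test_with_peeking(n_total=10000, peeks=10, p=0.12):
    """Run one A/A test, peeking at `peeks` evenly spaced interim
    points and stopping at the first nominally significant result."""
    a = rng.binomial(1, p, n_total)
    b = rng.binomial(1, p, n_total)
    for n in np.linspace(n_total / peeks, n_total, peeks).astype(int):
        _, pval = stats.ttest_ind(a[:n], b[:n])
        if pval < 0.05:
            return True  # false positive: there is no real effect
    return False

n_sims = 500
false_positives = sum(aa_test_with_peeking() for _ in range(n_sims))
print(f"False-positive rate with peeking: {false_positives / n_sims:.1%}")
```

Instead of the nominal 5%, repeated looks typically push the false-positive rate well into double digits — which is exactly why the sample size from Step 2 must be fixed in advance.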
Conclusion

A/B testing done well is one of the most powerful tools in an analyst's toolkit. The discipline it imposes — pre-registered hypotheses, power calculations, clean randomisation — forces rigorous thinking that makes every product decision more defensible. The teams that consistently do it right build a compounding informational advantage over those that just chase p-values.