What is statistical significance in an A/B test?
The A/B test significance calculator above answers the question that decides every experiment: is the difference between your two variants real, or could it just be random chance? When variant B converts better than variant A, you need to know whether that gap reflects a genuine improvement or simply the noise that appears whenever you measure two samples. Statistical significance is the formal way of separating signal from noise.
The standard convention is 95% confidence, which is equivalent to a p-value below 0.05. That means that if the two variants were truly identical, a difference at least this large would appear by chance less than 5% of the time. Reaching this threshold is what lets you ship the winner with confidence instead of fooling yourself with a result that won’t hold up.
How the calculation works
This calculator runs a two-sided two-proportion z-test, the standard test for comparing two conversion rates:
1. Conversion rate A = Conversions_A / Visitors_A
2. Conversion rate B = Conversions_B / Visitors_B
3. Pooled rate p = (Conversions_A + Conversions_B) / (Visitors_A + Visitors_B)
4. Standard error = √( p × (1 − p) × (1/Visitors_A + 1/Visitors_B) )
5. z-score = (Rate_B − Rate_A) / Standard error
6. p-value = 2 × (1 − Φ(|z|)) ← Φ is the standard normal CDF
7. Confidence = (1 − p-value) × 100
- Visitors — the number of people in each variant.
- Conversions — how many of them converted in each variant.
The pooled standard error estimates how much random variation to expect if the variants were identical. The z-score expresses the observed difference in units of that expected variation, and the p-value translates the z-score into a probability. Everything runs instantly in your browser — no data leaves your device.
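The seven steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the calculator's actual source: the names `normal_cdf` and `ab_significance` are made up for this example, and the zero-standard-error guard is an assumption about how to handle a degenerate input.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, Φ(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ab_significance(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-sided two-proportion z-test using the pooled standard error."""
    rate_a = conversions_a / visitors_a                      # step 1
    rate_b = conversions_b / visitors_b                      # step 2
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)  # step 3
    std_err = math.sqrt(
        pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b)
    )                                                        # step 4
    if std_err == 0:
        # Assumption: with no conversions at all (or everyone converting)
        # the test is undefined, so report "no evidence" instead of
        # dividing by zero.
        z, p_value = 0.0, 1.0
    else:
        z = (rate_b - rate_a) / std_err                      # step 5
        p_value = 2 * (1 - normal_cdf(abs(z)))               # step 6
    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "z": z,
        "p_value": p_value,
        "confidence": (1 - p_value) * 100,                   # step 7
    }


if __name__ == "__main__":
    result = ab_significance(10_000, 500, 10_000, 575)
    print(f"z = {result['z']:.2f}, p-value = {result['p_value']:.3f}")
```

The function returns every intermediate value so each step can be checked by hand against the formulas.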
Worked example
Variant A: 10,000 visitors, 500 conversions (5.00%). Variant B: 10,000 visitors, 575 conversions (5.75%).
Rate A = 5.00%, Rate B = 5.75%
Pooled p = 1,075 / 20,000 = 5.375%
Standard error = √(0.05375 × 0.94625 × (1/10,000 + 1/10,000)) ≈ 0.00319
z = (0.0575 − 0.0500) / 0.00319 ≈ 2.35
p-value ≈ 0.019 → Confidence ≈ 98.1%
The relative uplift is 15% (5.75% vs 5.00%), and at 98.1% confidence the result clears the 95% threshold — this is a statistically significant win. You can ship variant B.
Now imagine the same rates but with only 1,000 visitors each instead of 10,000. The standard error grows roughly threefold (by a factor of √10 ≈ 3.2), the z-score falls to about 0.74, and confidence drops to roughly 54%, far below the 95% bar. The same observed difference is no longer significant, because the sample is too small to rule out chance. Sample size matters as much as the size of the gap.
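To see how sample size alone moves the result, the sketch below holds the two rates at 5.00% and 5.75% and varies only the number of visitors per variant. The helper `z_and_confidence` is a hypothetical name for this example, and it assumes both variants receive the same traffic. Passing rates rather than counts avoids the fractional conversions that 5.75% of 1,000 would imply.

```python
import math


def z_and_confidence(rate_a, rate_b, visitors_per_variant):
    """Pooled two-proportion z-test for two equally sized variants."""
    n = visitors_per_variant
    # With equal sample sizes the pooled rate is the mean of the two rates.
    pooled = (rate_a + rate_b) / 2
    std_err = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal CDF (via the error function).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, (1 - p_value) * 100


for n in (1_000, 10_000, 100_000):
    z, confidence = z_and_confidence(0.05, 0.0575, n)
    print(f"{n:>7} visitors per variant: z = {z:.2f}, confidence = {confidence:.1f}%")
```

Because the standard error scales with 1/√n, a tenfold change in traffic changes the z-score by a factor of about 3.2 while the observed uplift stays the same.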
Benchmarks and rules of thumb
- 95% confidence (p < 0.05) is the standard bar for declaring a winner.
- 99% confidence is appropriate for high-stakes or hard-to-reverse changes.
- Below 90% the result should be treated as inconclusive, not as a loss.
A non-significant result does not mean “no difference”; it means you don’t yet have enough evidence to conclude there is one. The fix is usually a larger sample, not a different conclusion.
How to interpret and avoid common mistakes
The most common A/B testing error is peeking and stopping early. If you check the test repeatedly and stop the moment it crosses 95%, you dramatically inflate your false-positive rate — random fluctuations will cross the line temporarily even when there’s no real effect. Decide your sample size in advance and let the test run to completion.
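The cost of peeking can be illustrated with a small simulation of A/A tests, where both variants share the same true conversion rate and every "win" is a false positive. This is a rough sketch under assumed settings: the number of tests, the ten interim looks, the 200 visitors per look, and the function names are all chosen for illustration.

```python
import math
import random


def is_significant(n_a, conv_a, n_b, conv_b, alpha=0.05):
    """Two-sided pooled z-test; True when the p-value is below alpha."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return False  # no variation observed yet, so nothing to conclude
    z = (conv_b / n_b - conv_a / n_a) / std_err
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_value < alpha


def simulate_aa_tests(n_tests=400, looks=10, visitors_per_look=200,
                      rate=0.05, seed=1):
    """Return (false-positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _test in range(n_tests):
        n = conv_a = conv_b = 0
        crossed_at_any_look = False
        for _look in range(looks):
            # Both variants draw from the same true rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            if is_significant(n, conv_a, n, conv_b):
                crossed_at_any_look = True  # a peeker would stop and ship here
        peeking_hits += crossed_at_any_look
        # Disciplined approach: test once, on the full planned sample.
        fixed_hits += is_significant(n, conv_a, n, conv_b)
    return peeking_hits / n_tests, fixed_hits / n_tests


if __name__ == "__main__":
    peeking, fixed = simulate_aa_tests()
    print(f"False positives when stopping at the first crossing: {peeking:.1%}")
    print(f"False positives with a single final analysis:        {fixed:.1%}")
```

The peeking rate can never be lower than the single-look rate, since the final look is one of the checks a peeker makes. In runs like this it is typically several times higher.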
The second mistake is ignoring practical significance. A result can be statistically significant but too small to matter. A 0.1% lift that’s “real” may not be worth the engineering cost to ship. Always read the uplift alongside the confidence: significance tells you the effect is unlikely to be noise, the uplift tells you whether it’s worth caring about.
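One way to keep both questions in view is to compute the uplift next to the confidence and require each to clear its own bar. The sketch below is illustrative only: `uplift` and `decide` are hypothetical helpers, and the 2% minimum relative uplift is an arbitrary placeholder that each team would set from its own costs.

```python
def uplift(visitors_a, conversions_a, visitors_b, conversions_b):
    """Return (absolute uplift, relative uplift) of variant B over variant A."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    absolute = rate_b - rate_a
    # Relative uplift is undefined when the baseline rate is zero.
    relative = absolute / rate_a if rate_a > 0 else float("nan")
    return absolute, relative


def decide(confidence, relative_uplift,
           min_confidence=95.0, min_relative_uplift=0.02):
    """Combine statistical and practical significance into one verdict.

    min_relative_uplift is a hypothetical business threshold, not a
    statistical constant.
    """
    if confidence < min_confidence:
        return "inconclusive: keep collecting data"
    if relative_uplift < min_relative_uplift:
        return "significant but too small to justify shipping"
    return "significant and large enough to ship"


if __name__ == "__main__":
    absolute, relative = uplift(10_000, 500, 10_000, 575)
    print(f"absolute uplift = {absolute:.4f}, relative uplift = {relative:.1%}")
    print(decide(confidence=98.1, relative_uplift=relative))
```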
Finally, remember this calculator assumes a simple two-variant test with a single conversion goal and independent visitors. Multi-variant tests, repeated exposures, or segmented analyses need adjustments (such as corrections for multiple comparisons) that a basic z-test doesn’t apply.
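As an example of such an adjustment, the Bonferroni correction divides the significance threshold by the number of comparisons. It is one common and fairly conservative option, shown here as a sketch. The calculator does not apply it, and the function name is made up for this example.

```python
def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which comparisons stay significant under a Bonferroni correction.

    Each p-value is compared against alpha divided by the number of
    comparisons, which keeps the overall false-positive rate at or below alpha.
    """
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    # Hypothetical p-values for three challenger variants tested against one control.
    p_values = [0.019, 0.004, 0.2]
    print(significant_after_bonferroni(p_values))
```

With three comparisons the per-test threshold drops from 0.05 to about 0.0167, so a p-value of 0.019 that would pass on its own no longer counts as significant.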
Frequently asked questions
What does statistical significance mean in an A/B test? It means the observed difference between two variants is larger than random noise alone would plausibly produce. A common threshold is 95% confidence (p-value below 0.05), meaning a difference at least this large would appear less than 5% of the time if the variants were truly identical.
How is the confidence calculated here? The calculator runs a two-sided two-proportion z-test. It pools the conversion rates to estimate the standard error, computes a z-score for the observed difference, and converts that to a p-value using a normal distribution approximation. Confidence equals (1 − p-value) × 100.