Did my A/B test actually work?

Step 1

Your test data

Drop in the numbers from your test. We'll do the rest.

A · Your original
Conv. rate: 2.00%
B · Your variation
Conv. rate: 3.50%
Verdict

Real winner

Version B's conversion rate (3.50%) was 75.0% higher than Version A's (2.00%). At 95% sure, the win is real, not just luck.

98.0% chance Version B is better
95% target
What to do next

Go with Version B. At 95% sure, the win is real.

See what you'd earn
Launch your next A/B test for free
The money

What's this worth?

Tell us your traffic and what each conversion is worth. We'll show the money.

Your numbers
How far ahead?
If you go with Version B
+€138,253
over 6 months · 98.0% chance
Extra revenue over time
[Chart: extra revenue accumulating month by month, from now to 6 months, reaching +€138,253]
That's about 1,383 extra conversions on your next 90,000 visitors at €100 each.
If Version B turns out not to be better (2.0% chance): −€24,146

Projection over the next 6 months at 15,000 visitors/month and €100 per conversion. It's a best guess for going forward, not a summary of the test you ran.

Under the hood

How sure are we?

Test numbers come with wiggle room. Each curve below shows where each version's real conversion rate could land. If the curves barely overlap, you've got a clear winner. If they sit on top of each other, it's a coin flip.

[Chart: overlapping probability curves for Version A (2.00%) and Version B (3.50%), x-axis from 0.00% to 6.12%]

Each curve covers where that version's real conversion rate could be. The shared middle is the "could be a tie" zone. The less the curves overlap, the surer we are one is really winning.
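One way to trace the spread of such a curve is to sample from a Beta posterior. This is a sketch, not necessarily the calculator's internals: it assumes a uniform prior, and uses the 1,000-visitors, 20-conversions example from the "How we calculate this" section, which lands near the 1.2%–2.9% range quoted there.

```python
import random

def credible_interval(visitors, conversions, level=0.95, draws=100_000, seed=42):
    """Approximate interval for the true conversion rate using a
    Beta(conversions + 1, misses + 1) posterior (uniform prior assumed)."""
    rng = random.Random(seed)
    a = conversions + 1
    b = visitors - conversions + 1
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = samples[int(draws * (1 - level) / 2)]
    hi = samples[int(draws * (1 + level) / 2)]
    return lo, hi

lo, hi = credible_interval(1_000, 20)
print(f"95% of the curve's area sits between {lo:.1%} and {hi:.1%}")
```

Run it for both versions and you have the two curves in the chart: where they barely overlap, one version is clearly winning.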

Planning a new test instead?Plan it with the sample size calculator

Four numbers. Visitors and conversions for your original, visitors and conversions for your variation. The calculator above turns those into a clear verdict, the chance Version B is the real winner, and the money at stake.

Change anything and the URL updates. Copy the share link and your boss sees the same numbers you do.

How to use this calculator

  1. Type your visitor and conversion counts into the Your test data card.
  2. Read the verdict. Green means you have a real winner. Grey means hold off: you don’t have enough data yet. Red means stick with your original.
  3. Below the verdict, see the chance Version B is really better. If the bar passes the target line, you’re good.
  4. Open Optional: test settings if you want to change how sure you need to be, or tell us how many days the test ran.
  5. Scroll to The money and add your traffic plus what each conversion is worth. The calculator turns it into euros.
  6. Hit Copy share link to send the whole thing to your boss. The link saves everything.

How we calculate this

This tool uses Bayesian statistics. That’s the same math we use inside the Kirro product. The short version: we work out the chance Version B is genuinely better than Version A, given the data you have.

The chance Version B wins (the headline number)

Every conversion rate has wiggle room. With 1,000 visitors and 20 conversions, the real rate could be anywhere from about 1.2% to 2.9%. With 10,000 visitors, that range gets tighter. We turn both versions into bell curves, then ask: pick a random “real” rate from each, how often does B beat A?

That’s the percentage on the result card. The math:

P(B > A) = Φ((rateB − rateA) / √(seA² + seB²))

Φ is the standard normal cumulative distribution function. seA and seB are the standard errors of each version's rate. With enough data this approximation is solid.
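The formula fits in a few lines of plain Python. One caveat: the page shows rates but not visitor counts, so the 1,000-per-version counts below are an assumption, chosen because they reproduce the example's 98.0% figure.

```python
import math

def prob_b_beats_a(visitors_a, conv_a, visitors_b, conv_b):
    """P(B > A) via the normal approximation: Phi(diff / combined standard error)."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    se2_a = rate_a * (1 - rate_a) / visitors_a   # seA squared
    se2_b = rate_b * (1 - rate_b) / visitors_b   # seB squared
    z = (rate_b - rate_a) / math.sqrt(se2_a + se2_b)
    # Phi via the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Assumed counts: 1,000 visitors per version, 20 vs 35 conversions (2.00% vs 3.50%)
p = prob_b_beats_a(1_000, 20, 1_000, 35)
print(f"P(B > A) = {p:.1%}")
```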

When that percentage clears your target (95% by default), the verdict says Real winner. Between 85% and your target, you’ll see Almost there. Below that, Too close to call.
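Those cut-offs can be sketched as a small mapping function. The target and the 85% "almost there" floor come from the text above; the mirrored thresholds on the losing side are an assumption, not documented behaviour.

```python
def verdict(p_b_beats_a, target=0.95):
    """Map P(B > A) to a verdict. Losing-side thresholds are assumed
    to mirror the winning side; only the top two come from the page."""
    if p_b_beats_a >= target:
        return "Real winner"
    if p_b_beats_a >= 0.85:
        return "Almost there"
    if p_b_beats_a <= 1 - target:      # assumed mirror of "Real winner"
        return "Clear loss"
    if p_b_beats_a <= 0.15:            # assumed mirror of "Almost there"
        return "Heading the wrong way"
    return "Too close to call"
```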

Why “Clear loss” can show up

If Version B is the worse one and you’re sure about it, the verdict goes red and says Clear loss. That’s a real outcome too. You ruled out a bad design. Keep your original and try something more different next time.

The money section

When you add your monthly traffic and what each conversion is worth, the calculator turns the test into euros. It shows the most likely outcome as a big number, with the chance it actually plays out. Underneath, you see the other side of the bet.

The upside math: if Version B really keeps winning, the per-visitor lift times your monthly traffic times the value of a conversion, summed across your time frame. The downside math is the same shape, just for the case where B actually isn’t winning.
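One way to sketch both sides of the bet is to condition the lift on each case: the expected lift given B really is better (upside) and given it isn't (downside), both taken from a truncated normal. This is a sketch under assumed inputs (1,000 visitors per version, 20 vs 35 conversions), not necessarily the tool's exact formula, though it lands close to the example's +€138,253 and −€24,146.

```python
import math

def money_projection(rate_a, rate_b, se, visitors_per_month, months, value_per_conv):
    """Upside/downside as truncated-normal conditional expectations of the lift."""
    diff = rate_b - rate_a
    z = diff / se
    pdf = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)   # standard normal density
    p_win = 0.5 * (1 + math.erf(z / math.sqrt(2)))        # P(B > A)
    lift_if_win = diff + se * pdf / p_win                 # E[lift | lift > 0]
    lift_if_lose = diff - se * pdf / (1 - p_win)          # E[lift | lift < 0]
    total_visitors = visitors_per_month * months
    return (lift_if_win * total_visitors * value_per_conv,
            lift_if_lose * total_visitors * value_per_conv,
            p_win)

# Assumed test counts: 2.00% vs 3.50% on 1,000 visitors per version
se = math.sqrt(0.02 * 0.98 / 1_000 + 0.035 * 0.965 / 1_000)
up, down, p = money_projection(0.02, 0.035, se, 15_000, 6, 100)
print(f"upside about +€{up:,.0f}, downside about €{down:,.0f}, win chance {p:.1%}")
```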

It’s a best guess, not a promise. The point is to put a euro on the test so you can decide if the upside is worth the risk.

What to do with your results

Real winner (green). Go with Version B. The change is doing real work. Use the money number to show your boss what you got back for the time. Your original isn’t dead, you just found something better.

Almost there (yellow). Two choices. Keep the test running until you clear your target (the tips block tells you roughly how many more days). Or go with B now and take the small chance you’d be wrong. Cheap-to-undo changes (button copy, headline) usually deserve the now call. Big stuff (full redesign) deserves the wait.

Too close to call (grey). Keep running, or test a bigger change. Two versions that look identical at 50,000 visitors won’t suddenly diverge at 100,000. The minimum detectable effect guide explains how to size the lift you’re hunting for. The sample size calculator tells you how many visitors that needs.

Heading the wrong way (orange). Version A is pulling ahead. Keep the test running, or stop and try a bigger change next time.

Clear loss (red). Your variation made things worse, and the gap is real. You learned what not to do. Read common A/B testing mistakes and try a more dramatic change.

FAQ

What does “real winner” mean?

It means the gap between your two versions isn’t a fluke. At 95% sure, there’s only a 5% chance the gap you see comes from luck. It does not mean the gap is big or worth shipping. A tiny lift can be a “real” winner if you have huge traffic. Always look at the lift and the money, not just the verdict.

How sure should I be?

95% is the standard and works for most tests. Pick 99% if the change is hard to undo (full page redesign, checkout flow). 90% is fine for quick reads where being wrong is cheap (a button-copy tweak, a small headline change). Higher means more data, so the trade-off is always: how sure do I need to be vs how fast do I want an answer.

My test hit 95% then dropped back. What happened?

You peeked. A/B test numbers bounce around during the run, especially in the first few thousand visitors. The whole point of setting a target is to call the test once, at the end of a sample size you picked up front. Reading partway through and reacting is called peeking, and it makes you wrong more often. Calculate your sample size up front and wait.

Can a winner not be worth shipping?

Yes, and this trap is bigger than people think. The verdict tells you the gap is real. It doesn’t tell you it’s worth your time. A 0.1 percentage-point lift can be “real” with enough traffic, but if your monthly revenue moves by €40, the engineering time wasn’t worth it. That’s what The money section is for. Put a euro on it, then decide.

How many visitors do I need before checking?

Calculate the sample size up front and don’t check until you hit it. Rough sanity check: under about 100 conversions per version, the headline number is jumpy. Don’t trust strong-looking verdicts under that. If you’ve already started without planning, finish to a clean two-week window (a full business cycle) before reading.
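For a rough up-front number, the classical two-proportion formula is a reasonable sketch. It's a frequentist approximation at roughly 95% confidence and 80% power, not necessarily what the linked sample size calculator uses.

```python
import math

def sample_size_per_version(base_rate, min_lift_rel, z_alpha=1.96, z_beta=0.84):
    """Visitors needed per version to detect a relative lift of min_lift_rel.
    z_alpha ~ 95% confidence, z_beta ~ 80% power (classical approximation)."""
    p1 = base_rate
    p2 = base_rate * (1 + min_lift_rel)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    delta = p2 - p1
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Hypothetical example: 2% base rate, hunting a 50% relative lift (2.0% -> 3.0%)
n = sample_size_per_version(0.02, 0.50)
print(f"about {n:,} visitors per version")
```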

What if my variation is worse?

Same math, different verdict. The calculator shows Version A as the winner and the result card goes red. That’s useful information, not a failure. You ruled out a worse design. Keep the original and try something more different next time.

Got a winner? Go with it. Stuck? Kirro figures out what to test next, writes the change for you, and runs the test. From “should I go with this?” to “what’s next?”, all in one tool.

Launch your A/B test for free

Set it up in 3 minutes. No code, no developer.

Everything. No limits. No surprises.

Get started