Four numbers. Visitors and conversions for your original, visitors and conversions for your variation. The calculator above turns those into a clear verdict, the chance Version B is the real winner, and the money at stake.
Change anything and the URL updates. Copy the share link and your boss sees the same numbers you do.
How to use this calculator
- Type your visitor and conversion counts into the Your test data card.
- Read the verdict. Green means you have a real winner. Grey means hold off, you don’t have enough data yet. Red means stick with your original.
- Below the verdict, see the chance Version B is really better. If the bar passes the target line, you’re good.
- Open Optional: test settings if you want to change how sure you need to be, or tell us how many days the test ran.
- Scroll to The money and add your traffic plus what each conversion is worth. The calculator turns it into euros.
- Hit Copy share link to send the whole thing to your boss. The link saves everything.
How we calculate this
This tool uses Bayesian statistics. That’s the same math we use inside the Kirro product. The short version: we work out the chance Version B is genuinely better than Version A, given the data you have.
The chance Version B wins (the headline number)
Every conversion rate has wiggle room. With 1,000 visitors and 20 conversions, the real rate could be anywhere from about 1.2% to 2.9%. With 10,000 visitors, that range gets tighter. We turn both versions into bell curves, then ask: if you pick a random “real” rate from each, how often does B beat A?
That’s the percentage on the result card. The math:
P(B > A) = Φ((rateB − rateA) / √(seA² + seB²))
Φ is the standard normal cumulative distribution function. seA and seB are the standard errors of the two observed rates. With enough data, this normal approximation is solid.
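That formula drops straight into code. A minimal sketch in Python using only the standard library (the function name is ours; the calculator’s internals may differ):

```python
from math import erf, sqrt

def prob_b_beats_a(visitors_a, conv_a, visitors_b, conv_b):
    """P(B > A) via the normal approximation described above."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    # Variance (squared standard error) of each observed rate
    var_a = rate_a * (1 - rate_a) / visitors_a
    var_b = rate_b * (1 - rate_b) / visitors_b
    z = (rate_b - rate_a) / sqrt(var_a + var_b)
    # Standard normal CDF, Phi, expressed through erf
    return 0.5 * (1 + erf(z / sqrt(2)))

# 1,000 visitors each; A converts 20, B converts 30
p = prob_b_beats_a(1000, 20, 1000, 30)  # roughly 0.92
```

With identical data on both sides it returns exactly 0.5, as it should: a coin flip.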
When that percentage clears your target (95% by default), the verdict says Real winner. Between 85% and your target, you’ll see Almost there. Below that, Too close to call.
Why “Clear loss” can show up
If Version B is the worse one and you’re sure about it, the verdict goes red and says Clear loss. That’s a real outcome too. You ruled out a bad design. Keep your original and try something more different next time.
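Putting the thresholds together, the verdict logic might look like this sketch. The cutoffs on the losing side are our assumption, mirroring the winning side’s 85% and target thresholds; the real tool may draw those lines differently:

```python
def verdict(p_b_wins, target=0.95):
    """Map P(B > A) to a verdict label. Losing-side cutoffs mirror
    the winning side by assumption."""
    if p_b_wins >= target:
        return "Real winner"            # green: go with B
    if p_b_wins >= 0.85:
        return "Almost there"           # yellow: nearly sure
    if p_b_wins > 0.15:
        return "Too close to call"      # grey: not enough data yet
    if p_b_wins > 1 - target:
        return "Heading the wrong way"  # orange: A is pulling ahead
    return "Clear loss"                 # red: B is clearly worse
```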
The money section
When you add your monthly traffic and what each conversion is worth, the calculator turns the test into euros. It shows the most likely outcome as a big number, with the chance it actually plays out. Underneath, you see the other side of the bet.
The upside math: if Version B really keeps winning, the per-visitor lift times your monthly traffic times the value of a conversion, summed across your time frame. The downside math is the same shape, just for the case where B actually isn’t winning.
It’s a best guess, not a promise. The point is to put a euro on the test so you can decide if the upside is worth the risk.
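The upside arithmetic is simple enough to sketch in a few lines. All names and inputs below are hypothetical illustrations, not the calculator’s actual code:

```python
def money_at_stake(rate_a, rate_b, monthly_visitors,
                   value_per_conversion, months=12):
    """Upside if B keeps winning: per-visitor lift x monthly traffic
    x value per conversion, summed over the time frame."""
    lift_per_visitor = rate_b - rate_a
    return lift_per_visitor * monthly_visitors * value_per_conversion * months

# 2% -> 3% conversion, 10,000 visitors/month, EUR 50 per conversion, one year
upside = money_at_stake(0.02, 0.03, 10_000, 50)  # EUR 60,000
```

The downside is the same calculation with the roles flipped, weighted by the chance B isn’t actually winning.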
What to do with your results
Real winner (green). Go with Version B. The change is doing real work. Use the money number to show your boss what you got back for the time. Your original isn’t dead, you just found something better.
Almost there (yellow). Two choices. Keep the test running until you clear your target (the tips block tells you roughly how many more days). Or go with B now and take the small chance you’d be wrong. Cheap-to-undo changes (button copy, headline) usually deserve the now call. Big stuff (full redesign) deserves the wait.
Too close to call (grey). Keep running, or test a bigger change. Two versions that look identical at 50,000 visitors won’t suddenly diverge at 100,000. The minimum detectable effect guide explains how to size the lift you’re hunting for. The sample size calculator tells you how many visitors that needs.
Heading the wrong way (orange). Version A is pulling ahead. Keep the test running, or stop and try a bigger change next time.
Clear loss (red). Your variation made things worse, and the gap is real. You learned what not to do. Read common A/B testing mistakes and try a more dramatic change.
FAQ
What does “real winner” mean?
It means the gap between your two versions isn’t a fluke. At 95% sure, there’s only a 5% chance the gap you see comes from luck. It does not mean the gap is big or worth shipping. A tiny lift can be a “real” winner if you have huge traffic. Always look at the lift and the money, not just the verdict.
How sure should I be?
95% is the standard and works for most tests. Pick 99% if the change is hard to undo (full page redesign, checkout flow). 90% is fine for quick reads where being wrong is cheap (a button-copy tweak, a small headline change). Higher means more data, so the trade-off is always: how sure do I need to be vs how fast do I want an answer.
My test hit 95% then dropped back. What happened?
You peeked. A/B test numbers bounce around during the run, especially in the first few thousand visitors. The whole point of setting a target is to call the test once, at the end of a sample size you picked up front. Reading partway through and reacting is called peeking, and it makes you wrong more often. Calculate your sample size up front and wait.
Can a winner not be worth shipping?
Yes, and it’s a bigger trap than people think. The verdict tells you the gap is real. It doesn’t tell you it’s worth your time. A 0.1 percentage-point lift can be “real” with enough traffic, but if your monthly revenue moves by €40, the engineering time wasn’t worth it. That’s what The money section is for. Put a euro on it, then decide.
How many visitors do I need before checking?
Calculate the sample size up front and don’t check until you hit it. Rough sanity check: under about 100 conversions per version, the headline number is jumpy. Don’t trust strong-looking verdicts under that. If you’ve already started without planning, finish to a clean two-week window (a full business cycle) before reading.
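One common way to size a test up front uses the same normal approximation. This is a rough frequentist rule of thumb (95% confidence, 80% power), not necessarily what the sample size calculator does:

```python
from math import ceil

def visitors_per_variant(base_rate, lift, z_alpha=1.96, z_beta=0.84):
    """Rough visitors needed per variant to detect an absolute lift.
    Defaults: z_alpha for 95% confidence, z_beta for 80% power."""
    p_bar = base_rate + lift / 2           # average rate under the lift
    variance = 2 * p_bar * (1 - p_bar)     # pooled variance, two variants
    return ceil((z_alpha + z_beta) ** 2 * variance / lift ** 2)

# Detecting a 2.0% -> 2.5% move: roughly 14,000 visitors per variant
n = visitors_per_variant(0.02, 0.005)
```

Note how fast the number grows as the lift you’re hunting shrinks: halve the detectable lift and you need roughly four times the traffic.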
What if my variation is worse?
Same math, different verdict. The calculator shows Version A as the winner and the result card goes red. That’s useful information, not a failure. You ruled out a worse design. Keep the original and try something more different next time.
Got a winner? Go with it. Stuck? Kirro figures out what to test next, writes the change for you, and runs the test. From “should I go with this?” to “what’s next?”, all in one tool.