How many visitors does my A/B test need?

The calculator takes three inputs:

  1. Your current conversion rate. The percentage that converts today on the page you're testing.
  2. Your monthly traffic. How many visitors hit this page in a typical month.
  3. The smallest change worth finding. How big a winner do you want to find? A +20% change means going from 2.5% to 3%.

With those example inputs (2.5% baseline, +20% lift), you'll need about 16,792 visitors per variation (33,584 total). That's ~68 days at ~500 visitors/day. Most small sites can't wait that long.

Below the result, a duration slider (2–12 weeks) lets you drag to change the test length; the smallest detectable lift adjusts to fit.

What to do next

Too long for most sites. Try a bigger swing, a higher-traffic page, or accept the longer wait.

Launch your A/B test for free
Working backwards

How long should you actually run this?

Pick a duration. See the conversion rate the new version would need to reach for the test to call it a win. When calendar time is your real constraint, this is the number that matters.

| If you run for | Your baseline | The new version would need to reach | That's a lift of | Visitors collected |
| --- | --- | --- | --- | --- |
| 3 days | 2.50% | 5.30% | +112% | 1,500 |
| 7 days | 2.50% | 4.20% | +68% | 3,500 |
| 14 days | 2.50% | 3.66% | +46% | 7,000 |
| 21 days | 2.50% | 3.43% | +37% | 10,500 |
| 28 days | 2.50% | 3.29% | +32% | 14,000 |
| 56 days | 2.50% | 3.05% | +22% | 28,000 |
| 68 days (your plan) | 2.50% | 3.00% | +20% | 34,000 |
Read it like this: at ~500 visitors/day and a 2.5% baseline, running 14 days only lets you confirm a winner if it reaches 3.66% or higher. Anything smaller, you can't tell if it's real or just random luck.

Three questions: what’s your baseline, how much traffic do you get, and how big a winner do you want to find? The calculator above turns those into visitors per version, an estimated number of calendar days, and a “working backwards” table so you can see what’s realistic at your traffic.

If you change anything, the URL updates. Copy the share link and the assumptions travel with the number.
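
A minimal sketch of what that looks like — the actual parameter names are whatever the calculator uses; these are illustrative:

```ts
// Hypothetical share-link encoding. Param names are illustrative,
// not the calculator's real ones.
const params = new URLSearchParams({
  baseline: "0.025", // current conversion rate
  traffic: "15000",  // monthly visitors to the page
  lift: "0.20",      // minimum detectable relative lift
});
const shareUrl = `${location.origin}${location.pathname}?${params}`;
// e.g. https://example.com/calculator?baseline=0.025&traffic=15000&lift=0.20
```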

How to use this calculator

  1. Enter your current conversion rate. Pull it from GA4 or whatever you use. Most sites land between 1% and 5%.
  2. Enter your monthly visitors to the page you’re testing. Both versions combined.
  3. Set the smallest change worth finding. If you don’t have a target in mind, use the benchmark picker. It suggests a lift based on where median and top-quartile competitors land.
  4. Read the green panel. The big number is visitors per version. The “~N days at your traffic” stat below it is the one that actually matters.
  5. Scroll to the Working backwards table to see the trade-off. At your traffic, a 7-day test could only call a winner if the new version reaches X%. 28 days lets you catch smaller wins.

How the math works

The formula

This is a two-proportion z-test, two-tailed, with equal allocation (50/50 split). Same test Optimizely, VWO, and Evan Miller’s calculator use under the hood. The formula:

n = ( z_{α/2}·√(2·p̄·(1−p̄)) + z_{β}·√(p₁(1−p₁) + p₂(1−p₂)) )² / (p₂ − p₁)²

Where p₁ is your baseline rate, p₂ is the target rate, and p̄ is their average. The result n is visitors per variation.
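
As a concrete sketch, here is that formula in TypeScript. The z-values are hard-coded for the defaults (95% confidence, 80% power); a production version would compute them from a normal quantile function:

```ts
// Two-proportion z-test sample size, per the formula above.
function visitorsPerVariation(
  p1: number,    // baseline conversion rate, e.g. 0.025
  p2: number,    // target conversion rate, e.g. 0.03
  zAlpha = 1.96, // z_{α/2}: two-tailed, 95% confidence
  zBeta = 0.8416 // z_{β}: 80% power
): number {
  const pBar = (p1 + p2) / 2;
  const a = zAlpha * Math.sqrt(2 * pBar * (1 - pBar));
  const b = zBeta * Math.sqrt(p1 * (1 - p1) + p2 * (1 - p2));
  return Math.ceil((a + b) ** 2 / (p2 - p1) ** 2);
}

visitorsPerVariation(0.025, 0.03); // → 16792, the headline example
```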

What “relative lift” means here

You enter the lift you’d like to detect as a relative percentage. Same way Optimizely and most marketers think about it. A +20% lift on a 2% baseline means the new version needs to hit 2.4%. A +50% lift means 3%.

This framing matters because it scales with your baseline. Asking to detect “a 1pp jump” from 2% to 3% is mathematically the same as a +50% relative lift, just stated differently. Relative numbers travel better across teams and funnels. For a deeper breakdown, see our guide to minimum detectable effect.
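
In code, the conversion is one line — an illustrative helper, not part of the calculator itself:

```ts
// Relative lift → target rate.
const targetRate = (baseline: number, lift: number) => baseline * (1 + lift);

targetRate(0.02, 0.5); // → 0.03: a +50% relative lift, or +1pp absolute
```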

The four levers

  • Baseline conversion rate. Lower baselines need more visitors. Less signal in each one.
  • Minimum detectable lift. The smaller the lift you want to catch, the more visitors. Sample size scales with 1/lift². Halve the lift and you 4× the visitors. Our sample size formula explainer walks through the math.
  • Confidence (1 − α). How sure you want to be that a “winner” isn’t noise. 95% is standard. Lower it for faster reads when the cost of a wrong call is low. See Type I vs Type II errors for the full picture.
  • Power (1 − β). How often you’ll catch a real win. 80% is standard. Push to 90% if missing a real winner is expensive. The statistical power explainer goes deeper, and the sketch after this list shows where both z-values plug in.
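
Confidence and power enter the formula only through the two z-values (standard normal quantiles). A quick sketch, reusing visitorsPerVariation from above:

```ts
// Standard normal quantiles for common choices, hard-coded.
const Z_ALPHA_2 = { "90%": 1.645, "95%": 1.96, "99%": 2.576 }; // two-tailed confidence
const Z_BETA = { "80%": 0.8416, "90%": 1.2816 };               // power

// Dropping confidence from 95% to 90% at 80% power:
visitorsPerVariation(0.025, 0.03, Z_ALPHA_2["90%"], Z_BETA["80%"]);
// → 13228 per variation instead of 16792
```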

How “working backwards” works

The duration table inverts the formula. Given a fixed visitor budget (your monthly traffic prorated to N days, split A/B), it solves for the smallest relative lift the test could detect, and shows what conversion rate the new version would need to reach. That’s the most actionable number when calendar time is the real constraint.
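
A sketch of that inversion, again reusing visitorsPerVariation from above — required sample falls as the target rate rises, so bisection on the target finds the smallest detectable one:

```ts
// Smallest target rate detectable on a fixed visitor budget.
function smallestDetectableTarget(p1: number, nPerVariation: number): number {
  let lo = p1 + 1e-6; // just above the baseline
  let hi = 0.999;     // a rate can't reach 100%
  for (let i = 0; i < 60; i++) {
    const mid = (lo + hi) / 2;
    // Needs more visitors than we have → this lift is too small to detect.
    if (visitorsPerVariation(p1, mid) > nPerVariation) lo = mid;
    else hi = mid;
  }
  return hi;
}

// 14 days at ~500 visitors/day, split 50/50 → 3,500 per variation:
smallestDetectableTarget(0.025, 3500); // → ≈0.0366, the table's +46% row
```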

Edge cases handled honestly

  • Very low baselines (< 1%). The z-test normal approximation is shakier here. Numbers are still in the right ballpark. Round up generously.
  • Lifts that push the target past 100%. We flag it. A +5000% lift on a 2% baseline implies a 102% target, which is mathematically impossible.
  • Negative lifts. Testing a change you expect to drop conversion (e.g. removing a CTA to test downstream impact) is the same math.
  • One-tailed tests. Cut sample by ~20% (worked out below). Don’t bother. You rarely know the direction in advance, and the cost of being wrong is high.
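
Where the ~20% comes from, to a first approximation: the two square-root terms in the formula are nearly equal, so n scales roughly with (z + z_β)². Switching from two-tailed z_{α/2} = 1.96 to one-tailed z_{α} = 1.645 at 80% power gives

( (1.645 + 0.842) / (1.960 + 0.842) )² ≈ 0.79

about 21% fewer visitors.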

What this calculator deliberately doesn’t do

  • Sequential / “peeking” math. Fixed-horizon p-values inflate false positives when you peek. Use sequential testing (mSPRT) or Bayesian instead. Kirro does sequential by default.
  • Multi-armed tests. Three or more variants need multiple-comparison correction (Bonferroni, Šidák, FDR).
  • Non-binary metrics. Revenue, time-on-page, clicks-per-user. Those need a t-test or rank test on the distribution, not a z-test on a proportion.

About the benchmarks

The industry / funnel benchmarks shipped in this calculator are illustrative placeholders, not sourced research. They give you a sensible starting point if you have no idea what to aim for, but you should replace them with numbers from your own analytics or industry reports before quoting them in a board deck.

FAQ

What’s a good baseline conversion rate?

Whatever yours is. Don’t aim for a benchmark when measuring your own baseline. Use your real number from analytics. Pull the last 30 days from GA4 or similar. If you don’t have 30 days of data, you don’t have enough to A/B test reliably. Come back later.

How do I pick a minimum detectable lift?

Start with the industry/funnel benchmark. It tells you the gap between your baseline and a realistic target for similar businesses. If you’re at 2% and the median is 4%, that’s a +100% lift. Aiming for that means you’d confirm any winner that gets you most of the way there. Sanity-check it against your roadmap: if a +10% lift wouldn’t change what you ship next, set the bar higher and test a bolder change.

Why does asking for a smaller lift balloon the sample size?

Sample size scales with 1/lift². It’s quadratic. Going from a +20% to a +10% lift roughly quadruples the visitors needed (at a 2.5% baseline, ~16,800 per variation becomes ~64,200). That’s why honest testing tools push you toward bigger, bolder swings. This is one of the most common A/B testing mistakes: setting an MDE so small that the test can never finish.

Can I stop the test early if I see significance?

Not with a fixed-horizon test, which is what this calculator powers. Peeking inflates your false-positive rate from 5% to as much as 30%. If you want to peek, use sequential testing (mSPRT) or a Bayesian tool. Kirro does sequential by default, so you can.

What if my traffic is too low for any reasonable lift?

Honest answer: you may not be a candidate for A/B testing on the metric you’ve picked. Try testing a higher-funnel metric (clicks instead of purchases), or commit to bigger swings. Redesigns, not button colors. The duration table on this page shows the trade-off directly. Lower-traffic sites need to aim for bigger lifts or accept longer test windows.

Does this work for revenue tests?

No. Revenue is continuous and skewed, so a proportion-based z-test doesn’t apply. You’d want a Welch’s t-test or Mann–Whitney U on per-visitor revenue. Most marketing teams test conversion rate (binary) and read revenue as a secondary metric.

Is the math the same as Evan Miller’s calculator?

Yes. Two-proportion z-test, two-tailed, equal allocation. Same z-tables, same formula. Same answer to within rounding.

Are the benchmark numbers reliable?

They’re illustrative placeholders to help you start. Treat them as “roughly the right shape” for each industry, not as research you can quote. Replace them with numbers from your own analytics or a recent industry report before using them in a deck.

Why share a calculator with URL params?

Because when someone says “you need 13,000 visitors per variation,” the next thing you ask is “based on what?” A shareable link encodes the inputs in the URL so the assumptions are right there in the receipts.

Ready to run this test? Set it up in Kirro in about three minutes. No code, no developer.

Launch your A/B test for free
