A/B testing template (5 free templates)

You need an A/B testing template. Not a 30-page whitepaper about experimentation culture. Just a simple document that keeps your test organized from start to finish.

Here are five templates that do exactly that. Each one covers a different stage of the testing process: planning, hypotheses, tracking, results, and what you learned. Copy whichever ones you need. They’re free, they work in Google Sheets or Excel, and they’ll save you from the “wait, what were we testing again?” conversation three weeks into a test.

Half of all experimentation teams don’t have a central place to store their test documents (Speero + Kameleoon, 2024). That’s not a minor process gap. Teams with standardized documentation are 69% more likely to see significant growth. A simple template is the difference between “we tested something once” and “we have a testing program.”

What every A/B testing template needs

Nine fields. That’s it. Every good A/B testing template captures the same core information.

Before you pick a format (spreadsheet, Notion, napkin), get these nine fields right. Skip any of them and you’ll regret it when someone asks “so what did we learn?” four months later.

These nine fields actually matter:

Field	What it captures	Why you need it
Test name	A short, scannable label	So you can find it later without scrolling through “Test 47”
Hypothesis	What you think will happen and why	Forces you to think before you test
Primary metric	The one number that decides if you won	Prevents cherry-picking a friendly metric after the fact
Audience	Who sees the test	Stops you from accidentally testing on the wrong group
Control group	The unchanged version visitors are measured against	Gives every result a fixed baseline to compare against
Traffic split	What percentage of visitors see each version	Usually 50/50, but worth writing down
Duration	How long the test will run	Prevents premature peeking (more on this later)
Success criteria	What “winning” looks like, in numbers	The most skipped field. Also the most important.
What you learned	What you discovered, win or lose	Where compound knowledge gets built
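
If your template lives somewhere scriptable, a quick completeness check can flag blank fields before launch. This is a minimal sketch, assuming the plan is stored as a simple dictionary; the function name and sample values are hypothetical. “What you learned” is left out of the check on purpose, since it can only be filled in after the test ends.

```python
# Fields from the table above that should be filled in before launch.
REQUIRED_FIELDS = [
    "Test name", "Hypothesis", "Primary metric", "Audience", "Control group",
    "Traffic split", "Duration", "Success criteria",
]


def missing_before_launch(plan):
    """Return the required fields that are absent or blank in a plan dict."""
    return [f for f in REQUIRED_FIELDS if not str(plan.get(f) or "").strip()]


# Hypothetical half-finished plan, for illustration only.
plan = {
    "Test name": "Homepage CTA copy test",
    "Hypothesis": "If we personalize the CTA, then signups rise, because it feels personal",
    "Primary metric": "Free trial signup rate",
    "Audience": "All desktop visitors",
    "Traffic split": "50/50",
    "Duration": "3 weeks",
}

print(missing_before_launch(plan))  # ['Control group', 'Success criteria']
```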

Most templates nail the first seven and skip the last two. That’s a problem. Success criteria and what you learned are where the real value lives. They’re what separates a testing program from a series of random guesses.

87% of mature experimentation programs standardize their test documentation. If you’re not doing this yet, you’re in the majority. But you’re not in the majority that grows.

Our take: The template format doesn’t matter. Google Sheets, Notion, a text file, whatever. What matters is that you actually fill it in before you launch the test. A half-finished template in Google Sheets beats a beautiful Notion database that nobody updates.

If you need a refresher on what makes a good test, check out our guide to A/B testing best practices. But for now, let’s look at each template.

A/B test plan template

A test plan is your pre-flight checklist. Fill it in before you launch anything.

This is the most important template. It’s the one you fill out before you touch any testing tool. Think of it like a recipe: you wouldn’t start cooking without knowing the ingredients.

Here’s what a complete A/B testing plan template looks like, filled in with a real example:

Field	Example
Test name	Homepage CTA copy test
Objective	Increase free trial signups
Page	Homepage (kirro.io)
Hypothesis	If we change the CTA from “Get Started” to “Start My Free Trial,” then trial signups will increase by at least 10%, because first-person possessive language makes the action feel personal
Primary metric	Free trial signup rate
Secondary metrics	Click-through rate on CTA, bounce rate
Audience	All desktop visitors, excluding logged-in accounts
Traffic split	50% Version A (original) / 50% Version B
Estimated duration	3 weeks (based on sample size calculation)
Success criteria	Version B beats Version A by at least 10% with 95% confidence
Launch date	2026-06-15

Notice the success criteria line. That’s the field most people skip, and it’s the one that matters most.

Lock in your criteria before you see the results

Scientists call this “pre-registration.” Fancy word, simple idea: write down what counts as a win before you start.

Why? Because once you see the data, your brain will find a way to declare victory. Version B didn’t beat Version A on signups? Well, it did improve time-on-page. That counts, right?

No. It doesn’t. Not unless you said it would beforehand.

Research from Stanford and IZA found that a hypothesis field alone doesn’t prevent cherry-picking results. You need a complete plan: the exact metric, the minimum improvement you’re looking for (your minimum detectable effect), and the confidence level. Write it down. Lock it in.

This is easier than it sounds. Just answer three questions before you launch:

Which metric decides the winner?
How big does the improvement need to be?
How confident do you need to be in the result?

Write those answers in your plan template. Done. You’ve just pre-registered your test.
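
Those three answers are also the inputs to a sample size estimate, which is what turns a plan into a realistic duration. The sketch below uses one common approximation (a two-sided two-proportion z-test). It is not the method any particular tool uses. The 80% power default, the function name, and the 3.2% baseline rate are assumptions for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, confidence=0.95, power=0.80):
    """Approximate visitors needed per version (two-sided two-proportion z-test).

    baseline:      current conversion rate, e.g. 0.032 for 3.2%
    relative_lift: smallest improvement you care about, e.g. 0.10 for +10%
    confidence:    how confident you need to be in the result
    power:         chance of detecting a real effect (assumed default)
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Assumed inputs: 3.2% baseline, +10% minimum improvement, 95% confidence.
n = sample_size_per_variant(baseline=0.032, relative_lift=0.10)
print(f"Visitors needed per version: {n}")
```

Divide the result by the number of daily visitors each version receives to get a rough duration. A smaller minimum improvement needs far more traffic, which is worth knowing before you commit to a launch date.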

Tools like Kirro handle traffic allocation, test duration, and confidence calculations automatically, so you can skip the spreadsheet entirely. But if you’re starting with templates, that’s fine too.

When you’re ready to put this plan into practice, our guide on designing a marketing experiment walks through the full process.

A/B test hypothesis template

A good hypothesis follows one format: If [change], then [outcome], because [reason].

The hypothesis is where most tests go wrong. People write incomplete ones. “I think a green button will convert better” is a guess, not a hypothesis. A hypothesis has a structure.

Use the If/Then/Because format:

If [we make this specific change], then [this specific metric will improve by this much], because [this is the rationale based on evidence or research].

The “because” is the part people skip. It’s also the part that makes the difference. Without it, you’re just running random tests. With it, even a losing test teaches you something, because you can evaluate whether your reasoning was wrong or your execution was off.

Four examples across different test types:

Headline test: If we change the homepage headline from “Welcome to our platform” to “Stop guessing. Start testing,” then bounce rate will decrease by 15%, because action-oriented language creates urgency and relevance.

CTA test: If we change the pricing page CTA from “Sign Up” to “Start Testing Free,” then click-through rate will increase by 20%, because benefit-focused CTAs outperform generic ones.

Layout test: If we move testimonials above the fold on the landing page, then form submissions will increase by 10%, because social proof near the primary action reduces hesitation.

Pricing page test: If we add a “most popular” badge to the mid-tier plan, then mid-tier plan selections will increase by 25%, because anchoring a recommendation reduces decision fatigue on pricing pages.

Common hypothesis mistakes

Three mistakes kill more hypotheses than bad ideas:

No “because.” “If we change X, then Y will improve” is a prediction, not a hypothesis. The reasoning is what makes it testable and teachable.
Unmeasurable outcomes. “Users will like it more” isn’t measurable. Tie every hypothesis to a number you can actually track. Need help picking the right one? Here’s our guide to A/B testing metrics.
Too many variables. “If we change the headline, the image, and the CTA…” that’s not a test. That’s a redesign. If you want to change multiple things at once, you need multivariate testing.
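
The three mistakes above can be caught with a quick structural check before a hypothesis goes into your tracker. This is only a sketch: the `Hypothesis` class and its field names are hypothetical and not part of any tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    changes: List[str]              # the specific change(s) being made
    metric: str                     # the number you will track
    expected_change: Optional[float]  # e.g. 0.20 for 20%; None if not stated
    because: str                    # the reasoning behind the prediction

    def problems(self):
        """Return the common mistakes this hypothesis makes, if any."""
        found = []
        if not self.because.strip():
            found.append("No because")
        if not self.metric.strip() or self.expected_change is None:
            found.append("Unmeasurable outcome")
        if len(self.changes) > 1:
            found.append("Too many variables")
        return found


good = Hypothesis(
    changes=["change the pricing page CTA from 'Sign Up' to 'Start Testing Free'"],
    metric="click-through rate",
    expected_change=0.20,
    because="benefit-focused CTAs outperform generic ones",
)
bad = Hypothesis(
    changes=["change the headline", "change the image", "change the CTA"],
    metric="",
    expected_change=None,
    because="",
)

print(good.problems())  # []
print(bad.problems())   # ['No because', 'Unmeasurable outcome', 'Too many variables']
```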

Want to build hypotheses faster? Our hypothesis generator tool walks you through the If/Then/Because format step by step.

A/B testing Excel template

A spreadsheet tracker works great when you’re running 1 to 5 tests per month. Beyond that, it starts to creak.

Google Sheets and Excel are where most testing programs start. And honestly? They work fine early on. You don’t need a fancy tool to track three tests.

A basic A/B testing Excel tracker needs two tabs: one for active tests, one for your test backlog.

Tab 1: Test tracker

Test ID	Page	Hypothesis	Start date	End date	Status	Primary metric	Baseline	Result	Winner	Learning
T-001	Homepage	If we change…	Jun 15	Jul 6	Complete	Signup rate	3.2%	4.1%	Version B	Personal CTAs outperform generic
T-002	Pricing	If we add…	Jun 20	…	Running	Plan selection	45%	…	…	…

Tab 2: Test backlog with ICE scoring

Your backlog is where test ideas wait their turn. But which one should you run first? That’s where a prioritization framework helps.

ICE scoring is the simplest approach. Rate each test idea from 1 to 10 on three criteria:

Impact: If this test wins, how much will it move the needle?
Confidence: How sure are you it’ll actually win? (Be honest.)
Ease: How easy is it to build and launch?

Average the three scores. Run the highest-scoring ideas first.

Test idea	Impact (1-10)	Confidence (1-10)	Ease (1-10)	ICE score	Status
Change homepage headline	8	7	9	8.0	Next up
Redesign checkout flow	9	5	3	5.7	Backlog
Add trust badges	6	6	8	6.7	Backlog
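
The averaging step fits in a spreadsheet formula, but here it is as a short Python sketch using the three backlog ideas above. The function name and data layout are illustrative assumptions.

```python
def ice_score(impact, confidence, ease):
    """Average of the three 1-10 ratings, rounded to one decimal."""
    return round((impact + confidence + ease) / 3, 1)


# (idea, impact, confidence, ease) taken from the backlog table
backlog = [
    ("Change homepage headline", 8, 7, 9),
    ("Redesign checkout flow", 9, 5, 3),
    ("Add trust badges", 6, 6, 8),
]

# Score every idea, then sort so the highest score comes first.
ranked = sorted(
    ((idea, ice_score(i, c, e)) for idea, i, c, e in backlog),
    key=lambda row: row[1],
    reverse=True,
)

for idea, score in ranked:
    print(f"{score:>4}  {idea}")
#  8.0  Change homepage headline
#  6.7  Add trust badges
#  5.7  Redesign checkout flow
```

Sorting by score puts “Add trust badges” (6.7) ahead of “Redesign checkout flow” (5.7), even though the redesign has the highest impact rating. That is the point of ICE: a hard, uncertain test has to wait behind an easy one with decent odds.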

ICE isn’t perfect. It’s subjective, and the same idea can score differently depending on who’s scoring. But it’s fast, it’s better than gut feeling, and it gets you started.

Want something more rigorous? The PXL framework (created by Peep Laja at CXL) uses yes/no questions instead of subjective scores. “Is the change above the fold?” “Is it based on user testing data?” Binary answers leave less room for opinion. Better for mature programs, but overkill when you’re just getting started.

Our take: Use ICE until you’re running 10+ tests per month. Then look at PXL. Frameworks are like training wheels: they’re helpful early on, and you’ll know when you’ve outgrown them.

When A/B testing in Excel stops working

Spreadsheets break down in predictable ways:

Version control. Someone edits the wrong row. Nobody knows which version is current. The “FINAL_v3_ACTUALLY_FINAL” filename makes an appearance.
No collaboration controls. Three people with write access means three people who can accidentally overwrite each other’s notes.
Reporting is manual. Want to see all tests you ran on the checkout page last quarter? Get ready to build a pivot table. Or three.
Institutional knowledge walks out the door. When your testing lead leaves, their personal spreadsheet goes with them.

These aren’t hypothetical problems. PractiTest research found that traceability between tests and the research that motivated them becomes “nearly impossible” in spreadsheets as programs grow.

If you’re nodding along, you might be ready to move beyond spreadsheets. We’ll get to that in a minute.

Test results template (the part everyone skips)

Most templates stop at “results.” The real value is in what you learned and what you’ll test next.

Almost every template gets this part wrong. They have a column for “result” and maybe “winner.” That’s it. Test over, move on.

The numbers back this up. Only 36.3% of A/B tests produce a clear winner. Another 41.6% are inconclusive. If you only document the winners, you’re throwing away most of your data.

Stefan Thomke, a professor at Harvard Business School, puts it bluntly: “For every online experiment that succeeds, nearly 10 don’t.” At Booking.com, they run 25,000 tests a year with roughly a 10% success rate. Those aren’t failures. They’re data. But only if you write down what you learned.

This results template captures the full picture:

Field	What to write
Test name	Same as your plan template
Dates run	Start and end date
Result	Winner, loser, or inconclusive
Primary metric: original	e.g., 3.2% signup rate
Primary metric: challenger	e.g., 4.1% signup rate
Improvement	+28% relative improvement
Confidence level	96% (or whatever your tool reports)
What happened	One-sentence summary of the result
Why we think it happened	Your interpretation, the reasoning
What we learned	The actual insight, applicable beyond this single test
What we’ll test next	The follow-up test this result suggests
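
The “Improvement” and “Confidence level” rows are where arithmetic slips tend to creep in. The sketch below shows one common way to compute both: relative lift, plus a one-sided two-proportion z-test. The visitor and signup counts are hypothetical (the example above only gives rates), and testing tools define “confidence” in different ways, so record whatever your tool reports.

```python
from math import sqrt
from statistics import NormalDist


def relative_lift(original_rate, challenger_rate):
    """Relative improvement of the challenger over the original."""
    return (challenger_rate - original_rate) / original_rate


def confidence_challenger_wins(conv_a, n_a, conv_b, n_b):
    """One-sided two-proportion z-test using a pooled standard error.

    Returns 1 minus the one-sided p-value, which is one common way
    to express "confidence". Your tool may calculate it differently.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return NormalDist().cdf((p_b - p_a) / se)


print(f"{relative_lift(0.032, 0.041):+.0%}")  # +28%

# Hypothetical counts: 5,000 visitors per version, 160 vs. 205 signups.
conf = confidence_challenger_wins(160, 5000, 205, 5000)
print(f"Confidence: {conf:.1%}")
```

Note the difference between relative and absolute improvement: 3.2% to 4.1% is a 0.9 percentage point gain but a 28% relative lift. Write down which one you mean.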

The last three rows are where compound knowledge gets built. A test that says “Version B won by 12%” is a data point. A test that says “personalizing CTAs consistently outperforms generic language across three separate tests” is an insight. One of those changes how you build landing pages forever.

Ronny Kohavi led experimentation at Microsoft, Airbnb, and Amazon. He recommends maintaining a searchable experiment repository so past tests inform future ones. Your spreadsheet results tab is the start of that repository.

And about those inconclusive results? Document them. Record why the test didn’t reach confidence (a low baseline conversion rate, not enough traffic, or a test that was too short). This prevents your team from re-running the same test six months later because nobody remembers you already tried it.

When to replace your template with a tool

Templates are great for getting started. They stop being great around test number 10.

Templates are training wheels. They teach you the right process. But there’s a point where the spreadsheet creates more work than it saves.

Signs you’ve outgrown yours:

You’re running 5+ tests per month and losing track of which ones are active
Multiple people need to update the same tracker (and they’re overwriting each other)
You’ve re-run a test you already ran because nobody could find the old results
Stakeholder reporting means manually copying data into a slide deck every month
Your testing lead left, and their institutional knowledge went with them

What does a dedicated A/B testing tool handle that spreadsheets can’t? Traffic splitting (sending the right visitors to each version). Real-time results. Automatic statistical calculations so you know whether to trust the numbers. And a visual editor so you can change a headline without writing code.

Only 13% of experimentation teams have well-integrated tool stacks. Most teams are duct-taping spreadsheets to their testing tool and manually copying results back and forth. If that sounds familiar, you’re not alone.

Kirro handles this by rolling the template into the tool. You set up a test, and the plan, hypothesis, traffic allocation, statistical analysis, and results all live in one place. No spreadsheet handoff. No copy-pasting. When a test finishes, everything is searchable. Six months later, you can actually find what you learned.

If you’re still in the early stages (1 to 3 tests per month), stick with templates. They’re free, they work, and they’ll teach you the process. When the spreadsheet starts slowing you down, try setting up your first test in Kirro. Three minutes, no code, and you’ll never manually calculate sample size again.

For a broader comparison of what’s available, check out our A/B testing tools roundup.

FAQ

Quick answers to the questions we hear most about A/B testing templates.

What format should my A/B testing template be in?

Google Sheets for most teams. It’s free, it’s collaborative, and everyone already knows how to use it. Use Excel if you need offline access or your company lives in Microsoft’s ecosystem. Notion works if your team already runs workflows there. The format matters way less than actually using the template consistently. A Google Sheet that gets filled in beats a custom Notion database that nobody touches.

How detailed should an A/B test document be?

Nine fields minimum (the ones listed above). You can add more detail for complex tests, but don’t let documentation slow you down. A five-minute template beats a 30-minute report that never gets filled out. Craig Sullivan, who’s run thousands of tests, describes it well: at 50+ tests per month, elaborate documentation is impractical. The goal is functional, not polished.

Should I use a template or an A/B testing tool?

Start with a template if you’re running fewer than 5 tests per month. A template teaches you the process, and the process matters more than the tool at this stage. Move to a tool when collaboration, tracking, or statistical analysis becomes the bottleneck. If you’re calculating sample sizes by hand and copying results between tabs, a tool will save you real time. Set up a free test in Kirro and see if it clicks for you.

What’s the best test prioritization framework?

ICE is the best starting point. It’s fast (score three criteria from 1 to 10), it’s widely understood, and it’s better than going with your gut. PXL is more rigorous because it uses yes/no questions instead of subjective scores, which means less arguing in meetings. Start with ICE, then switch to PXL when your team is running 10+ tests a month and wants less subjectivity in prioritization.

What should I do with inconclusive A/B test results?

Document them. 41.6% of tests are inconclusive, according to DRIP Agency’s analysis of 91 e-commerce brands. That’s normal, not a failure. Record what you tested, why it didn’t reach confidence, and whether the direction looked positive or negative. This prevents re-running the same test later and builds your team’s knowledge over time. Even inconclusive results teach you something: maybe the change was too small, the traffic was too low, or your original was already decent. That’s useful to know. Check our guide on common A/B testing mistakes for more on interpreting tricky results.

Can I test multiple things at once with these templates?

These templates are built for standard A/B tests where you change one thing at a time. Want to test a new headline plus a new image plus a new CTA all at once? That’s multivariate testing, and it needs more traffic and a different setup. For most teams, one change at a time teaches you more. Save the multivariate approach for high-traffic pages where you’ve already picked the obvious winners.

Randy Wattilete

CRO expert and founder with nearly a decade running conversion experiments for companies from early-stage startups to global brands. Built programs for Nestlé, felyx, and Storytel. Founder of Kirro (A/B testing).

