You need an A/B testing template. Not a 30-page whitepaper about experimentation culture. Just a simple document that keeps your test organized from start to finish.
Here are five templates that do exactly that. Each one covers a different stage of the testing process: planning, hypotheses, tracking, results, and what you learned. Copy whichever ones you need. They’re free, they work in Google Sheets or Excel, and they’ll save you from the “wait, what were we testing again?” conversation three weeks into a test.
Half of all experimentation teams don’t have a central place to store their test documents (Speero + Kameleoon, 2024). That’s not a minor process gap. Teams with standardized documentation are 69% more likely to see significant growth. A simple template is the difference between “we tested something once” and “we have a testing program.”
What every A/B testing template needs
Before you pick a format (spreadsheet, Notion, napkin), get these nine fields right. Skip any of them and you’ll regret it when someone asks “so what did we learn?” four months later.
These nine fields actually matter:
| Field | What it captures | Why you need it |
|---|---|---|
| Test name | A short, scannable label | So you can find it later without scrolling through “Test 47” |
| Hypothesis | What you think will happen and why | Forces you to think before you test |
| Primary metric | The one number that decides if you won | Prevents cherry-picking a friendly metric after the fact |
| Audience | Who sees the test | Stops you from accidentally testing on the wrong group |
| Control group | The unchanged version visitors are measured against | Gives every result a fixed baseline to compare against, so define it before launch |
| Traffic split | What percentage of visitors see each version | Usually 50/50, but worth writing down |
| Duration | How long the test will run | Prevents premature peeking (more on this later) |
| Success criteria | What “winning” looks like, in numbers | The most skipped field. Also the most important. |
| What you learned | What you discovered, win or lose | Where compound knowledge gets built |
Most templates nail the first seven and skip the last two. That’s a problem. Success criteria and what you learned are where the real value lives. They’re what separates a testing program from a series of random guesses.
87% of mature experimentation programs standardize their test documentation. If you’re not doing this yet, you’re far from alone. But you’re not in the group that grows.
Our take: The template format doesn’t matter. Google Sheets, Notion, a text file, whatever. What matters is that you actually fill it in before you launch the test. A half-finished template in Google Sheets beats a beautiful Notion database that nobody updates.
If you need a refresher on what makes a good test, check out our guide to A/B testing best practices. But for now, let’s look at each template.
A/B test plan template
This is the most important template. It’s the one you fill out before you touch any testing tool. Think of it like a recipe: you wouldn’t start cooking without knowing the ingredients.
Here’s what a complete A/B testing plan template looks like, filled in with a real example:
| Field | Example |
|---|---|
| Test name | Homepage CTA copy test |
| Objective | Increase free trial signups |
| Page | Homepage (kirro.io) |
| Hypothesis | If we change the CTA from “Get Started” to “Start My Free Trial,” then trial signups will increase by at least 10%, because first-person possessive language makes the action feel personal |
| Primary metric | Free trial signup rate |
| Secondary metrics | Click-through rate on CTA, bounce rate |
| Audience | All desktop visitors, excluding logged-in accounts |
| Traffic split | 50% Version A (original) / 50% Version B |
| Estimated duration | 3 weeks (based on sample size calculation) |
| Success criteria | Version B beats Version A by at least 10% with 95% confidence |
| Launch date | 2026-06-15 |
Notice the success criteria line. That’s the field most people skip, and it’s the one that matters most.
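If you ever keep plans somewhere more structured than a spreadsheet, the same fields can be sketched as a small Python record. This is only an illustration of the idea, not part of the templates: the `ABTestPlan` class, its field names, and the `ready_to_launch` check are all hypothetical.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class ABTestPlan:
    """One attribute per row of the plan template (names are illustrative)."""
    test_name: str
    objective: str
    page: str
    hypothesis: str
    primary_metric: str
    secondary_metrics: str
    audience: str
    traffic_split: str
    estimated_duration: str
    success_criteria: str
    launch_date: date

    def missing_fields(self) -> list[str]:
        # A text field counts as missing if it is empty or only whitespace.
        return [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), str)
            and not getattr(self, f.name).strip()
        ]

    def ready_to_launch(self) -> bool:
        return not self.missing_fields()


# The example plan from the table, with the success criteria left blank
# on purpose to show the check catching the most-skipped field.
plan = ABTestPlan(
    test_name="Homepage CTA copy test",
    objective="Increase free trial signups",
    page="Homepage (kirro.io)",
    hypothesis="If we change the CTA from “Get Started” to “Start My Free Trial,” ...",
    primary_metric="Free trial signup rate",
    secondary_metrics="Click-through rate on CTA, bounce rate",
    audience="All desktop visitors, excluding logged-in accounts",
    traffic_split="50% Version A (original) / 50% Version B",
    estimated_duration="3 weeks",
    success_criteria="",
    launch_date=date(2026, 6, 15),
)

print(plan.ready_to_launch())
print(plan.missing_fields())
```

The point of the check is the same as the template’s: a plan with an empty success criteria field shouldn’t go live.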
Lock in your criteria before you see the results
Scientists call this “pre-registration.” Fancy word, simple idea: write down what counts as a win before you start.
Why? Because once you see the data, your brain will find a way to declare victory. Version B didn’t beat Version A on signups? Well, it did improve time-on-page. That counts, right?
No. It doesn’t. Not unless you said it would beforehand.
Research from Stanford and IZA found that a hypothesis field alone doesn’t prevent cherry-picking results. You need a complete plan: the exact metric, the minimum improvement you’re looking for (your minimum detectable effect), and the confidence level. Write it down. Lock it in.
This is easier than it sounds. Just answer three questions before you launch:
- Which metric decides the winner?
- How big does the improvement need to be?
- How confident do you need to be in the result?
Write those answers in your plan template. Done. You’ve just pre-registered your test.
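Those three answers are also what a sample size calculation needs. Here’s a rough sketch using the standard normal-approximation formula for comparing two conversion rates. A few things here are assumptions, not part of the template: the 3.2% baseline is borrowed from the tracker example later in this article, the 80% statistical power is a common default, and the function name `sample_size_per_version` is made up for illustration. Your testing tool may use a different method.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_version(baseline_rate, relative_mde, confidence=0.95, power=0.80):
    """Approximate visitors needed per version for a two-sided test.

    baseline_rate: current conversion rate, e.g. 0.032 for 3.2%
    relative_mde:  smallest relative improvement worth detecting, e.g. 0.10
    power:         chance of detecting a real effect (0.80 is an assumed default)
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Metric: signup rate. Improvement: at least 10% relative. Confidence: 95%.
visitors = sample_size_per_version(baseline_rate=0.032, relative_mde=0.10)
print(f"Roughly {visitors:,} visitors per version")
```

With a low baseline and a small minimum improvement, the number lands in the tens of thousands per version. Divide it by your daily traffic per version to get a realistic duration before you commit to one.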
Tools like Kirro handle traffic allocation, test duration, and confidence calculations automatically, so you can skip the spreadsheet entirely. But if you’re starting with templates, that’s fine too.
When you’re ready to put this plan into practice, our guide on designing a marketing experiment walks through the full process.
A/B test hypothesis template
The hypothesis is where most tests go wrong. People write incomplete ones. “I think a green button will convert better” is a guess, not a hypothesis. A hypothesis has a structure.
Use the If/Then/Because format:
If [we make this specific change], then [this specific metric will improve by this much], because [this is the rationale based on evidence or research].
The “because” is the part people skip. It’s also the part that makes the difference. Without it, you’re just running random tests. With it, even a losing test teaches you something, because you can evaluate whether your reasoning was wrong or your execution was off.
Four examples across different test types:
Headline test: If we change the homepage headline from “Welcome to our platform” to “Stop guessing. Start testing,” then bounce rate will decrease by 15%, because action-oriented language creates urgency and relevance.
CTA test: If we change the pricing page CTA from “Sign Up” to “Start Testing Free,” then click-through rate will increase by 20%, because benefit-focused CTAs outperform generic ones.
Layout test: If we move testimonials above the fold on the landing page, then form submissions will increase by 10%, because social proof near the primary action reduces hesitation.
Pricing page test: If we add a “most popular” badge to the mid-tier plan, then mid-tier plan selections will increase by 25%, because anchoring a recommendation reduces decision fatigue on pricing pages.
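If you store hypotheses in a script or internal tool, the If/Then/Because structure can be enforced rather than hoped for. A minimal sketch, assuming a hypothetical `Hypothesis` class (the class and its field names are invented for this example) that refuses to exist without all three parts:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    change: str            # the "If" part
    expected_outcome: str  # the "then" part, tied to a measurable number
    rationale: str         # the "because" part

    def __post_init__(self):
        # Reject any hypothesis with a blank part, especially the "because".
        for name in ("change", "expected_outcome", "rationale"):
            if not getattr(self, name).strip():
                raise ValueError(f"'{name}' is empty: fill in every part of If/Then/Because")

    def render(self) -> str:
        return f"If {self.change}, then {self.expected_outcome}, because {self.rationale}."


# The layout test from the examples above.
layout = Hypothesis(
    change="we move testimonials above the fold on the landing page",
    expected_outcome="form submissions will increase by 10%",
    rationale="social proof near the primary action reduces hesitation",
)
print(layout.render())
```

Leaving `rationale` blank raises a `ValueError`, which is the code version of a teammate asking “okay, but why?”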
Common hypothesis mistakes
Three mistakes kill more hypotheses than bad ideas:
- No “because.” “If we change X, then Y will improve” is a prediction, not a hypothesis. The reasoning is what makes it testable and teachable.
- Unmeasurable outcomes. “Users will like it more” isn’t measurable. Tie every hypothesis to a number you can actually track. Need help picking the right one? Here’s our guide to A/B testing metrics.
- Too many variables. “If we change the headline, the image, and the CTA…” isn’t a test. It’s a redesign. If you want to change multiple things at once, you need multivariate testing.
Want to build hypotheses faster? Our hypothesis generator tool walks you through the If/Then/Because format step by step.
A/B testing Excel template
Google Sheets and Excel are where most testing programs start. And honestly? They work fine early on. You don’t need a fancy tool to track three tests.
A basic A/B testing Excel tracker needs two tabs: one for active tests, one for your test backlog.
Tab 1: Test tracker
| Test ID | Page | Hypothesis | Start date | End date | Status | Primary metric | Baseline | Result | Winner | Learning |
|---|---|---|---|---|---|---|---|---|---|---|
| T-001 | Homepage | If we change… | Jun 15 | Jul 6 | Complete | Signup rate | 3.2% | 4.1% | Version B | Personal CTAs outperform generic |
| T-002 | Pricing | If we add… | Jun 20 | … | Running | Plan selection | 45% | … | … | … |
Tab 2: Test backlog with ICE scoring
Your backlog is where test ideas wait their turn. But which one should you run first? That’s where a prioritization framework helps.
ICE scoring is the simplest approach. Rate each test idea from 1 to 10 on three criteria:
- Impact: If this test wins, how much will it move the needle?
- Confidence: How sure are you it’ll actually win? (Be honest.)
- Ease: How easy is it to build and launch?
Average the three scores. Run the highest-scoring ideas first.
| Test idea | Impact (1-10) | Confidence (1-10) | Ease (1-10) | ICE score | Status |
|---|---|---|---|---|---|
| Change homepage headline | 8 | 7 | 9 | 8.0 | Next up |
| Redesign checkout flow | 9 | 5 | 3 | 5.7 | Backlog |
| Add trust badges | 6 | 6 | 8 | 6.7 | Backlog |
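In a spreadsheet, the ICE score column is just an average of the three cells next to it. The same calculation, plus the “run the highest-scoring ideas first” sort, looks like this as a short Python sketch. The `ice_score` function name is made up for illustration; the ideas and ratings come from the table above.

```python
def ice_score(impact, confidence, ease):
    """Average the three 1-10 ratings, rounded to one decimal place."""
    for value in (impact, confidence, ease):
        if not 1 <= value <= 10:
            raise ValueError("ICE ratings must be between 1 and 10")
    return round((impact + confidence + ease) / 3, 1)


# (test idea, impact, confidence, ease)
backlog = [
    ("Change homepage headline", 8, 7, 9),
    ("Redesign checkout flow", 9, 5, 3),
    ("Add trust badges", 6, 6, 8),
]

# Highest score first: that's the order to run them in.
ranked = sorted(
    ((idea, ice_score(i, c, e)) for idea, i, c, e in backlog),
    key=lambda pair: pair[1],
    reverse=True,
)

for idea, score in ranked:
    print(f"{score:>4}  {idea}")
```

Sorted this way, the headline change (8.0) comes first, then trust badges (6.7), then the checkout redesign (5.7).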
ICE isn’t perfect. It’s subjective, and the same idea can score differently depending on who’s scoring. But it’s fast, it’s better than gut feeling, and it gets you started.
Want something more rigorous? The PXL framework (created by Peep Laja at CXL) uses yes/no questions instead of subjective scores. “Is the change above the fold?” “Is it based on user testing data?” Binary answers leave less room for opinion. Better for mature programs, but overkill when you’re just getting started.
Our take: Use ICE until you’re running 10+ tests per month. Then look at PXL. Frameworks are like training wheels: they’re helpful early on, and you’ll know when you’ve outgrown them.
When A/B testing in Excel stops working
Spreadsheets break down in predictable ways:
- Version control. Someone edits the wrong row. Nobody knows which version is current. The “FINAL_v3_ACTUALLY_FINAL” filename makes an appearance.
- No collaboration controls. Three people with write access means three people who can accidentally overwrite each other’s notes.
- Reporting is manual. Want to see all tests you ran on the checkout page last quarter? Get ready to build a pivot table. Or three.
- Institutional knowledge walks out the door. When your testing lead leaves, their personal spreadsheet goes with them.
These aren’t hypothetical problems. PractiTest research found that traceability between tests and the research that motivated them becomes “nearly impossible” in spreadsheets as programs grow.
If you’re nodding along, you might be ready to move beyond spreadsheets. We’ll get to that in a minute.
Test results template (the part everyone skips)
Almost every template gets this part wrong. They have a column for “result” and maybe “winner.” That’s it. Test over, move on.
The numbers back this up. Only 36.3% of A/B tests produce a clear winner. Another 41.6% are inconclusive. If you only document the winners, you’re throwing away most of your data.
Stefan Thomke, a professor at Harvard Business School, puts it bluntly: “For every online experiment that succeeds, nearly 10 don’t.” At Booking.com, they run 25,000 tests a year with roughly a 10% success rate. Those aren’t failures. They’re data. But only if you write down what you learned.
This results template captures the full picture:
| Field | What to write |
|---|---|
| Test name | Same as your plan template |
| Dates run | Start and end date |
| Result | Winner, loser, or inconclusive |
| Primary metric: original | e.g., 3.2% signup rate |
| Primary metric: challenger | e.g., 4.1% signup rate |
| Improvement | +28% relative improvement |
| Confidence level | 96% (or whatever your tool reports) |
| What happened | One-sentence summary of the result |
| Why we think it happened | Your interpretation, the reasoning |
| What we learned | The actual insight, applicable beyond this single test |
| What we’ll test next | The follow-up test this result suggests |
The last three rows are where compound knowledge gets built. A test that says “Version B won by 12%” is a data point. A test that says “personalizing CTAs consistently outperforms generic language across three separate tests” is an insight. One of those changes how you build landing pages forever.
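If your tool doesn’t fill in the Improvement and Confidence level rows for you, both can be worked out from raw numbers. The sketch below uses a two-sided two-proportion z-test, which is one common way to get a confidence figure; your tool may calculate it differently, so don’t expect the numbers to match exactly. The function names are invented for this example, and the visitor and conversion counts are made up (the template only gives the rates, 3.2% and 4.1%).

```python
from math import sqrt
from statistics import NormalDist


def relative_improvement(original_rate, challenger_rate):
    """Relative change of the challenger versus the original."""
    return (challenger_rate - original_rate) / original_rate


def confidence_level(conv_a, visitors_a, conv_b, visitors_b):
    """1 minus the two-sided p-value of a pooled two-proportion z-test."""
    rate_a = conv_a / visitors_a
    rate_b = conv_b / visitors_b
    pooled = (conv_a + conv_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - p_value


# Rates from the results template: 3.2% original, 4.1% challenger.
lift = relative_improvement(0.032, 0.041)
print(f"Improvement: {lift:+.0%}")  # Improvement: +28%

# Hypothetical traffic: 10,000 visitors per version.
conf = confidence_level(conv_a=320, visitors_a=10_000, conv_b=410, visitors_b=10_000)
print(f"Confidence: {conf:.1%}")
```

Note that the improvement is relative: going from 3.2% to 4.1% is 0.9 percentage points, but a 28% lift. Write down which one you mean so nobody misreads the results six months later.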
Ronny Kohavi led experimentation at Microsoft, Airbnb, and Amazon. He recommends maintaining a searchable experiment repository so past tests inform future ones. Your spreadsheet results tab is the start of that repository.
And about those inconclusive results? Document them. Record why the test didn’t reach confidence (a low baseline conversion rate, not enough traffic, or a test that was too short). This prevents your team from re-running the same test six months later because nobody remembers you already tried it.
When to replace your template with a tool
Templates are training wheels. They teach you the right process. But there’s a point where the spreadsheet creates more work than it saves.
Signs you’ve outgrown yours:
- You’re running 5+ tests per month and losing track of which ones are active
- Multiple people need to update the same tracker (and they’re overwriting each other)
- You’ve re-run a test you already ran because nobody could find the old results
- Stakeholder reporting means manually copying data into a slide deck every month
- Your testing lead left, and their institutional knowledge went with them
What does a dedicated A/B testing tool handle that spreadsheets can’t? Traffic splitting (sending the right visitors to each version). Real-time results. Automatic statistical calculations so you know whether to trust the numbers. And a visual editor so you can change a headline without writing code.
Only 13% of experimentation teams have well-integrated tool stacks. Most teams are duct-taping spreadsheets to their testing tool and manually copying results back and forth. If that sounds familiar, you’re not alone.
Kirro handles this by rolling the template into the tool. You set up a test, and the plan, hypothesis, traffic allocation, statistical analysis, and results all live in one place. No spreadsheet handoff. No copy-pasting. When a test finishes, everything is searchable. Six months later, you can actually find what you learned.
If you’re still in the early stages (1 to 3 tests per month), stick with templates. They’re free, they work, and they’ll teach you the process. When the spreadsheet starts slowing you down, try setting up your first test in Kirro. Three minutes, no code, and you’ll never manually calculate sample size again.
For a broader comparison of what’s available, check out our A/B testing tools roundup.
FAQ
What format should my A/B testing template be in?
Google Sheets for most teams. It’s free, it’s collaborative, and everyone already knows how to use it. Use Excel if you need offline access or your company lives in Microsoft’s ecosystem. Notion works if your team already runs workflows there. The format matters way less than actually using the template consistently. A Google Sheet that gets filled in beats a custom Notion database that nobody touches.
How detailed should an A/B test document be?
Nine fields minimum (the ones listed above). You can add more detail for complex tests, but don’t let documentation slow you down. A five-minute template beats a 30-minute report that never gets filled out. Craig Sullivan, who’s run thousands of tests, describes it well: at 50+ tests per month, elaborate documentation is impractical. The goal is functional, not polished.
Should I use a template or an A/B testing tool?
Start with a template if you’re running fewer than 5 tests per month. A template teaches you the process, and the process matters more than the tool at this stage. Move to a tool when collaboration, tracking, or statistical analysis becomes the bottleneck. If you’re calculating sample sizes by hand and copying results between tabs, a tool will save you real time. Set up a free test in Kirro and see if it clicks for you.
What’s the best test prioritization framework?
ICE is the best starting point. It’s fast (score three criteria from 1 to 10), it’s widely understood, and it’s better than going with your gut. PXL is more rigorous because it uses yes/no questions instead of subjective scores, which means less arguing in meetings. Start with ICE, then switch to PXL when your team is running 10+ tests a month and wants less subjectivity in prioritization.
What should I do with inconclusive A/B test results?
Document them. 41.6% of tests are inconclusive, according to DRIP Agency’s analysis of 91 e-commerce brands. That’s normal, not a failure. Record what you tested, why it didn’t reach confidence, and whether the direction looked positive or negative. This prevents re-running the same test later and builds your team’s knowledge over time. Even inconclusive results teach you something: maybe the change was too small, the traffic was too low, or your original was already decent. That’s useful to know. Check our guide on common A/B testing mistakes for more on interpreting tricky results.
Can I test multiple things at once with these templates?
These templates are built for standard A/B tests where you change one thing at a time. Want to test a new headline plus a new image plus a new CTA all at once? That’s multivariate testing, and it needs more traffic and a different setup. For most teams, one change at a time teaches you more. Save the multivariate approach for high-traffic pages where you’ve already picked the obvious winners.
Randy Wattilete
CRO expert and founder with nearly a decade running conversion experiments for companies from early-stage startups to global brands. Built programs for Nestlé, felyx, and Storytel. Founder of Kirro (A/B testing).