A hypothesis generator turns a vague idea (“maybe we should change the headline”) into a structured prediction you can actually test. Instead of guessing, you follow a simple framework: what you’ll change, what you expect to happen, and why.
That structure matters more than most people think. Optimizely analyzed 127,000 A/B tests and found that only 12% won on their primary metric. At Google and Bing, only 10 to 20% of tests produce positive results. Netflix says 90% of their ideas fail.
Those numbers aren’t a reason to skip testing. They’re a reason to stop guessing and start with a real hypothesis.
What is a hypothesis generator (and why it matters for A/B testing)
Most hypothesis makers online are simple text formatters. You type in a variable, click a button, and get a sentence that sounds scientific. That’s not what makes tests win.
What actually makes tests win is the thinking behind the hypothesis. Booking.com, which runs over 1,000 tests at any given moment, put it this way: “Without clearly defining the hypothesis in advance, there really can be no good experimentation.”
Here’s the difference between a guess and a hypothesis:
- Guess: “Let’s make the button bigger.”
- Hypothesis: “If we increase the contrast on our CTA button, then click-through rate will increase by 5%, because heatmap data shows 60% of visitors never notice the current button.”
The hypothesis has three things the guess doesn’t. A specific change. A measurable outcome. And evidence. A good hypothesis checker asks: does it have all three?
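Here is a minimal Python sketch of that check. It assumes a hypothesis is stored as three plain-text fields. The `Hypothesis` class and `missing_parts` function are hypothetical names for illustration, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    change: str    # the specific thing you will change
    metric: str    # the measurable outcome you expect to move
    evidence: str  # the data behind the "because"


def missing_parts(h: Hypothesis) -> list[str]:
    """Return the names of any empty parts. An empty list means all three are present."""
    parts = {"change": h.change, "metric": h.metric, "evidence": h.evidence}
    return [name for name, value in parts.items() if not value.strip()]


guess = Hypothesis(change="make the button bigger", metric="", evidence="")
hypothesis = Hypothesis(
    change="increase the contrast on the CTA button",
    metric="click-through rate",
    evidence="heatmaps show 60% of visitors never notice the current button",
)

print(missing_parts(guess))       # ['metric', 'evidence']
print(missing_parts(hypothesis))  # []
```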
One quick story. At Bing, an employee proposed a small change to how ad headlines were displayed. The idea was rated low priority and sat in the backlog for months. When an engineer finally tested it, that single change generated over $100 million per year in additional revenue. Structured testing surfaces wins that gut instinct buries.
For the full picture of how hypotheses connect to A/B testing and conversion rates, see our guide on that topic.
How to generate a hypothesis for your A/B test
Whether you call it a hypothesis creator or just “writing down what you think will happen,” there are two main templates. Pick whichever feels more natural.
The simple version:
If we [make this specific change], then [this metric] will [improve/decrease], because [evidence-based reason].
The practitioner version (from Craig Sullivan’s Hypothesis Kit, iterated over 8 years with input from Netflix, Booking.com, and Skyscanner):
Because we saw [data or insight], we expect that [this change] for [this audience] will cause [this outcome]. We’ll measure this by [metric].
Notice that the practitioner version starts with “because.” That’s deliberate. It forces you to ground your test in data before you propose a change. No data? No hypothesis yet.
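You can build the same rule into a template filler. Here is a minimal sketch that assumes the practitioner template above. The `practitioner_hypothesis` function is a hypothetical helper, and the audience value in the example is an illustrative guess. The helper refuses to produce a sentence when the “because” is empty, which is the whole point of starting with data.

```python
def practitioner_hypothesis(insight: str, change: str, audience: str,
                            outcome: str, metric: str) -> str:
    """Fill in the practitioner template, but only if there's an insight to start from."""
    if not insight.strip():
        # No data? No hypothesis yet.
        raise ValueError("The 'because' is empty: collect data before writing this hypothesis.")
    return (f"Because we saw {insight}, we expect that {change} "
            f"for {audience} will cause {outcome}. "
            f"We'll measure this by {metric}.")


print(practitioner_hypothesis(
    insight="exit surveys showing 40% of visitors find our pricing confusing",
    change="adding a 'most popular' badge to the middle tier",
    audience="pricing page visitors",
    outcome="more plan selections",
    metric="checkout starts",
))
```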
Three real examples
Homepage hero test: If we replace the generic stock photo with a product screenshot, then signups will increase, because session recordings show visitors scroll past the hero without engaging.
Pricing page test: Because exit surveys show 40% of visitors find our pricing confusing, we expect that adding a “most popular” badge to the middle tier will increase plan selections. We’ll measure this by checkout starts.
Checkout test: If we add trust badges above the payment form, then cart abandonment will decrease, because our support tickets show “is this site safe?” is the #3 question from new customers.
Each hypothesis includes the key A/B testing metrics you’ll measure. Without a metric, you can’t tell if you won.
Our take: The fancy template doesn’t matter as much as the “because.” If you can fill in the “because” with real data, you’ve got a strong hypothesis. If you can’t, go collect data first.
Where good hypotheses actually come from
Most competitor pages for “hypothesis generator” skip this question entirely. They assume you already know what to test and just need a hypothesis helper to format a sentence. But finding the right thing to test is 90% of the work.
CXL’s ResearchXL framework draws on several kinds of research data, and four categories cover most of what you need. You don’t need all four for every hypothesis, but aim for at least two.
1. Analytics (what’s happening)
Your analytics tool tells you where visitors drop off. Open GA4, look at your funnel, and find the biggest leak. If 70% of visitors leave your pricing page without scrolling, that’s a hypothesis waiting to happen.
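Finding the biggest leak comes down to simple arithmetic: compare each funnel step with the one before it. Here is a minimal sketch that assumes you have exported user counts per step from your analytics tool. The step names and numbers below are made up for illustration.

```python
def biggest_leak(funnel: list[tuple[str, int]]) -> tuple[str, str, float]:
    """Return (from_step, to_step, drop_off_rate) for the worst step-to-step transition.

    Assumes at least two steps, each with a non-zero user count.
    """
    worst = None
    for (step_a, users_a), (step_b, users_b) in zip(funnel, funnel[1:]):
        drop_off = 1 - users_b / users_a  # share of users lost between the two steps
        if worst is None or drop_off > worst[2]:
            worst = (step_a, step_b, drop_off)
    return worst


# Made-up numbers for illustration only.
funnel = [
    ("Homepage", 10_000),
    ("Pricing page", 4_000),
    ("Checkout start", 800),
    ("Purchase", 480),
]
from_step, to_step, drop_off = biggest_leak(funnel)
print(f"Biggest leak: {from_step} -> {to_step} ({drop_off:.0%} of users lost)")
```

The biggest leak isn't automatically the best test, but it tells you where to point your heatmaps and session recordings next.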
2. Behavioral data (why it’s happening)
Heatmaps and session recordings show you the “why” behind the numbers. Someone rage-clicking an element that isn’t clickable? That’s a clue. Visitors not scrolling past the first screen? Your call-to-action might be invisible.
3. User feedback (what people say)
Surveys, reviews, and exit polls capture what visitors think in their own words. This is gold for copy-focused hypotheses. If three people say “I couldn’t figure out your pricing,” that’s your next test.
4. Customer conversations (what people mean)
Support tickets, sales calls, and interviews reveal things no analytics tool can capture. The anxieties, the objections, the moments of confusion. These are your highest-confidence hypothesis inputs.
Mouseflow’s research puts it well: “Getting at least two correlating data sources per hypothesis makes them much more likely to be true.”
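If you log research notes in a spreadsheet or backlog, you can count distinct sources per hypothesis automatically. Here is a minimal sketch that assumes each note is tagged with a hypothesis ID and one of the four source categories above. The IDs and notes are invented examples.

```python
from collections import defaultdict

# Hypothetical research notes: (hypothesis id, source category, observation)
notes = [
    ("pricing-clarity", "analytics", "70% leave the pricing page without scrolling"),
    ("pricing-clarity", "user_feedback", "exit survey: 'couldn't figure out your pricing'"),
    ("pricing-clarity", "customer_conversations", "sales calls: plan differences unclear"),
    ("bigger-hero", "behavioral", "visitors scroll past the hero"),
    ("bigger-hero", "behavioral", "heatmap shows few hero clicks"),
]


def sources_per_hypothesis(notes):
    """Map each hypothesis id to the set of distinct source categories supporting it."""
    sources = defaultdict(set)
    for hypothesis_id, source, _observation in notes:
        sources[hypothesis_id].add(source)
    return dict(sources)


# Keep only hypotheses backed by at least two different kinds of data.
supported = {h: s for h, s in sources_per_hypothesis(notes).items() if len(s) >= 2}
print(sorted(supported))  # ['pricing-clarity']
```

Two notes from the same source (like the two behavioral notes for "bigger-hero") still count as one source, because the point is correlation across different kinds of data.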
There’s also the LIFT model (created by Chris Goward in 2009). It looks at any page through six lenses: value proposition, clarity, relevance, urgency, anxiety, and distraction. Walk through all six on a single page and you’ll generate 10 to 18 testable hypotheses. That’s a month of tests from one review.
This is exactly what Kirro does automatically. It analyzes your pages against conversion rate optimization (CRO) frameworks like these and suggests what to test, with clear reasoning for each suggestion. Instead of staring at analytics trying to find problems, you get a prioritized list of opportunities.
If you want the full process for turning these ideas into a complete test, check out how to design your marketing experiment.
Can AI write your A/B test hypotheses?
Short answer: yes, but with limits.
A JMIR 2025 study tested AI hypothesis generation in a controlled setting. Of 96 AI-generated hypotheses, 14% were rated highly novel and 65% moderately novel. Average innovation score: 3.85 out of 5. Not bad.
But a Science.org study told a more cautious story. AI-generated hypotheses scored well initially, but after real-world testing, their novelty scores dropped from 5.4 to 3.4. Human-generated hypotheses held up better, dropping from 4.6 to 4.0. AI ideas sounded creative but didn’t perform as well when actually tested.
The sweet spot? Use AI to brainstorm, then validate against your own data. Here’s a prompt you can steal:
“I run an e-commerce store selling [product]. My checkout page has a [X%] abandonment rate. Based on CRO best practices, suggest 5 hypothesis ideas for reducing cart abandonment. For each, include what to change, the expected outcome, and why it should work.”
Then compare those ideas against your analytics, heatmaps, and customer feedback. The ones that match multiple data sources are worth testing. The rest are educated guesses.
AI-powered A/B testing tools like Kirro go further. Instead of a generic AI hypothesis generator, Kirro analyzes your specific pages and suggests tests based on what it finds. The hypotheses come with reasoning tied to your site, not generic best practices.
Our take: AI hypothesis tools are like GPS. Useful for direction, terrible for deciding where you actually want to go. Let AI suggest routes, but pick the destination yourself.
Null and alternative hypotheses in A/B testing
If you’ve searched for a “null hypothesis generator,” here’s the plain-language version.
Every A/B test has two competing statements:
- Null hypothesis (H0): Your change makes no difference. The control group (your current page) and the new version perform the same.
- Alternative hypothesis (H1): Your change does make a difference. One version performs better.
Your A/B testing tool handles this math for you. It collects data, compares the two versions, and asks one question: “Is this difference real, or just random noise?”
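When a standard testing tool says “we’re 95% confident Version B wins,” it usually means this: if the two versions truly performed the same, a difference this large would show up by chance less than 5% of the time. That isn’t quite the same as “a 5% chance the result is a fluke,” but it’s strong enough evidence to reject the null hypothesis.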
Quick example:
| Test | Null hypothesis | Alternative hypothesis |
|---|---|---|
| Headline test | Changing the headline won’t affect signups | The new headline will increase signups |
| CTA color | Button color doesn’t impact clicks | The green button gets more clicks than grey |
| Pricing layout | New layout won’t change plan selections | Side-by-side layout increases premium plan picks |
You don’t need to write formal null and alternative hypotheses for every test. Your testing tool does that automatically. But understanding the concept helps you make better decisions about when results are real vs. when you need more data.
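For the curious, here is what one common version of that math looks like: a pooled two-proportion z-test, sketched in Python with made-up visitor counts. Treat it as an illustration only. Your testing tool may use a different method (Bayesian and sequential approaches are common), so its numbers won't necessarily match. The p-value answers one question: if the null hypothesis were true, how often would a gap at least this large appear by chance?

```python
from math import erfc, sqrt


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: both versions share the same conversion rate."""
    pooled = (conv_a + conv_b) / (n_a + n_b)  # single rate assumed under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|)) for a standard normal


# Made-up numbers: control converts 500 of 10,000 visitors (5.0%).
p_win = two_proportion_p_value(500, 10_000, 580, 10_000)    # variant at 5.8%
p_noise = two_proportion_p_value(500, 10_000, 510, 10_000)  # variant at 5.1%
print(f"5.0% vs 5.8%: p = {p_win:.3f}")    # roughly 0.012, below 0.05: reject H0
print(f"5.0% vs 5.1%: p = {p_noise:.3f}")  # well above 0.05: keep H0, or collect more data
```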
If you want to test changes to several elements at once, our guide to multivariate testing explains when that approach makes more sense than a standard A/B test.
5 hypothesis mistakes that kill your A/B tests
1. No hypothesis at all
“Let’s just test it and see what happens” sounds reasonable. It isn’t. Without a hypothesis, you don’t know what metric to watch, when to stop, or what the result means. Convert.com analyzed 28,304 tests and found only 20% reached the confidence threshold needed for a valid result. Many of those tests never had a clear hypothesis to begin with.
2. Missing the “because”
“If we shorten the form, then completions will increase.” That’s half a hypothesis. Why will a shorter form increase completions? Which fields are causing drop-offs? Without the mechanism, you can’t learn from the result whether it wins or loses.
3. Writing the hypothesis after seeing the data
This is called “drawing the target around the arrow.” You peek at results, notice Version B is winning, then write a hypothesis that explains why. It feels productive. It’s actually meaningless, because you’re just confirming what you already saw.
4. Testing too many things at once
Changed the headline, the image, the button color, and the form layout in one test? Even if it wins, you won’t know which change caused it. One variable per test. If you want to test multiple changes, read about common A/B testing mistakes and A/B testing best practices first.
5. Trusting your gut over your data
This one’s backed by hard numbers. GoodUI analyzed 70,149 predictions of A/B test outcomes. People guessed correctly only 59% of the time. Barely better than flipping a coin. But when predictions were based on documented patterns from previous tests, accuracy jumped to 71%.
Evidence beats intuition. Every time.
One more thing worth knowing: even with a perfect hypothesis, peeking at results early can make 80% of “winning” results false positives. If you want to check results without ruining them, sequential testing explains how.
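The sketch below doesn't reproduce that 80% figure, but it shows the mechanism behind it. It simulates A/A tests, where both versions are identical, so any "winner" is a false positive by definition. It then compares two approaches: looking once at the end, and checking after every batch of visitors and stopping at the first significant result. All numbers are assumptions chosen for illustration: a 5% conversion rate, 10 peeks of 200 visitors per version, and a pooled two-proportion z-test at alpha = 0.05. This is not a model of any specific tool.

```python
import random
from math import erfc, sqrt


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled two-proportion z-test p-value."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or only conversions) yet: no evidence either way
    z = (conv_b / n_b - conv_a / n_a) / se
    return erfc(abs(z) / sqrt(2))


def simulate_aa_tests(runs=400, peeks=10, batch=200, rate=0.05, alpha=0.05, seed=42):
    """Return (fixed_rate, peeking_rate): the share of A/A tests declared significant
    when looking once at the end vs. when any intermediate peek counts."""
    rng = random.Random(seed)
    fixed_hits = peeking_hits = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        significant_at_any_peek = False
        for _ in range(peeks):
            # Both versions share the same true rate, so every "win" is a false positive.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            p = p_value(conv_a, n, conv_b, n)
            if p < alpha:
                significant_at_any_peek = True
        fixed_hits += int(p < alpha)  # fixed horizon: only the final p-value counts
        peeking_hits += int(significant_at_any_peek)
    return fixed_hits / runs, peeking_hits / runs


fixed_rate, peeking_rate = simulate_aa_tests()
print(f"False positives, one look at the end: {fixed_rate:.1%}")
print(f"False positives, peeking 10 times:    {peeking_rate:.1%}")
```

Both rules see the same simulated data, and the final look is one of the peeks, so the peeking rate can never be lower than the fixed-horizon rate. The question is how much higher it climbs.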
Hypothesis prioritization: which test to run first
Having a backlog of hypotheses is great. Testing them in random order isn’t. Three frameworks can help, from simplest to most rigorous.
ICE (created by Sean Ellis of GrowthHackers): Rate each hypothesis on Impact, Confidence, and Ease from 1 to 10. Average the scores. Test the highest first. Fast, but “Confidence” is subjective. Optimists score everything high.
PIE (from Chris Goward’s You Should Test That): Same idea, different labels. Potential, Importance, and Ease. Same subjectivity problem.
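The scoring step behind ICE (and PIE, which works the same way) fits in a few lines. Here is a minimal sketch. The backlog items and ratings are made up, and `ice_score` is a hypothetical helper. Note that the code can't fix the subjectivity problem: the ratings you feed it are still opinions.

```python
def ice_score(impact: int, confidence: int, ease: int) -> float:
    """Average of three 1-10 ratings, as described for ICE."""
    for rating in (impact, confidence, ease):
        if not 1 <= rating <= 10:
            raise ValueError("ratings must be between 1 and 10")
    return (impact + confidence + ease) / 3


# Made-up backlog: name -> (impact, confidence, ease)
backlog = {
    "Trust badges above payment form": (7, 6, 9),
    "Product screenshot in hero": (8, 5, 6),
    "'Most popular' badge on middle tier": (6, 7, 8),
}

# Highest score first.
ranked = sorted(backlog, key=lambda name: ice_score(*backlog[name]), reverse=True)
for name in ranked:
    print(f"{ice_score(*backlog[name]):.2f}  {name}")
```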
PXL (from Peep Laja at CXL): Replaces squishy 1-to-10 scores with 10 binary yes/no questions. Is the change above the fold? Backed by analytics? By heatmaps? By user feedback?
Binary questions mean two people scoring the same hypothesis will land on similar scores. That’s why we’d recommend PXL for most teams.
For small teams, try a simplified version. Score each hypothesis on three yes/no questions:
- Is it backed by data from at least two sources?
- Is the change on a high-traffic page?
- Can you launch it this week?
Three yeses? Test it. Everything else goes back in the backlog.
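Because every question is yes/no, the small-team version is a simple all-or-nothing filter. Here is a minimal sketch. The `Candidate` fields mirror the three questions above, and the backlog entries are invented examples.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    data_sources: int           # distinct kinds of data backing it (analytics, heatmaps, ...)
    high_traffic_page: bool
    launchable_this_week: bool


def ready_to_test(c: Candidate) -> bool:
    """All three questions must be answered 'yes'."""
    return c.data_sources >= 2 and c.high_traffic_page and c.launchable_this_week


backlog = [
    Candidate("Clarify pricing tiers", 3, True, True),
    Candidate("Redesign footer", 1, False, True),
    Candidate("Trust badges at checkout", 2, True, False),
]
run_now = [c.name for c in backlog if ready_to_test(c)]
back_to_backlog = [c.name for c in backlog if not ready_to_test(c)]
print("Test now:", run_now)
print("Back to backlog:", back_to_backlog)
```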
The data supports this kind of discipline. DRIP Agency found that teams using structured hypothesis pre-qualification achieved a 36.3% win rate across 90+ e-commerce brands. The industry average, by comparison, sits around 12 to 25%.
If you’re ready to start building your testing program, grab an A/B testing template to keep everything organized. And if you want Kirro to suggest your first test, it takes about three minutes.
FAQ
What is a hypothesis generator?
A hypothesis generator is a tool or framework that helps you create structured, testable predictions for A/B tests. Instead of guessing what to change on your website, it walks you through the evidence, the change, and the expected outcome. The best hypothesis generators go beyond formatting. They help you find what to test based on your actual data.
How do you write a hypothesis for an A/B test?
Use the if/then/because format. “If we [make this specific change], then [this metric] will [improve], because [evidence or reasoning].” The “because” is the most important part. It forces you to ground your test in data, not opinion. If you can’t fill in the “because” with something you’ve actually observed, go collect more data before testing.
Can AI generate a hypothesis?
Yes. AI tools can brainstorm hypothesis ideas quickly. Research shows AI-generated hypotheses score well on novelty (14% rated highly novel, 65% moderately novel in a controlled study). But AI works best as a brainstorming partner. You still need to validate ideas against your own analytics, heatmaps, and customer feedback before committing to a test.
What is the difference between a null and alternative hypothesis?
The null hypothesis says your change won’t make a difference. The alternative hypothesis says it will. Every A/B test tries to reject the null hypothesis, meaning it looks for strong enough evidence that your change had a real, measurable effect. Your testing tool handles this math automatically. You just need to define what you’re testing and what metric you’re watching.
How do you prioritize which hypothesis to test first?
Use a scoring framework. The simplest is ICE (Impact, Confidence, Ease). The most rigorous is PXL, which uses yes/no questions instead of subjective scores. For small teams, ask three questions: Is this backed by data from at least two sources? Is it on a high-traffic page? Can we launch it this week? If all three answers are yes, run the test. In DRIP Agency’s data, structured pre-qualification produced a 36.3% win rate, roughly 1.5 to 3 times the typical industry average of 12 to 25%.
Randy Wattilete
CRO expert and founder with nearly a decade running conversion experiments for companies from early-stage startups to global brands. Built programs for Nestlé, felyx, and Storytel. Founder of Kirro (A/B testing).