How many hypotheses should we have in our backlog at any time?

Aim for 15-25 well-formed hypotheses at any given time. Fewer than 10 means you are not learning fast enough from existing data. More than 30 suggests you are collecting ideas without prioritizing ruthlessly.

What if we do not have a psychological principle for the BECAUSE clause?

Use quantitative observations instead. 'Because heatmap data shows 68% of users never scroll past the third fold' is a valid mechanism. The key is that the BECAUSE clause must be falsifiable and based on evidence, not opinion.

Should the predicted metric in the THEN clause include a specific number?

Yes. Even if the number is an estimate, forcing a specific prediction improves calibration over time. It also lets you evaluate whether the test was worth running: a 0.1% lift that required two weeks of engineering was a poor allocation of resources.

How do we handle hypotheses that win but for the wrong reason?

This is one of the most valuable outcomes. If the BECAUSE clause is invalidated but the result is positive, you have discovered a new mechanism. Document it, update your model, and generate new hypotheses based on the actual mechanism.

Can one test have multiple hypotheses?

Only if you have enough traffic to detect interaction effects. In practice, most e-commerce brands should test one hypothesis per variant to maintain clean attribution. If you need to test multiple changes, use a multi-arm test with isolated variants.

How to Write a CRO Hypothesis That Actually Gets Tested

What makes a hypothesis different from a test idea?

A test idea describes what you want to change. A hypothesis explains why you believe that change will produce a specific, measurable result -- and what psychological or behavioral mechanism drives the prediction.

Every CRO team accumulates test ideas. They sit in spreadsheets, Slack threads, and post-it notes from the last workshop. "Let's test a bigger CTA button." "What if we add urgency messaging?" "Try moving the reviews higher." These are not hypotheses. They are guesses wearing a lab coat.

The distinction matters because ideas without structure produce tests without learnings. When a bigger button wins, what did you learn? That bigger things get clicked? That does not transfer to the next test. When a bigger button loses, you learn even less -- because you never articulated why it should have won in the first place.

DRIP Insight

A well-formed hypothesis is falsifiable, measurable, and rooted in a reason. It forces you to think before you build, and it produces knowledge regardless of the test outcome.

The framework we use at DRIP -- and the one that survives scrutiny at the boardroom level -- is IF/THEN/BECAUSE. Three clauses. Each one does specific work.

IF describes the specific, observable change you will make to the experience.
THEN predicts the measurable outcome (conversion rate, revenue per visitor, add-to-cart rate) and the direction of change.
BECAUSE anchors the prediction in a behavioral principle, user research finding, or quantitative observation that explains the mechanism.

Without the BECAUSE clause, you are not hypothesizing. You are hoping. Hope is not a strategy that impresses a CFO or builds institutional knowledge.

What does a bad hypothesis look like vs. a good one?

Bad hypotheses lack specificity, a measurable prediction, or a behavioral rationale. Good hypotheses connect a precise UI change to a predicted metric shift via an explicit psychological or data-driven mechanism.

The easiest way to calibrate your hypothesis quality is to see bad and good side by side. Below are five real examples from our testing program, each starting with the version that typically lands in a backlog -- and then the version that actually gets prioritized and produces transferable insight.

Bad vs. good: the anatomy

Bad hypotheses vs. structured hypotheses

Bad hypothesis	Problem	Good hypothesis (IF/THEN/BECAUSE)
"Let's test adding a trust badge on Giesswein's PDP."	No predicted outcome, no reason.	IF we add an award badge above the fold on Giesswein's product page, THEN add-to-cart rate will increase by at least 2%, BECAUSE third-party validation reduces perceived purchase risk for premium-priced functional footwear.
"Remove the newsletter bar on SNOCKS."	No metric, no behavioral rationale.	IF we remove the sticky newsletter overlay on SNOCKS mobile, THEN revenue per session will increase, BECAUSE the overlay interrupts the purchase flow and increases cognitive load at a moment when the user is evaluating products.
"Change the hero on Oceansapart."	"Change" is not a testable action.	IF we replace the lifestyle hero image with a product-focused hero on Oceansapart's homepage, THEN click-through to collection pages will increase, BECAUSE users arriving from performance ads expect product continuity, not brand storytelling.
"Add video on Blackroll."	Where? What video? Why?	IF we place a short product demonstration video above the fold on Blackroll's fascia roller PDP, THEN conversion rate will increase, BECAUSE tactile products require visual demonstration to bridge the online experience gap.
"Skip the cart page on Import Parfumerie."	Closer, but still no mechanism.	IF we send add-to-cart clicks directly to checkout on Import Parfumerie, THEN checkout completion rate will increase, BECAUSE an intermediate cart page adds a decision point that invites second-guessing on low-consideration repurchase items.

Common Mistake

Notice that every bad hypothesis reads like a to-do list item. Every good one reads like a prediction you can be wrong about. That is the point. Being wrong teaches you something. Completing a to-do does not.

How do you decompose a hypothesis into its testable parts?

Decomposition means isolating the IF (change), THEN (metric + direction), and BECAUSE (mechanism) so each can be independently evaluated, challenged, and learned from when results arrive.

Let us walk through five real DRIP tests, fully decomposed. Each one shows how the framework forces rigor at every stage -- from design to analysis.

Example 1: Giesswein award badge

Giesswein

IFwe display a "Best Tested" award badge from a recognized consumer magazine directly below the product title on Giesswein's product detail page

THENadd-to-cart rate increases by at least 2 percentage points

BECAUSEthird-party validation (social proof from an authority figure) reduces perceived risk for a premium product priced above market average, lowering the psychological threshold to commit

ResultConversion rate increased 3.1%. The authority-based social proof mechanism was validated specifically for premium price segments.

What we learned: authority-based social proof outperforms peer-based social proof (star ratings) when the product carries a premium positioning. This insight transferred directly to other premium brands in our portfolio.

SNOCKS

IFwe remove the sticky newsletter signup overlay on SNOCKS mobile product listing pages

THENrevenue per session will increase despite losing newsletter signups

BECAUSEthe overlay interrupts the browsing-to-purchase flow at the critical product evaluation stage, and the cognitive load of dismissing an overlay outweighs the lifetime value of a forced email signup

ResultRevenue per session increased 3.8%. Newsletter signups dropped, but the revenue gain from uninterrupted purchase flow far exceeded the projected LTV of interrupted signups.

Counterintuitive Finding

This result is counterintuitive because newsletter popups are considered a best practice. The BECAUSE clause predicted exactly why this best practice would fail in this context: the overlay interrupts a high-intent browsing state. Without that clause, the result would have been surprising. With it, the result was expected.

Oceansapart

IFwe replace the lifestyle hero image with a product grid hero on Oceansapart's homepage

THENclick-through rate to collection pages increases by at least 5%

BECAUSEusers arriving from paid social ads have already been exposed to lifestyle branding and now need product-level navigation to continue their purchase journey -- the hero should match their evolved intent stage

ResultClick-through to collection pages increased 7.2%. Validated the intent-matching principle: content should match the user's current decision stage, not repeat what they have already seen.

Example 4: Blackroll above-fold video

Blackroll

IFwe add a 15-second product demonstration video above the fold on Blackroll's fascia roller product page

THENconversion rate will increase by at least 3%

BECAUSEfascia rollers are tactile products where the use case is not self-evident from a static image -- video bridges the sensory gap between online browsing and physical product experience

ResultConversion rate increased 5.1%. The sensory gap principle proved especially powerful for products where the usage method is non-obvious.

Example 5: Import Parfumerie skip cart

Import Parfumerie

IFwe bypass the cart page entirely and send add-to-cart actions directly to checkout on Import Parfumerie's product pages

THENcheckout completion rate increases by at least 10%

BECAUSEperfume repurchases are low-consideration decisions where an intermediate cart page introduces an unnecessary decision point -- the cart becomes a moment of doubt rather than a moment of confirmation

ResultConversion rate increased 18.62%, generating an estimated additional EUR 118K per month. The unnecessary decision point mechanism was validated far beyond the predicted magnitude.

+18.62%CR liftImport Parfumerie skip-cart test

+EUR 118K/moIncremental revenueFrom removing one page

Which psychological principles should anchor the BECAUSE clause?

The strongest BECAUSE clauses draw from cognitive load theory, loss aversion, social proof, the default effect, decision fatigue, and intent matching. The principle must be specific enough to be falsifiable.

A vague BECAUSE is almost as useless as no BECAUSE at all. "Because it looks better" is an aesthetic opinion. "Because reducing the number of form fields from nine to five reduces cognitive load, which decreases abandonment for users in a completion mindset" is a mechanism you can evaluate, test, and learn from.

The following principles appear most frequently in DRIP hypotheses that produce actionable learnings, regardless of whether the test wins or loses.

Common psychological principles for CRO hypotheses

Principle	When to use it	Example application
Cognitive load theory	When you are removing elements, simplifying flows, or reducing choices.	Removing the birthday field from KoRo's checkout reduced cognitive load on a step where completion momentum matters most.
Loss aversion	When framing savings, scarcity, or what the user stands to miss.	Showing 'only 3 left' triggers loss aversion more strongly than showing '97 sold' triggers social proof.
Authority social proof	When displaying certifications, awards, press mentions, or expert endorsements.	Giesswein's award badge leveraged authority proof specifically because the product's premium price demands trust from a recognized source.
Decision fatigue / Hick's Law	When reducing options, streamlining navigation, or simplifying choices.	Import Parfumerie's skip-cart test eliminated a decision point that drained cognitive resources without adding value.
Sensory gap bridging	When the product's value depends on experiencing it physically.	Blackroll's video addressed the fact that a foam roller's benefit is invisible in a static photograph.
Intent matching	When aligning page content with the user's current decision stage.	Oceansapart's hero swap matched the homepage content to users who had already been primed by ad creative.

Pro Tip

The BECAUSE clause does not need to be a single textbook principle. Combining two principles is fine as long as each one is individually falsifiable. What it must never be: "because we think it will work."

When a test loses, the BECAUSE clause is where you find the learning. If the mechanism was wrong, you update your model of how users behave. If the mechanism was right but the implementation was flawed, you iterate the variant. Without a stated mechanism, a losing test produces nothing but wasted traffic.

How do you prioritize hypotheses once you have a backlog?

Prioritize by expected revenue impact multiplied by confidence in the mechanism, then divide by implementation effort. Hypotheses with a validated BECAUSE clause from previous tests get higher confidence scores.

A well-formed hypothesis backlog is a strategic asset. But a long list of well-formed hypotheses is still just a list unless you have a framework for deciding what to test first. Time is finite, traffic is finite, and every test that runs prevents another from running.

At DRIP, we use a modified ICE framework that specifically weights the BECAUSE clause. Here is how it works in practice.

Impact: What is the projected revenue delta if the hypothesis is correct? Use historical conversion data and current traffic to estimate. A 2% CR lift on a page with 100K monthly sessions and a EUR 60 AOV is worth roughly EUR 120K/year.
Confidence: How strong is the evidence supporting the BECAUSE clause? A mechanism validated by a previous winning test scores higher than a mechanism derived from industry benchmarks. A mechanism based on primary user research (heatmaps, session recordings, surveys) scores higher than intuition.
Ease: How many engineering hours does the variant require? A CSS-only change takes a day. A checkout flow restructure takes two weeks. Factor in QA, edge cases, and device-specific behavior.

The critical difference from standard ICE: the Confidence score is directly tied to the hypothesis quality. A hypothesis with a vague BECAUSE clause scores low on Confidence regardless of how exciting the idea sounds. This structural incentive pushes the entire team toward better hypothesis writing.

DRIP Insight

The best testing programs are not the ones with the most tests. They are the ones where each test builds on the learnings of the previous one. Good hypotheses make that compounding possible.

After each test concludes, update the backlog. Winning BECAUSE clauses increase the confidence score for other hypotheses that share the same mechanism. Losing BECAUSE clauses force a reevaluation. This feedback loop is what separates a random testing program from a learning machine.

Want help building a hypothesis backlog for your brand? Talk to our CRO strategists. →

How to Write a CRO Hypothesis That Actually Gets Tested

What makes a hypothesis different from a test idea?