10 reasons why A/B testing might be the wrong move
A/B testing has become the gold standard of data-driven decision-making, especially as companies shift toward product-led roadmaps. Every button color, headline, and feature rollout now seems to demand its own scientific experiment. But here’s the kicker—not every decision needs (or deserves) an A/B test. In fact, blindly running tests can slow you down, burn resources, and sometimes even lead you astray.
Before you eagerly hit “Launch” on yet another experiment, consider these 10 reasons why an A/B test might not be the best move. Your future self (and your analytics team) might just thank you.
1. Low traffic
A/B tests rely on having sufficient traffic to reach statistical significance. If your traffic is too low, a test can take a very long time to yield meaningful results. Before jumping into an experiment, use a sample size calculator to estimate how many users you’ll need. Teams often get excited about testing an idea, only to realize too late that the experiment will take far longer than expected to produce actionable outcomes. For example, a startup with only ~500 users a day probably shouldn’t run A/B tests: with a 2% baseline conversion rate and two variants, detecting a 5% relative lift would take roughly 1,254 days.
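A back-of-the-envelope duration estimate takes only a few lines. The sketch below uses the standard two-proportion sample size approximation with 80% power and a 5% significance level; any particular online calculator may use slightly different assumptions, but it lands in the same ballpark as the figure above.

```python
from math import ceil
from statistics import NormalDist

def days_to_significance(baseline_rate, relative_lift, daily_users,
                         n_variants=2, alpha=0.05, power=0.80):
    """Rough estimate of how long a test must run, using the standard
    two-proportion z-test sample size approximation (two-sided)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)       # expected treatment rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power

    # Required sample size per variant
    n_per_variant = ((z_alpha + z_beta) ** 2
                     * (p1 * (1 - p1) + p2 * (1 - p2))
                     / (p2 - p1) ** 2)

    users_per_variant_per_day = daily_users / n_variants
    return ceil(n_per_variant / users_per_variant_per_day)

# ~500 users/day, 2% baseline conversion, hoping to detect a 5% relative lift
print(days_to_significance(0.02, 0.05, 500))  # ~1,261 days with these assumptions
```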
2. “We have no idea what to do — Let’s run a multivariate test”
Multivariate tests are one of the most expensive ways to experiment. Running A/B/C tests means you’re investing time and effort across design, engineering, and data teams to build and analyze multiple outcomes. Before launching a multivariate test, ask yourself: Do we really need to test everything at once, or can we sequence these tests over time?
Often, a more structured, step-by-step approach yields better insights with fewer resources. For example, if your website gets 5,000 visitors per day with a 5% baseline conversion rate and you want to detect a 10% lift, testing 2 variants takes about 8 days to reach statistical significance, while testing 3 variants takes about 11 days (nearly 40% longer):
| Variant count | Baseline conversion rate | Minimum detectable lift | Sample size per variant | Total sample size | Daily users per variant | Days to stat. sig. |
|---|---|---|---|---|---|---|
| 2 variants | 5% | +10% | ~17,600 | ~35,200 | 2,500 | ~8 days |
| 3 variants | 5% | +10% | ~17,600 | ~52,800 | 1,667 | ~11 days |
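The arithmetic behind the table is easy to sanity-check: each extra variant shrinks the traffic every variant receives, so the same per-variant sample size takes longer to collect. A minimal sketch using the table’s numbers:

```python
from math import ceil

def days_to_stat_sig(sample_per_variant, daily_traffic, n_variants):
    """Days needed when daily traffic is split evenly across all variants."""
    daily_per_variant = daily_traffic / n_variants
    return ceil(sample_per_variant / daily_per_variant)

# Using the table's numbers: ~17,600 users needed per variant, 5,000 visitors/day
for k in (2, 3):
    print(f"{k} variants: ~{days_to_stat_sig(17_600, 5_000, k)} days")
# 2 variants: ~8 days, 3 variants: ~11 days
```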
3. Lack of a clear hypothesis
Before running an A/B test, ask yourself: What will we learn if the test succeeds? What will we learn if it fails? A common mistake teams make is running experiments without a clear takeaway. The goal of A/B testing should be to generate insights that compound over time. A well-structured test should explicitly state: If the experiment succeeds, here’s what we learn. If it fails, here’s what we learn instead.
If you run an A/B test without a clear hypothesis, it can be hard to say what you actually learned. A/B test results often require a lot of work to turn into insights, and without a clear hypothesis you can spend significant time and resources without coming away with any. For example, PostHog ran an experiment testing social login buttons instead of email signup. More people signed up using Google and GitHub, but overall signups didn’t increase, so the experiment didn’t clearly answer whether social logins were better or worse.
4. Accumulating tech debt
Every A/B test has the potential to increase long-term tech debt. When developers write code for an A/B test, they are writing code for two or more scenarios, and teams often move from one A/B test to the next without leaving time to clean up after a test has resolved. This creates branches in your code and automated test suites that get worse over time, making future development slower and more error-prone. Before launching an experiment, consider whether the insights gained will justify the additional maintenance burden.
The question to ask is: is this test worth complicating the code and potentially accumulating tech debt? At one company I worked at, we ran an A/B test to improve onboarding conversion that involved a substantial rewrite of our onboarding funnel. Three years later, we still haven’t cleaned up that test because of how much effort it would take. If cleanup isn’t done soon after a test resolves, the experiment code can become very difficult to untangle, especially when it sits in a critical path. Here is a good video that goes into detail about the engineering challenges of A/B testing.
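To make the “branches in your code” point concrete, here is a minimal, hypothetical sketch; the flag names and onboarding steps are made up, and `get_variant` stands in for whatever feature-flag or experimentation client your team actually uses.

```python
def get_variant(user_id: str, flag: str) -> str:
    """Placeholder assignment; a real experimentation client would handle
    bucketing, targeting, and exposure logging."""
    return "control"

def onboarding_steps(user_id: str) -> list[str]:
    steps = ["create-account"]

    # Experiment shipped long ago and never cleaned up: two code paths forever.
    if get_variant(user_id, "new-onboarding-funnel") == "test":
        steps += ["connect-account", "set-goals"]
    else:
        steps += ["set-goals", "connect-account", "tutorial"]

    # A later experiment layered on top of the first one.
    if get_variant(user_id, "social-proof-banner") == "test":
        steps.append("social-proof-banner")

    # Every unresolved flag doubles the paths your automated tests must cover:
    # 2 flags -> 4 combinations, 3 flags -> 8, and so on.
    return steps

print(onboarding_steps("user-123"))
```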
5. The change is too small to test
Sometimes, teams want to test minor changes—like tweaking a single word in a headline—to see if it makes a difference. While small content changes can drive big results, it’s a slippery slope. If teams get into the habit of testing every tiny adjustment, they risk slowing down development and relying too much on testing instead of building intuition. A/B testing should be used to validate meaningful decisions, not to replace sound judgment.
6. The stakes are too high
Certain changes are simply too risky to A/B test. Some experiments, like pricing changes or drastic homepage redesigns, can erode user trust if handled poorly. For example, a paid product testing a freemium model might trigger widespread backlash from paying customers, causing unexpected churn. In these cases, qualitative research or phased rollouts may be a safer alternative. Some doors are one-way doors: be thoughtful about which experiments fall into that category and plan their rollout strategy accordingly.
7. Too many variables at once
Changing too many elements at the same time makes it difficult to isolate what’s driving the results. If an A/B test includes multiple modifications, such as a complete page redesign, there’s no way to attribute outcomes to specific changes. If the test fails, was it the new layout? The updated messaging? The different call-to-action? To get reliable insights, align each test with a single hypothesis and limit changes accordingly. A good real-world example is the recent Airbnb redesign, which changed the design language, homepage, messaging, and navigation all at once. If Airbnb bookings go down after this change, it will be very difficult to isolate whether the messaging or the navigation is to blame.
8. Competing experiments
If you run too many A/B tests at the same time, you need to be careful to limit the impact of one experiment on another. The simplest way to do this is to avoid running multiple A/B tests on the same surface, for similar metrics, at the same time. If you must, keep their audiences as separate as possible to reduce interference. Otherwise, it can be difficult to attribute which test caused a change in a metric.
This creates a lot of toil for data scientists and might require further testing to validate whether a result was caused by a specific test or by a combination of experiments. DoorDash, for example, uses sophisticated traffic allocation and experiment scheduling to avoid interference. As your company grows, a deliberate experimentation strategy becomes critical to getting isolated learnings from each experiment.
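One common way to keep audiences separate is to hash users into mutually exclusive traffic slices, so two experiments on the same surface never share a user. This is a minimal sketch of that idea (the experiment names and split are made up), not a description of DoorDash’s actual system.

```python
import hashlib
from collections import Counter

def bucket(user_id: str, salt: str, n_buckets: int = 100) -> int:
    """Deterministically map a user into one of n_buckets via a salted hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets

def assign_experiment(user_id: str) -> str:
    """Carve one 'layer' of traffic into mutually exclusive slices so that
    experiments on the same surface never overlap."""
    slot = bucket(user_id, salt="homepage-layer")
    if slot < 50:
        return "hero-copy-test"     # buckets 0-49
    if slot < 80:
        return "signup-cta-test"    # buckets 50-79
    return "holdout"                # buckets 80-99: in neither experiment

counts = Counter(assign_experiment(f"user-{i}") for i in range(10_000))
print(counts)  # roughly a 50 / 30 / 20 split
```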
9. Decision by committee
Sometimes, teams resort to A/B testing simply because they can’t reach a decision. This turns testing into a crutch rather than a tool for meaningful learning. A/B tests shouldn’t be used as a substitute for clear decision-making. If a team is split on a direction, the root issue is often misalignment, something that should be resolved through discussion, not experimentation. In these cases, the team should think hard about its hypothesis and leading insights to figure out what to test. If too many directions are possible, focus is the key to learning.
10. Ignoring seasonality
Many products experience significant seasonal fluctuations, yet teams often fail to account for them in their experiments. For instance, Facebook sees a surge in traffic during the holiday season, which could distort A/B test results. A winning variation in December may not hold up in February. When running tests, consider external factors like seasonality, holidays, or major industry events that could skew your data. In a B2B context, companies often have specific times of the year when they allocate budgets. Conducting A/B testing during those times could result in biased outcomes.
About the author
Mohit Agrawal
Mohit Agrawal is a product-focused engineering leader with a sharp eye for product management, design thinking, and engineering execution. As a Senior Engineering Manager at Wealthfront, he has built and scaled the Growth Engineering team, driving user acquisition, activation, and retention through experimentation and cross-functional collaboration. Previously at AppDynamics (a Cisco company) and Epic Systems, he led mobile product development, leveraging his expertise in bridging engineering, design, and data to create impactful user experiences. Passionate about building high-performing teams, Mohit blends strategic vision with hands-on execution to drive innovation and business growth.