How often do we hear people say: “Version A won, so our users like this one more!”
But that assumption is misleading. It illustrates perfectly why quantitative and qualitative testing (and research) need to go hand-in-hand for a complete picture. Although A/B testing tells you what users are doing, it doesn’t tell you why they are doing it. More importantly, the overall A/B test result hides what happens in individual segments of your traffic.
For an A/B test to be useful, you must analyze the results by audience segment, not just for the audience as a whole. How users interact with the tested element is critical to the success of the test.
So, the key to learning in A/B testing is segmentation. Even though variation B might lose to A in the overall results, B might beat A in certain segments (organic, Facebook, mobile, etc). For segments, the same stopping rules apply.
Common segments to look at include traffic source, device, and visitor type. Avinash Kaushik offers an excellent guide on how to segment your users by source, behavior, and outcome.
Segmentation lets you check things like whether a win on mobile cancels out a loss on desktop (or vice versa), or whether a campaign or email biased the sample during the test period. Analyze your test across key segments and ask:
- Is this test performing differently for new/returning visitors?
- Does a variation work particularly well for a specific traffic source?
- Is a variation performing particularly poorly in a certain browser/OS? Could there be a bug?
- Within each variant, is there a pattern you can see between users who convert and those who do not?
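The questions above can be explored with a simple per-segment breakdown. The following is a minimal sketch, not a definitive implementation: the segment names and counts are hypothetical, and it assumes a pooled two-proportion z-test is an acceptable way to compare conversion rates. It shows how variation B can lose overall while winning in one segment.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def compare(a, b):
    """Summarize A vs. B, each given as (conversions, visitors)."""
    (conv_a, n_a), (conv_b, n_b) = a, b
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "lift": (rate_b - rate_a) / rate_a,  # relative lift of B over A
        "p_value": two_proportion_p_value(conv_a, n_a, conv_b, n_b),
    }


def segment_report(results):
    """Compare A and B within each segment and across all traffic."""
    totals = {"A": [0, 0], "B": [0, 0]}
    report = {}
    for segment, variants in results.items():
        for name, (conv, n) in variants.items():
            totals[name][0] += conv
            totals[name][1] += n
        report[segment] = compare(variants["A"], variants["B"])
    report["overall"] = compare(tuple(totals["A"]), tuple(totals["B"]))
    return report


# Hypothetical counts: segment -> variant -> (conversions, visitors)
results = {
    "desktop": {"A": (260, 4000), "B": (200, 4000)},
    "mobile": {"A": (90, 3000), "B": (126, 3000)},
}

report = segment_report(results)
for segment, row in report.items():
    print(f"{segment:8s} lift={row['lift']:+.1%} p={row['p_value']:.3f}")
```

With these made-up numbers, B trails A overall but leads on mobile. The more segments you inspect, the more likely one of them looks significant by chance, so treat a segment-level result as a lead for a follow-up test rather than a conclusion.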
We were once running experiments to optimize our pricing page for conversions, and segmenting helped uncover an interesting insight. Users coming to the pricing page from the homepage or features page converted far better than those coming from blogs. With hindsight this made sense, as blog readers were probably still in the “aware” stage of their customer journey (our content was focused on those topics). This insight led our marketing team to build content targeting users in the purchase stage of their journey, something that we had not been actively thinking about.
Following Local Metrics and Micro Conversions
Every feature will have local metrics (did users click a button, or watch a video, etc) and every product will have global metrics (time spent in app, number of days users return, etc).
Usually during a split test, you tend to follow the top-line or global metrics. At the end of the test, you see which metrics moved and which didn’t. Typically they move very little, and they’re very hard to affect.
With segmentation, you start looking at a lot of the local metrics and sometimes discover patterns and mismatches there. For example, it may be that your hypothetical feature is a success if users click on a button, but what does it mean if button clicks are up but overall conversions are low?
Segmentation makes you look at the correlation between local and global metrics. Each page contributes a certain amount to the conversion rate, and the closer you are to the bottom of the funnel, the higher your correlation factor. So, while you may increase the conversions from your home page to your order confirmation page, the impact on the overall conversion rate is a lot less.
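To make the relationship between local and global metrics concrete, here is a small sketch that breaks a funnel into step-to-step rates and an end-to-end rate. The step names and counts are hypothetical, chosen to illustrate the case where button clicks go up but overall conversions do not.

```python
def funnel_rates(counts):
    """Return step-to-step rates and the end-to-end conversion rate.

    `counts` maps funnel steps, in order, to the number of users reaching them.
    """
    steps = list(counts)
    step_rates = {
        f"{prev} -> {nxt}": counts[nxt] / counts[prev]
        for prev, nxt in zip(steps, steps[1:])
    }
    end_to_end = counts[steps[-1]] / counts[steps[0]]
    return step_rates, end_to_end


# Hypothetical funnels for each variant.
variant_a = {"pricing_view": 5000, "cta_click": 1000, "checkout": 400, "order": 200}
variant_b = {"pricing_view": 5000, "cta_click": 1500, "checkout": 420, "order": 195}

rates_a, overall_a = funnel_rates(variant_a)
rates_b, overall_b = funnel_rates(variant_b)

for step in rates_a:
    print(f"{step:28s} A={rates_a[step]:.1%}  B={rates_b[step]:.1%}")
print(f"{'end to end':28s} A={overall_a:.1%}  B={overall_b:.1%}")
```

In this made-up example, B improves the local metric (the click rate on the call to action) while the later steps lose those extra users again, so the global metric barely changes. Seeing both side by side is what prompts the question of why the extra clicks did not turn into orders.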
Tracking local metrics or “micro conversions” can either be the main goal for some tests, or offer another layer of insights to tests where macro conversion is the primary goal. When designing an experiment, product managers should allocate time to consider what additional goals they want to track. It might be click goals for key Calls to Action (CTAs) tracked within Optimizely, or events for key actions within Google Analytics, such as video plays or scroll-depth tracking.
All this tracking will improve the quality of your information. In some cases it can start to provide insights into why a test performed in the way it did.
Countering the Novelty Effect
Sometimes you may see that the variation that won the A/B test does not perform well when deployed to production or on follow-up validation tests. This could be due to the novelty effect. That’s when the novelty of your changes (look, a bigger blue button!) brings more attention to the variation. With time, the lift disappears because the change is no longer novel.
Segmentation can be a great way to detect the novelty effect. For example, you can segment your visitors into new and returning visitors and compare the conversion rates. New visitors have never seen the old version, so nothing is novel to them: if the variation wins with new visitors, the lift is more likely to be real. If it wins only with returning visitors and the lift fades as they get accustomed to the change, novelty is the more probable explanation.
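One way to look for a fading lift is to compute the lift per period for each visitor type. The sketch below uses hypothetical weekly counts; the pattern to watch for is a lift that shrinks week over week in one group while holding steady in the other.

```python
def relative_lift(control, variant):
    """Relative lift of variant over control, each given as (conversions, visitors)."""
    (conv_c, n_c), (conv_v, n_v) = control, variant
    rate_c = conv_c / n_c
    return (conv_v / n_v - rate_c) / rate_c


def lift_trend(weekly):
    """Map each visitor type to its list of per-week relative lifts."""
    return {
        visitor_type: [relative_lift(control, variant) for control, variant in weeks]
        for visitor_type, weeks in weekly.items()
    }


# Hypothetical data: visitor type -> list of weekly (control, variant) pairs,
# each pair holding (conversions, visitors).
weekly = {
    "new": [
        ((100, 2000), (110, 2000)),
        ((100, 2000), (111, 2000)),
        ((100, 2000), (109, 2000)),
    ],
    "returning": [
        ((120, 2000), (156, 2000)),
        ((120, 2000), (138, 2000)),
        ((120, 2000), (123, 2000)),
    ],
}

trend = lift_trend(weekly)
for visitor_type, lifts in trend.items():
    print(visitor_type, [f"{lift:+.1%}" for lift in lifts])
```

Weekly lifts are noisy on small samples, so a trend like this is a prompt for a follow-up validation test rather than proof on its own.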
Segmenting Your Data: Before or After Your Test?
The other thing to consider when segmenting is whether to design a test with segmentation in mind or define the segments after an experiment.
If the test was designed with a segment in mind, the setup usually has enough sample size, and hence enough statistical power, to allow a decision. If, on the other hand, you uncover patterns by slicing and dicing random segments, it becomes important to ensure that enough power exists. If not, you have to either extend the test duration or run a re-test.
Segmenting tests from the beginning doesn’t always help with your discovery process. The goal of a test is to figure out which segments respond to which treatments, and often that’s hard to do if you divide them before you even start testing. If you don’t know beforehand that you’ll be breaking up your results into segments, launch follow-up tests for specific, well-performing segments until you have a sample size large enough to draw a conclusion.
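A rough power check for a segment can be sketched with the standard normal-approximation formula for comparing two proportions. The baseline rate, minimum detectable effect, and segment sizes below are hypothetical, and the 5% significance level and 80% power are common defaults rather than values taken from this article.

```python
from math import ceil
from statistics import NormalDist


def required_sample_per_variant(baseline, relative_mde, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant for a two-sided two-proportion test.

    baseline: control conversion rate, e.g. 0.05
    relative_mde: smallest relative lift worth detecting, e.g. 0.2 for +20%
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def underpowered_segments(visitors_per_variant, needed):
    """Return the segments whose per-variant traffic falls short of `needed`."""
    return [seg for seg, n in visitors_per_variant.items() if n < needed]


# Hypothetical per-variant visitor counts collected during the test.
segment_visitors = {"desktop": 12000, "mobile": 6000, "tablet": 900}

needed = required_sample_per_variant(baseline=0.05, relative_mde=0.2)
print("needed per variant:", needed)
print("underpowered:", underpowered_segments(segment_visitors, needed))
```

Segments flagged here are the ones where you would extend the test or schedule a re-test before acting on the result. Each segment may have its own baseline rate, so in practice the calculation would be repeated per segment.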
Post-test segmentation can help identify these patterns and signals, which feeds into pre-test planning and potentially testing specific areas with appropriate context and sample planning. You can then plan and run completely different tests with device-specific hypotheses, concepts, and sample sizes to account for the different levels of noise, effect size, and user motivation.
When the experiment is run with segments in mind from the beginning (which is usually the case with follow-up tests), you can create reporting with these segments in mind. For example, in Google Analytics, you can create different segments for each variant of the test and build reports for them.
Segmentation is A/B testing taken to a whole new level. When you segment your testing efforts, you add a layer of accuracy and thoroughness that is simply not possible in a haphazard split testing world.
The pay-off is that you have a greater chance of successful testing. You gain a clearer understanding of where the visitor comes from, what the visitor’s intent is, and how to test that visitor’s behavior. Your A/B testing becomes far more valuable because every experiment generates a wide range of secondary insights. These can be used to create follow-up experiments, identify pain points, and build a better understanding of how customers use your products.