Automate Your A/B Test Processing for High Confidence Results


In the eighth of our 12 days of LPO tips, SiteTuners takes you through the steps to determine the impact of testing changes. 

One of the most problematic things about A/B testing and analytics, if not the most problematic, is that somewhere out there a marketer will have to do the math.

If Jim has three apples and John …

Okay, it’s time for a few caveats. 

There are a good number of marketers who are great at math, who love math, and who will find the data in this post crude. We apologize in advance. The idea is to get everyone started on at least the basic principles of testing for probability, and that requires a simple model.

Likewise, there are a number of marketers out there who will feel that this is a throwback to school lessons with teachers who have a predisposition to ask about fruit (How many apples does Jim have if he has John’s apple share squared?). Some people went into marketing specifically to avoid the square root. Again, our apologies. Conversion work requires both observed effects (a 16% increase in conversions for December. Yay!) and an understanding of the underlying systems (the results indicate at 95% confidence that the challenger page will outperform the champion page).

But first, a few concepts …

At the core of tests, we have the law of large numbers, the empirical rule, and the central limit theorem.

Law of Large Numbers: the larger the sample, the closer the sample average will be to the underlying probability. For the purposes of tests, that means larger sample sizes give narrower, more reliable ranges.
Empirical Rule: for normal distributions, about 68% of the results will be within one standard deviation of the mean, 95% will be within two standard deviations, and 99.7% will be within three standard deviations.
Central Limit Theorem: as the sample size grows, the distribution of the sample average tends toward a normal distribution, whatever the shape of the underlying data.
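
For readers who would rather check the empirical rule than take it on faith, here is a minimal Python sketch. It is not part of the Excel file; the function name share_within is made up for this illustration, and the only thing it relies on is the standard error function from Python's math module.

```python
import math


def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean.

    Hypothetical helper for illustration; uses the identity
    P(|Z| <= k) = erf(k / sqrt(2)) for a standard normal Z.
    """
    return math.erf(k / math.sqrt(2))


# Print the share for one, two, and three standard deviations.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The two-standard-deviation figure is the one that matters for the scenarios in this post, since it is what the "95% confidence" label refers to.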

Got it? Let’s dive in.

Scenarios

Download this Excel File for Statistical Confidence Computations. Plug in the values as required for cells B2 and B3.

Let’s say you are distributing traffic equally to your champion and challenger landing pages for the test. The champion gets 20 conversions and the challenger gets 40 conversions. That would be a 100% increase if the challenger were rolled out! But hold on a second.

Computing for 95% confidence, you’ll see that the champion’s conversion count could plausibly be as high as 28.9, while the challenger’s could be as low as 27.4. So while the challenger’s conversion count is double the champion’s, the two ranges overlap, which means more data may be required.
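
If you want to see where numbers like 28.9 and 27.4 can come from, here is a rough Python sketch. It assumes the range is the conversion count plus or minus two standard deviations, with the standard deviation of a count approximated by its square root. That assumption reproduces the figures above, but the spreadsheet's actual formulas are not shown in this post, so treat the function two_sigma_range as a hypothetical stand-in rather than a copy of the Excel file.

```python
import math


def two_sigma_range(conversions: int):
    """Return (low, high) as conversions -/+ 2 * sqrt(conversions).

    Assumption: the standard deviation of a conversion count is roughly
    its square root, and two standard deviations cover about 95%.
    """
    spread = 2 * math.sqrt(conversions)
    return conversions - spread, conversions + spread


champion_low, champion_high = two_sigma_range(20)
challenger_low, challenger_high = two_sigma_range(40)

print(f"champion:   {champion_low:.1f} to {champion_high:.1f}")
print(f"challenger: {challenger_low:.1f} to {challenger_high:.1f}")
```

The champion's high end and the challenger's low end are the two numbers to compare.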

The idea behind the test is that if there’s no overlap between the high value of the champion page and the low value of the challenger page, you’ll have a high confidence test in favor of the challenger.

Let’s try that on a smaller percentage of improvement for a larger sample size. Say your champion gets 167 conversions over a period of time, and your challenger gets 229, for an increase of 37% if applied.

You’ll notice that the high distribution point for the champion is now below the low distribution point for the challenger. The ranges are narrower relative to the conversion counts, and the test at 95% confidence shows the challenger performing better.
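
The no-overlap rule can be written as a short check and run against both scenarios. This is a sketch under the same assumption as the figures in this post (range = count plus or minus two times the square root of the count); the names challenger_wins and lift are invented for the example and do not come from the Excel file.

```python
import math


def challenger_wins(champion: int, challenger: int, z: float = 2.0) -> bool:
    """True when the challenger's low end clears the champion's high end.

    Assumption: each range is count -/+ z * sqrt(count), with z = 2
    standing in for roughly 95% confidence.
    """
    champion_high = champion + z * math.sqrt(champion)
    challenger_low = challenger - z * math.sqrt(challenger)
    return challenger_low > champion_high


def lift(champion: int, challenger: int) -> float:
    """Relative increase of the challenger over the champion."""
    return (challenger - champion) / champion


# The two scenarios from this post: (champion, challenger) conversions.
for champion, challenger in [(20, 40), (167, 229)]:
    verdict = "no overlap" if challenger_wins(champion, challenger) else "overlap"
    print(f"{champion} vs {challenger}: lift {lift(champion, challenger):.0%}, {verdict}")
```

A bigger lift on small counts can still overlap, while a smaller lift on larger counts can clear the bar, which is the point of the two scenarios.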

Tim’s book delves into more caveats like stable traffic sources and behavioral differences between certain demographics. For now, though, if you don’t have a model for testing at different confidence levels, start using this Excel File for Statistical Confidence Computations.

Next: "Stop Leaving Money on the Table by Finding the Right Tuning Methods." SiteTuners covers the different tuning methods.