A/B Split Testing Crash Course

A/B split testing is the most basic landing page optimization method available. The name comes from the fact that two versions of your landing page (“A” and “B”) are tested. “Split testing” refers to the random assignment of new visitors to the version of the page that they see. In other words, the traffic is split (usually in equal proportions) and all versions are shown in parallel throughout the data collection period. This is an important requirement: tests should always be run in parallel, as opposed to one-after-the-other “sequential” tests, because running in parallel lets you control for as many outside factors as possible. The random assignment of new visitors to particular landing page designs is also critical, because randomness is the basis for the probability theory that underlies the statistical analysis of the results.
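As a rough illustration of random, parallel assignment, here is a minimal Python sketch. The `SplitTest` class, its method names, and the in-memory dictionary are hypothetical and not taken from any testing tool. Keeping a returning visitor on the version they first saw is an assumption of this sketch.

```python
import random
from collections import Counter


class SplitTest:
    """Randomly assigns each new visitor to one version and remembers the choice."""

    def __init__(self, versions=("A", "B"), seed=None):
        self.versions = tuple(versions)
        self._rng = random.Random(seed)  # a seed is only useful for repeatable demos
        self._assignments = {}           # visitor_id -> version already assigned

    def version_for(self, visitor_id):
        # New visitors get a uniformly random version; returning visitors
        # keep the version they were first assigned (an assumption here).
        if visitor_id not in self._assignments:
            self._assignments[visitor_id] = self._rng.choice(self.versions)
        return self._assignments[visitor_id]


split_test = SplitTest(versions=("A", "B"), seed=42)
counts = Counter(split_test.version_for(f"visitor-{i}") for i in range(10_000))
print(counts)  # the two versions should receive roughly equal traffic
```

Because every visitor is assigned at the moment they arrive, both versions collect data over the same time period, which is what makes the test parallel rather than sequential.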

Usually, version “A” is defined as your original control page, or baseline (commonly called the champion version). The other version is the alternative (commonly called the challenger). If the challenger proves to be better than the champion, the challenger replaces the champion after the test and becomes the new champion to beat in any subsequent tests.

You can have more than two versions in a split test. For example, if you had one original and two alternative versions, you would have an A/B/C split test, and so on. In practice, split tests rarely have more than 10 versions of the page. The variable in your split test can be very granular (e.g., a single change such as headline text), or it can be a whole-page redesign of your landing page that is radically different from the current page.

A/B Split Testing Advantages

Split tests have several advantages:

Ease of test design. Unlike more complicated multivariate tests, split tests do not have to be carefully designed or balanced. You simply decide how many versions you want to test, and then split the available traffic evenly among them. No follow-up tests are required to verify the results – the best performer in the test is declared the winner once enough data is collected.

Ease of implementation. Many software packages are available to support simple split tests. If you are testing granular test elements, you can design and set up your test and begin collecting data within minutes. In most cases this can be done without support from your IT department or others. You may even be able to collect the data you need with your existing Web analytics tools, without the use of additional landing page testing tools.

Ease of analysis. Only very simple statistical tests are needed to determine the winner. Basically, all you have to do is compare the baseline version to each challenger to see if you have reached your desired statistical confidence level.

Ease of explanation. No complicated analyses or charts are needed to present your results to others. You can simply declare that you are very confident that a particular version is better than another. You can also give a likely range of percentage improvement (based on the amount of data you have collected and the width of the error bars).

Flexibility in defining the variable values. In whole-page split tests, you have complete flexibility in how different the proposed alternatives are. For example, in one alternative, you may simply choose to test a different headline. In another you may completely restructure everything about the page (layout, color scheme, sales copy, offer, and call-to-action). This ability to mix and match allows you to test a range of evolutionary and revolutionary alternatives in one test, without being constrained by the more granular definition of variables in a multivariate test.

Useful in low data rate tests. If your landing page only has a few conversions per day, you simply cannot use more advanced tuning methods. But with the proper selection of the test variable and alternative values, you can still achieve significant results in a split test. Conversion rate improvements of double- or even triple-digit percentages are not uncommon.
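To make the "ease of analysis" and "ease of explanation" points above concrete, here is a minimal sketch of one simple statistical test that could be used: a two-proportion z-test with a confidence interval for the difference in conversion rates. The choice of this particular test, the function name `compare_versions`, the 95% level, and the visitor and conversion counts are all assumptions made for illustration, not something prescribed by the article.

```python
import math


def compare_versions(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Compare champion A with challenger B using a two-proportion z-test.

    conv_* are conversion counts, n_* are visitor counts.
    z_crit=1.96 corresponds to a 95% two-sided confidence level.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b

    # Pooled standard error, used for the significance test.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se_pool == 0:
        raise ValueError("no variation in the data; cannot run the test")
    z = (p_b - p_a) / se_pool

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))

    # Unpooled standard error, used for the interval on the difference.
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    low = (p_b - p_a) - z_crit * se_diff
    high = (p_b - p_a) + z_crit * se_diff

    return {
        "rate_a": p_a,
        "rate_b": p_b,
        "z": z,
        "p_value": p_value,
        "diff_interval": (low, high),
        # Rough relative lift range; it ignores the uncertainty in rate_a.
        "lift_interval": (low / p_a, high / p_a) if p_a > 0 else None,
    }


# Hypothetical counts: 200 of 10,000 visitors convert on A, 260 of 10,000 on B.
result = compare_versions(200, 10_000, 260, 10_000)
print(result)
```

If the interval for the difference lies entirely above zero, you can report that the challenger beats the champion at the chosen confidence level, along with the likely range of improvement.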

A/B Split Testing Disadvantages

Split tests also have several drawbacks:

Limited number of versions. The number of versions in a typical split test is usually very small. If you did your homework properly, you probably came up with dozens of potential issues with your landing page, and also constructed many alternative variations to test. However, because of the limited scope of split testing, you will be reduced to testing your ideas one at a time. You will also be forced to guess which ideas to test first (based on your intuition about which ones might make the most difference). In other tuning methods, you may be able to test many of your key ideas at once and find all of the changes that improve your conversion rate in one test.

Does not consider context or variable interactions. By definition, split tests consider only one variable at a time, so you cannot detect variable interactions (how combinations of variables influence each other). A series of split tests covering several variables is not the same as a multivariate test with the same variables. Depending on the variable interactions, you may not be able to find the best-performing combination of variables on the page at all. Whether you do depends on the order in which you conduct your split tests, and the exact nature of the interactions.

No way to discover the importance of page elements. Often, you may choose very coarse variables for your split test. Because of the limited data rate, you are forced to make your best guess at page elements that might improve performance. These elements may actually involve many simultaneous changes to your landing page. In the extreme case of a whole-page redesign, you may have changed dozens of details on the page in question and defined them as a single alternative version.

However, the same flexibility that allows you to do this also limits your ability to interpret results and attribute credit for the conversion improvement to any particular change that you made. Was it the button color? Or was it the headline change? Or was it the different offer? You will never know. By squashing multiple changes into one page, you have confounded their effects and lost the ability to look at them separately.

In practice, this may not be such a huge issue, since many of the so-called learnings about the relative importance of variables are based on the spurious assumption that they are all independent of each other. Furthermore, the biggest conversion improvements may be due to the specific variable values you have chosen, and not the variable itself. For example, a particular headline that you chose to test may have been very powerful, but this does not allow you to generalize about headlines being more important than the other variables tested. In any case, you should avoid trying to interpret split test results if the variable values involve changing multiple elements on the page.

Inefficient data collection. Multivariate tests are often carefully constructed in order to get the most information from a smaller data sample. In effect, they allow you to more efficiently conduct multiple split tests simultaneously, and even to detect certain kinds of variable interactions. Conducting multiple split tests back-to-back is the most wasteful kind of data collection – none of the information from a previous test can be reused to draw conclusions about the other variables that you may want to test in the future.
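The earlier point about variable interactions can be illustrated with a small sketch. The conversion rates below are invented purely for illustration, and the function `run_sequential_split_tests` is hypothetical. It also assumes each split test identifies the truly better value, which real tests only do approximately. With these numbers, the new headline and the new button each look worse on their own but work best together, so a series of one-variable split tests never reaches the best page.

```python
# Hypothetical "true" conversion rates for every (headline, button) combination.
rates = {
    ("H1", "B1"): 0.020,  # current champion
    ("H2", "B1"): 0.018,  # new headline alone looks worse
    ("H1", "B2"): 0.019,  # new button alone looks worse
    ("H2", "B2"): 0.030,  # together they perform best (an interaction)
}


def run_sequential_split_tests(rates, start, order):
    """Test one variable at a time, keeping the winner before moving on.

    `start` is the starting combination; `order` lists the tuple positions
    of the variables in the order they are tested.
    """
    current = list(start)
    for position in order:
        options = sorted({combo[position] for combo in rates})

        def rate_with(option):
            # Rate of the current page with only this one variable changed.
            trial = current.copy()
            trial[position] = option
            return rates[tuple(trial)]

        current[position] = max(options, key=rate_with)
    return tuple(current)


headline_first = run_sequential_split_tests(rates, ("H1", "B1"), order=(0, 1))
button_first = run_sequential_split_tests(rates, ("H1", "B1"), order=(1, 0))
best_overall = max(rates, key=rates.get)

print("Headline tested first:", headline_first)
print("Button tested first:  ", button_first)
print("Best combination:     ", best_overall)
```

In this made-up case neither test order finds the best combination. With other interaction patterns the outcome can depend on the order in which the split tests are run.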

This article originally appeared in Tim’s ClickZ column on April 27, 2010.
