In recent years, online controlled experiments, also known as A/B tests, have become the state-of-the-art technique for making data-driven decisions about online services. Companies such as Microsoft, Google, and Yandex run such experiments ubiquitously to improve their services, including the presentation of search engine result pages (SERPs). The largest web service companies have built dedicated experimental platforms that allow them to run A/B tests at large scale.
A controlled experiment compares two variants of a service at a time by exposing them to two user groups and measuring the difference between the groups in terms of a key metric, typically a user engagement metric in the context of online services (e.g., the number of visits or the number of clicks). The ability of the metric to detect a statistically significant difference when a treatment effect exists is referred to as the sensitivity of the experiment. Online service companies continuously strive to improve the sensitivity of their controlled experiments, since a more sensitive experiment can detect smaller treatment effects, which in turn allows companies to improve their services more efficiently.
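As an illustration only, the comparison described above can be sketched as a two-sample z-test on a per-user metric, with sensitivity estimated as the fraction of simulated experiments in which a true effect is detected. The normally distributed metric, its mean of 5.0 and standard deviation of 2.0, and the function names `two_sample_z_test`, `simulate_group`, and `detection_rate` are all assumptions made for this sketch and are not taken from any particular experimental platform.

```python
import math
import random
from statistics import NormalDist, fmean


def sample_mean_and_var(xs):
    """Sample mean and unbiased sample variance of a list of floats."""
    m = fmean(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, var


def two_sample_z_test(control, treatment):
    """Return (difference in means, two-sided p-value), normal approximation."""
    mean_c, var_c = sample_mean_and_var(control)
    mean_t, var_t = sample_mean_and_var(treatment)
    diff = mean_t - mean_c
    # Standard error of the difference between two independent sample means.
    se = math.sqrt(var_c / len(control) + var_t / len(treatment))
    z = diff / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return diff, p_value


def simulate_group(rng, n_users, mean_value, std_dev=2.0):
    # Hypothetical per-user engagement metric (assumed normal for simplicity).
    return [rng.gauss(mean_value, std_dev) for _ in range(n_users)]


def detection_rate(n_users, effect, n_trials=100, alpha=0.05, seed=0):
    """Fraction of simulated A/B tests that reach significance at level alpha."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_trials):
        control = simulate_group(rng, n_users, 5.0)
        treatment = simulate_group(rng, n_users, 5.0 + effect)
        _, p_value = two_sample_z_test(control, treatment)
        if p_value < alpha:
            detected += 1
    return detected / n_trials
```

Under these assumptions, `detection_rate(n_users, effect)` with a nonzero `effect` is an empirical estimate of sensitivity, while `detection_rate(n_users, 0.0)` estimates the false-positive rate, which should stay close to `alpha`.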
The state-of-the-art approach to improving the sensitivity of controlled experiments is to increase the amount of observed statistical data, either by increasing the number of users participating in the experiment or by extending its duration. However, both options have significant disadvantages. First, the user population is limited by the traffic of the web service, so it is not always feasible to add more users to an online controlled experiment. Second, lengthening an experiment reduces the number of experiments that can be conducted within a given period of time. This is particularly problematic because controlled experiments are usually conducted to evaluate a new feature or an update to a service: the sooner the experiment concludes, the sooner the feature can be launched or, if necessary, reworked.
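To make the cost of collecting more data concrete, the following sketch uses the standard normal-approximation sample size formula for a two-sided, two-sample comparison of means, n = 2 (z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2 users per group. The function names, the default significance level and power, and the simplifying assumption that every day brings a fixed number of new, independent users to each group are all hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist


def required_users_per_group(effect, std_dev, alpha=0.05, power=0.8):
    """Users needed per group to detect `effect` with the given power.

    Assumes independent per-user observations with a common, known
    standard deviation `std_dev` (a simplification).
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * ((z_alpha + z_power) * std_dev / effect) ** 2
    return math.ceil(n)


def required_days(effect, std_dev, new_users_per_day_per_group,
                  alpha=0.05, power=0.8):
    """Experiment length in days, assuming a fixed daily inflow of new users."""
    n = required_users_per_group(effect, std_dev, alpha, power)
    return math.ceil(n / new_users_per_day_per_group)
```

Because the required sample size scales with 1 / effect^2, detecting an effect half as large takes roughly four times as many users, or roughly four times as many days at fixed traffic. For example, with `std_dev=2.0`, `required_users_per_group(0.2, 2.0)` gives 1570 users per group, whereas `required_users_per_group(0.1, 2.0)` gives 6280.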