A/B hypothesis testing can be utilized to determine whether there is a statistically significant difference between a given option (e.g., an existing website design) and an alternative option (e.g., a new website design). For example, consider an online retailer that is trying to determine which of two layouts for a website provides for more completed transactions, or a higher dollar amount for each transaction. In A/B hypothesis testing, the two layouts can be distributed equally to visitors of the online retailer's site. The visitors' interactions with each layout can then be monitored for feedback, such as whether the visitor made a purchase or the amount of each visitor's purchase. Based on this feedback, the one of the two designs that exhibits better performance can be selected via A/B hypothesis testing.
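As an illustration of the retailer example, the comparison of completed transactions between two layouts could be sketched as a pooled two-proportion z-test. This is a minimal sketch under stated assumptions, not a prescribed method: the choice of a z-test, the function name two_proportion_z_test, and the visitor and purchase counts are all hypothetical and are not taken from the text above.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates.

    conv_a, conv_b: number of visitors who purchased under layouts A and B.
    n_a, n_b: number of visitors shown layouts A and B.
    Returns (z statistic, two-sided p-value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both layouts share one conversion rate,
    # so the standard error is built from the pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical feedback: 1000 visitors per layout, 100 vs. 150 purchases.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value here would suggest rejecting the null hypothesis that the two layouts convert equally well. A test on dollar amount per transaction would need a different statistic, such as a comparison of means.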
One manner of implementing A/B hypothesis testing is through a fixed-horizon configuration, where a total amount of feedback needed to conclude the test is determined prior to implementing the A/B hypothesis test. Alternatively, an A/B hypothesis test could be implemented in a sequential configuration where, as each piece of feedback is collected, a determination is made as to whether to conclude the test. In some instances, multiple alternative options may need to be tested against the given option. Such instances are referred to as multiple hypothesis tests. As an example, suppose the online retailer discussed above instead has numerous alternative website layouts that need to be tested. In such an example, a multiple hypothesis test could be utilized to determine which one of the numerous alternative website layouts achieves the most desired results for the online retailer. In fixed-horizon multiple hypothesis testing, a multiple hypothesis test is run until a total number of samples, referred to as the horizon, has been collected. The horizon can be determined, at least in part, to guarantee a desired level of statistical error. Once the horizon is reached, p-values can be computed for the hypothesis tests of the fixed-horizon multiple hypothesis test. Various algorithms can then be utilized that take these p-values as input and determine which of the multiple hypothesis tests should be rejected (i.e., which of the respective null hypotheses should be rejected).
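The text does not name the algorithms that take the p-values as input. As one possible illustration, the sketch below applies two widely used procedures, the Bonferroni correction and the Benjamini-Hochberg procedure, to a list of p-values computed once the horizon is reached. The function names, the significance level of 0.05, and the p-values themselves are assumptions chosen for the example.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a null hypothesis when its p-value is at most alpha / m.

    Controls the family-wise error rate across the m tests.
    """
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Controls the false discovery rate. Returns one boolean per test,
    in the same order as the input p-values.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / m.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank * alpha / m:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that rank.
    reject = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            reject[index] = True
    return reject


# Hypothetical p-values for four alternative layouts versus the baseline.
p_values = [0.001, 0.012, 0.03, 0.2]
print(bonferroni(p_values))          # [True, True, False, False]
print(benjamini_hochberg(p_values))  # [True, True, True, False]
```

The two procedures guarantee different kinds of statistical error, which is why they can disagree on the third layout: Bonferroni is the more conservative of the two.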
Fixed-horizon hypothesis testing has several drawbacks. A first drawback is that a tester typically wants to view results of the test as the feedback is collected and analyzed, yet the guarantees of a fixed-horizon test hold only once the horizon is reached. As a result, in some instances, the tester may prematurely stop a fixed-horizon hypothesis test upon erroneously confirming or rejecting the null hypothesis based on observed feedback. By stopping the test early, though, the tester has circumvented the statistical guarantees provided by the fixed-horizon hypothesis test with respect to the desired level of statistical error, mentioned above. This is because the desired statistical error is not guaranteed without reaching the number of samples defined by the fixed horizon. Another drawback is that the fixed horizon is based at least in part on estimates made by the tester for baseline statistics and minimum desired effects, which may not be accurate and may be difficult for an inexperienced tester to estimate.
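To make the dependence of the horizon on the tester's estimates concrete, the sketch below uses a common normal-approximation sample-size formula for comparing two conversion rates. It is an illustration under stated assumptions only: the formula is one textbook choice rather than a method given in the text, and the function name samples_per_variant, the significance level, the power, and the baseline and effect values are hypothetical.

```python
import math
from statistics import NormalDist


def samples_per_variant(baseline_rate, min_effect, alpha=0.05, power=0.8):
    """Approximate samples needed per variant for a two-sided test.

    baseline_rate: tester's estimate of the current conversion rate.
    min_effect: smallest absolute lift the tester wants to detect.
    """
    target_rate = baseline_rate + min_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance quantile
    z_power = NormalDist().inv_cdf(power)          # power quantile
    variance = (baseline_rate * (1 - baseline_rate)
                + target_rate * (1 - target_rate))
    n = (z_alpha + z_power) ** 2 * variance / min_effect ** 2
    return math.ceil(n)


# The same test planned under different tester estimates.
print(samples_per_variant(0.10, 0.02))  # baseline 10%, detect a 2-point lift
print(samples_per_variant(0.05, 0.02))  # lower baseline estimate
print(samples_per_variant(0.10, 0.01))  # smaller minimum desired effect
```

Because the minimum desired effect enters the formula squared in the denominator, halving it roughly quadruples the horizon, so a modest misjudgment by the tester can make the test far too short or far too long.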