A/B testing has become a commonplace form of experimentation for evaluating proposed changes to the user interfaces of portals (e.g., websites) through which servers provide services to remote users. In such testing, alternate versions of a user interface of a portal are created, usually including an existing version (e.g., a control or baseline version) against which one or more proposed new versions are to be tested on randomly selected users. As remote users access the server via their computing devices, the server randomly selects one of the versions of the user interface to provide to each of those users. Metrics important to the provider of the portal are computed from the observed behavior of the users exposed to each version of the user interface, and those metrics are then analyzed to determine whether the proposed changes positively or negatively influenced a desired activity on the part of those users.
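The assignment and measurement steps described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions: the version labels, the function names `assign_version` and `conversion_rates`, and the choice of a conversion rate as the metric are all hypothetical and are not drawn from any particular system.

```python
import random
from collections import defaultdict

# Hypothetical labels for the baseline and one proposed version.
VERSIONS = ["control", "variant"]


def assign_version(rng):
    """Randomly select one user interface version for an arriving user."""
    return rng.choice(VERSIONS)


def conversion_rates(observations):
    """Return, per version, the fraction of exposed users who performed
    the desired activity.

    observations: iterable of (version, converted) pairs, one per user.
    """
    shown = defaultdict(int)
    converted = defaultdict(int)
    for version, did_convert in observations:
        shown[version] += 1
        converted[version] += int(did_convert)
    return {version: converted[version] / shown[version] for version in shown}
```

In this sketch a caller would pass a `random.Random` instance to `assign_version` for each arriving user, record whether that user performed the desired activity, and later hand the recorded pairs to `conversion_rates`.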
As those familiar with such A/B testing will readily recognize, obtaining statistically significant results requires a large enough sample of users exposed to each of the different versions of the user interface, which often means running a test for multiple days or even multiple weeks. Where the services provided by a server are offered in a competitive marketplace, such that there is a desire to quickly adopt any change to the user interface that demonstrates positive results, or where there are numerous proposed changes to the user interface to be tested, there may be a strong desire to end an A/B test that is underway early and to begin the next A/B test as soon as possible. This desire to proceed more quickly is usually at odds with the need to collect enough samples to obtain statistically significant results.
However, there can be instances where the user behavior recorded in the samples differs enough among the versions to reveal statistically significant results before the originally selected number of samples has been collected. Unfortunately, determining whether such a situation has arisen in a given A/B test, such that the test can be ended early, requires statistical analysis to be performed on a recurring basis using one or more statistical analysis techniques. Performing these calculations repeatedly can be time consuming, and the moment at which sufficient samples have been taken may fall at an odd hour of the day or night, such that no one is available to perform the statistical analysis that would reveal this fact, or to act upon it, until the following day. It is with respect to these and other considerations that the techniques described herein are needed.
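By way of illustration only, one possible form of the recurring analysis discussed above is sketched below in Python. The text does not name a specific statistical technique, so this sketch assumes a pooled two-proportion z-test on conversion counts, together with a Bonferroni-style division of the significance level across a planned number of checks to limit the false positives that repeated checking would otherwise cause. The function names and the parameters `alpha` and `planned_looks` are hypothetical.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test.

    conv_a, conv_b: users who performed the desired activity per version.
    n_a, n_b: users exposed to each version.
    """
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation observed at all, so no evidence of a difference.
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def can_stop_early(conv_a, n_a, conv_b, n_b, alpha=0.05, planned_looks=10):
    """Return True if the difference is significant at a threshold that is
    conservatively divided across the planned number of recurring checks
    (an assumed Bonferroni-style correction)."""
    p_value = two_proportion_p_value(conv_a, n_a, conv_b, n_b)
    return p_value < alpha / planned_looks
```

A check of this kind could be run automatically on a schedule, which speaks to the concern that the decisive moment may fall at an hour when no one is available. The Bonferroni-style division is deliberately conservative; other sequential testing methods make different trade-offs.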