The following are definitions of terms used in this description and in the field to which the invention relates:
The term “offer” is used herein to denote one of a number of alternatives that may be presented to a potential respondent. Other commonly used terms for “offer” include “option” and “action”.
“Respondent”, or “potential respondent”, usually refers to a person who is expected to respond to an offer.
“Responses” can be in various forms and at various levels. Thus examples of responses include “clicks” on a link on a web page, purchase of a product or other acquisition e.g. within a predetermined time period, and a yes (or no) answer to a question posed or sentence read by a call center operator. These are not limiting examples and others will be apparent to those skilled in the art. Sometimes the term “response” is used to denote a positive response, for example in situations where a negative response to an offer is possible.
“Traffic” is used to refer to calls to a call center, visits to a website and other events that lead to the presentation of an offer to a potential respondent.
An offer is said to be “served” to a potential respondent. The serving of an offer may take the form of presentation of a web page, in which case it is commonly referred to as an “impression”. Other examples of serving of an offer include but are not limited to reading a piece of text (script) to a caller, playing a piece of music such as an advertising jingle and mailing a flyer or advertising material e.g. in paper form.
“Serve fraction” refers to the ratio of the number of serves of a particular offer to the total number of serves of offers in a set of possible alternative offers, e.g. for the same traffic. The serve fraction may be determined over a time period or total number of offers.
“Response rate” is usually measured as the ratio of responses to serves of a particular offer, but can also be measured in terms of the number of responses in a unit time period, for example if the rate of serve is relatively stable. In that case the number of serves and the time period can be considered to be equivalent. Response rate can also be determined as a ratio of positive responses to serves, where negative responses are possible, or as a ratio of positive responses to a total of non-responses plus negative responses.
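By way of illustration only, the serve fraction and the response rate defined above may be computed as in the following minimal Python sketch. The function names `serve_fraction` and `response_rate` and the sample data are hypothetical and are not taken from any particular system.

```python
from collections import Counter


def serve_fraction(serves, offer):
    """Serves of `offer` divided by all serves in the set of alternatives.

    `serves` is a hypothetical log: one offer label per serve.
    """
    counts = Counter(serves)
    total = sum(counts.values())
    return counts[offer] / total if total else 0.0


def response_rate(num_responses, num_serves):
    """Ratio of responses to serves for one offer (0.0 if never served)."""
    return num_responses / num_serves if num_serves else 0.0


serve_log = ["A", "B", "A", "A", "B"]   # illustrative data only
print(serve_fraction(serve_log, "A"))   # 0.6
print(response_rate(3, 120))            # 0.025
```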
“Standard error” (StdErr) is a well-known statistical parameter and may be used, for example, as a measure of confidence in a calculation such as a calculation of response rate. Where several calculations are performed, a standard deviation (StdDev) may be determined, with the standard error being related to the standard deviation by the equation:
StdErr = StdDev / sqrt(n),
where n represents the number of calculations used to determine the standard deviation. Thus the standard error decreases as sample size increases.
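As a non-limiting sketch, the relationship between standard error and standard deviation given above may be expressed in Python as follows. The use of the sample standard deviation (`statistics.stdev`) is an assumption, since the definition above does not say whether the sample or population form is intended. The helper `proportion_standard_error` is likewise an addition for illustration: it applies the standard binomial result sqrt(p(1 - p)/n) to a response rate p measured over n serves.

```python
import math
import statistics


def standard_error(values):
    """StdErr = StdDev / sqrt(n), using the sample standard deviation.

    Assumption: sample (n - 1) standard deviation; needs at least 2 values.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    return statistics.stdev(values) / math.sqrt(n)


def proportion_standard_error(rate, num_serves):
    """Standard error of a response rate treated as a binomial proportion."""
    return math.sqrt(rate * (1.0 - rate) / num_serves)


rates = [0.021, 0.025, 0.019, 0.024]     # illustrative measurements only
print(standard_error(rates))
# Repeating the same measurements four times over shrinks the standard error.
print(standard_error(rates * 4))
```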
The serving of an inferior offer, e.g. one believed to have a lower response rate than another, which may be done for example to check that it is still indeed inferior, is referred to as “exploration”, whereas other serves are referred to as “exploitation”.
“Conversion” is one of several kinds of response that may be measured and is generally used to refer to the serving of an offer resulting in a sale of a product.
A “user” as referred to herein may be the owner of a website, operator of a call system or operator of any other system in which it is desirable to optimize an offer.
“UCB” is an acronym for upper confidence bound.
Processes in which two or more offers are tested to determine which performs best include so-called “A/B testing”. A/B testing can be used anywhere to establish the best performing of two or more offers. Examples of offers for which A/B testing can be used include:
        design options for a webpage;
        different calls to action presented on a webpage;
        scripts used in a call center;
        pieces of music, e.g. used in audio advertisements;
        flyers and advertising material in paper form;
        other situations where one of several alternatives may be presented to a potential respondent in any way, including but not limited to visually and audibly.
A/B testing software solutions have been available for a number of years. A simple example is shown schematically in FIG. 1 for only two offers, denoted Version A and Version B. A standard A/B testing procedure may operate as follows:
        Run an experiment, splitting the traffic equally between offers A and B, so that 50% of potential respondents are presented with version A and the other 50% are presented with version B.
        Continue for a sufficient time to ensure that the results, e.g. conversions of presentations or serves of offers into sales of a product, are statistically significant.
        Validate the results, e.g. by human analysis.
        Present the “winner”, e.g. the version with the highest conversion rate, to all future potential respondents.
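The standard procedure above may be sketched, purely for illustration, as the following Python simulation. The function names, the fixed random seed and the true conversion rates are hypothetical. The two-proportion z-test used here as the check for statistical significance is an assumption: the procedure above does not prescribe any particular test.

```python
import math
import random


def run_ab_test(true_rates, serves_per_offer, seed=0):
    """Serve every offer the same number of times (an equal traffic split).

    Returns {offer: (conversions, serves)}; `true_rates` is simulated data.
    """
    rng = random.Random(seed)
    results = {}
    for offer, rate in true_rates.items():
        conversions = sum(rng.random() < rate for _ in range(serves_per_offer))
        results[offer] = (conversions, serves_per_offer)
    return results


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the difference between two conversion rates."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    return (conv_b / n_b - conv_a / n_a) / std_err if std_err else 0.0


def pick_winner(results):
    """Offer with the highest observed conversion rate."""
    return max(results, key=lambda offer: results[offer][0] / results[offer][1])


results = run_ab_test({"A": 0.05, "B": 0.08}, serves_per_offer=5000)
z = two_proportion_z(*results["A"], *results["B"])
# |z| > 1.96 corresponds to significance at roughly the 5% level (two-sided).
print(results, z, pick_winner(results))
```

Note that in this scheme the inferior version keeps receiving half of the traffic until the experiment ends, regardless of how early the difference becomes apparent.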
It will be appreciated that the example of FIG. 1 can be scaled up for a larger number of versions than two. This is still referred to in the art as “A/B testing”.
Using this simple approach, customers may receive many more serves of an inferior offer than is necessary to test its performance, e.g. response rate, conversion rate or revenue generated. This may detract from the respondent experience and may cost the operating company money, for example in terms of lost sales. Therefore in some circumstances it may be desirable to reduce the number of serves, e.g. presentations, of an inferior offer, e.g. one with a lower conversion or other response rate, as compared to the simple test illustrated in FIG. 1. For some applications of A/B testing, a goal may be to present an offer no more than the minimum number of times needed to establish that it is indeed inferior.
“Multi-armed Bandit” testing provides an efficient method of reducing the number of serves of an inferior offer. The multi-armed bandit problem is well known in probability theory and is named after the problem faced by a gambler deciding which of a row of slot machines (such as one-armed bandits) to play and how many times to play each one. The problem can be mathematically modeled and the model may be used in a variety of applications other than gambling, for example the serving of offers as discussed above. Thus the term “multi-armed bandit” is commonly used and is used herein to refer to a mathematical model of the multi-armed bandit problem.
FIG. 2, FIG. 3 and FIG. 4 illustrate a process using a multi-armed bandit for A/B testing. As shown in FIG. 2, a testing system may start by splitting traffic randomly, usually equally between the two or more offers. As the bandit receives feedback about the performance of each offer, it will start to favor one more than the others. Thus, FIG. 3 shows version A receiving less traffic than version B because it has a lower conversion rate. The bandit may use many different algorithms to do this, but each will be trying to reduce the number of times it chooses an inferior alternative, over the long run. As the bandit receives more feedback it will start to converge, choosing one alternative almost exclusively, as shown in FIG. 4.
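The progressive shift of traffic illustrated in FIGS. 2-4 may be reproduced with a simple simulation such as the following Python sketch. The description above does not name a specific algorithm for this process; the epsilon-greedy strategy is used here only as an assumed, commonly known example, and the function name, the parameter `epsilon` and the true response rates are hypothetical.

```python
import random


def epsilon_greedy(true_rates, total_serves, epsilon=0.1, seed=0):
    """Simulate a bandit that mostly serves the best-looking offer.

    With probability `epsilon` a random offer is served (exploration);
    otherwise the offer with the highest observed response rate is served
    (exploitation). Offers never served yet are tried first.
    """
    rng = random.Random(seed)
    offers = list(true_rates)
    serves = {offer: 0 for offer in offers}
    responses = {offer: 0 for offer in offers}

    def observed_rate(offer):
        if serves[offer] == 0:
            return float("inf")  # force at least one serve of every offer
        return responses[offer] / serves[offer]

    for _ in range(total_serves):
        if rng.random() < epsilon:
            offer = rng.choice(offers)
        else:
            offer = max(offers, key=observed_rate)
        serves[offer] += 1
        if rng.random() < true_rates[offer]:  # simulated feedback
            responses[offer] += 1
    return serves, responses


serves, responses = epsilon_greedy({"A": 0.03, "B": 0.06}, total_serves=20000)
print(serves)  # serve counts per version for this simulated run
```

The residual random serves governed by `epsilon` correspond to the exploration defined earlier; all other serves are exploitation.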
One known multi-armed bandit algorithm is termed UCB and relies on an upper confidence bound, determined for each of a number of alternatives, to decide which alternative to serve. Such algorithms allow the best offer to be found quickly and automatically.
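One widely published variant of this approach is known as UCB1, in which each alternative is scored by its observed mean response rate plus a bonus of sqrt(2 ln t / n), where t is the total number of serves so far and n is the number of serves of that alternative. The following Python sketch assumes the UCB1 form for illustration only; the description above does not state which bound is used, and the function names and simulated rates are hypothetical.

```python
import math
import random


def ucb1_choose(serves, responses, total_serves):
    """Pick the offer with the highest UCB1 score.

    Any offer that has never been served is chosen first.
    """
    for offer, n in serves.items():
        if n == 0:
            return offer

    def upper_bound(offer):
        mean = responses[offer] / serves[offer]
        bonus = math.sqrt(2.0 * math.log(total_serves) / serves[offer])
        return mean + bonus

    return max(serves, key=upper_bound)


def simulate_ucb1(true_rates, total_serves, seed=0):
    """Run UCB1 against simulated respondents with the given true rates."""
    rng = random.Random(seed)
    serves = {offer: 0 for offer in true_rates}
    responses = {offer: 0 for offer in true_rates}
    for t in range(1, total_serves + 1):
        offer = ucb1_choose(serves, responses, t)
        serves[offer] += 1
        if rng.random() < true_rates[offer]:  # simulated feedback
            responses[offer] += 1
    return serves, responses


serves, responses = simulate_ucb1({"A": 0.03, "B": 0.06}, total_serves=20000)
print(serves)  # serve counts per offer for this simulated run
```

Because the bonus term grows for alternatives that have been served rarely, an apparently inferior offer is still served occasionally, which provides exploration without a separate random step.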
The process shown in FIGS. 2-4 does not take account of the possibility that the performance of one offer versus another may change over time. This is addressed to some extent by the continued use of exploratory serves after the identification of the best performer. However there is room for improvement in the use of exploratory serves whilst minimizing serves of offers that do not achieve the best response rate.