In some approaches to statistical modeling, a single model is used to predict the probability of a given event based on previous events. When such single-model approaches are applied to online advertising, the single statistical model is used to predict the probability of a click based on a user, a query, and an advertiser. However, such a single model may not be able to fairly characterize all predictive sources observable in the data. For example, click-through rates (CTRs) vary among queries depending upon, among other things, the commercial nature of the query. Similarly, some statistical models are biased, if only because of the selection of predictive sources (i.e., features) used in a particular model.
In the context of online advertising, one goal of user response modeling is to predict the user response c (c=1 for a click, and c=0 for no click) when the user is presented with an advertisement (e.g., in a search results page). Online advertising systems often extract a variety of features (denoted x) from the query, advertisement, user, and location to predict the probability of a click. One desired result of user response modeling is to reliably predict the probability of a click given the feature set x; that is, to calculate p(c|x), the probability of a user click response c conditioned on the constituents of x. There are a number of different approaches to constructing and training a predictive model p(c|x). Modeling techniques include maximum entropy (ME) models, neural networks, support vector machines, boosted decision trees, models involving analysis and weighting based on clustering features, models using linear interpolation, models using minimum combinations (discussed below), and models using maximum combinations (also discussed below), among others.
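As an illustration of what a model p(c|x) can look like, the following is a minimal sketch of a binary maximum entropy model, which for a two-valued response is equivalent to logistic regression, fitted by batch gradient ascent on the log-likelihood. The two features, the toy data, the learning rate, and all function names are assumptions made for this example; they are not taken from any particular advertising system.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_click_probability(weights, bias, x):
    """Return the modeled p(c=1 | x) for a feature vector x."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train(examples, epochs=500, learning_rate=0.5):
    """Fit weights by batch gradient ascent on the mean log-likelihood.

    examples: list of (x, c) pairs, with c = 1 for click and 0 for no click.
    """
    n = len(examples)
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, c in examples:
            error = c - predict_click_probability(weights, bias, x)
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w + learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias += learning_rate * grad_b / n
    return weights, bias

# Hypothetical toy data: x = [query_is_commercial, ad_matches_query]
EXAMPLES = (
    [([1.0, 1.0], 1)] * 3 + [([1.0, 1.0], 0)] * 1
    + [([1.0, 0.0], 1)] * 1 + [([1.0, 0.0], 0)] * 2
    + [([0.0, 1.0], 1)] * 1 + [([0.0, 1.0], 0)] * 3
    + [([0.0, 0.0], 1)] * 1 + [([0.0, 0.0], 0)] * 4
)

weights, bias = train(EXAMPLES)
p_commercial_match = predict_click_probability(weights, bias, [1.0, 1.0])
p_noncommercial_nomatch = predict_click_probability(weights, bias, [0.0, 0.0])
```

On this toy data the fitted model assigns a higher click probability to a commercial query with a matching advertisement than to a non-commercial query with a non-matching one. A production model would use many more features and typically some form of regularization.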
Further, there are many situations where, if a particular event is prevalent (e.g., a query-advertisement pair resulting in a click), reliable estimates of the probability of a click can be extracted from the empirical averages alone. There are also situations where different models use disjoint sets of features or predictors, such as relevance models that rely only on syntactic features. Still other models are built directly from historical empirical click-through data (e.g., click-through rate (CTR) data).
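The empirical-average estimate can be sketched as follows: count impressions and clicks per query-advertisement pair and report the ratio only where enough events have been observed. The log format, the `min_impressions` threshold of 100, and the sample identifiers are hypothetical choices made for this sketch.

```python
from collections import defaultdict

def empirical_ctr(events, min_impressions=100):
    """Estimate the CTR of each (query, ad) pair from logged impressions.

    events: iterable of (query, ad, clicked) tuples, with clicked in {0, 1}.
    Pairs seen fewer than min_impressions times are omitted, because an
    average over only a few events is not a reliable estimate.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, ad, clicked in events:
        impressions[(query, ad)] += 1
        clicks[(query, ad)] += clicked
    return {
        pair: clicks[pair] / count
        for pair, count in impressions.items()
        if count >= min_impressions
    }

# Hypothetical log: one prevalent pair and one rarely seen pair.
LOG = (
    [("car insurance", "ad_17", 1)] * 30
    + [("car insurance", "ad_17", 0)] * 170
    + [("rare query", "ad_42", 1)] * 1
    + [("rare query", "ad_42", 0)] * 2
)

ctr = empirical_ctr(LOG)
```

Here only the prevalent pair receives an estimate; the rarely seen pair is left to a feature-based model, which is one reason several predictors may coexist.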
What is needed is a way to define and train a set of predictive models, capture the predictions of each predictive model in the set, and then combine those predictive models in such a manner that the combined predictive model reliably yields predictive estimates of the occurrence of events that are at least as good as those of the best predictive model in the set, or better.
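A rough sketch of this idea follows, using the linear interpolation, minimum, and maximum combinations named above. The definitions used here (a weighted average, the smallest prediction, and the largest prediction for each event) are assumptions made for illustration, as are the sample predictions, the equal interpolation weights, and the use of log loss as the quality measure.

```python
import math

def linear_interpolation(predictions, weights):
    """Weighted average of the per-model predictions for one event."""
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

def minimum_combination(predictions):
    """Smallest per-model prediction for one event."""
    return min(predictions)

def maximum_combination(predictions):
    """Largest per-model prediction for one event."""
    return max(predictions)

def log_loss(probabilities, clicks, eps=1e-12):
    """Mean negative log-likelihood of the observed clicks (lower is better)."""
    total = 0.0
    for p, c in zip(probabilities, clicks):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= math.log(p) if c == 1 else math.log(1.0 - p)
    return total / len(clicks)

def select_best(candidates, clicks):
    """Return the name of the lowest-loss candidate and all losses."""
    losses = {name: log_loss(probs, clicks) for name, probs in candidates.items()}
    return min(losses, key=losses.get), losses

# Hypothetical held-out clicks and predictions from two trained models.
CLICKS = [1, 0, 0, 1, 0]
MODEL_A = [0.60, 0.30, 0.10, 0.40, 0.20]
MODEL_B = [0.80, 0.10, 0.30, 0.70, 0.05]

per_event = list(zip(MODEL_A, MODEL_B))
candidates = {
    "model_a": MODEL_A,
    "model_b": MODEL_B,
    "interpolated": [linear_interpolation(p, [0.5, 0.5]) for p in per_event],
    "minimum": [minimum_combination(p) for p in per_event],
    "maximum": [maximum_combination(p) for p in per_event],
}

best_name, losses = select_best(candidates, CLICKS)
```

Because the individual models are themselves among the candidates, the selected candidate is never worse than the best single model on the data used for selection. Whether that advantage carries over to unseen data depends on how the selection data are chosen.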