Internet searching and browsing have become increasingly common in recent years. As usage has grown, many online systems (e.g., search engines) try to predict user behavior in an effort to tailor the user experience. Areas where predicting user behavior is increasingly important include advertising, search, online sales, and the like. In advertising, for example, it is beneficial to predict whether a user will select, or "click," an advertisement. Click probability is difficult to estimate because doing so requires semantic understanding of content and processing of very large amounts of data.
High-capacity models, such as high-capacity linear models, have previously been used to predict click probabilities. These models include a separate weight for each feature value and train all weights simultaneously. For example, an IP address may receive one weight and an advertisement listing may receive another. The weights of the IP address and the advertisement listing are added to the model and trained simultaneously. Such a model has high capacity but requires a substantial amount of time (e.g., four hours) to retrain. This creates an operability challenge: any change to the training data requires retraining the entire system. For example, if an outlier needs to be removed from the training data, the entire system must be retrained. Because immediate revenue and customer satisfaction are tied to the accuracy of the click prediction model, the system should be predictable, robust, and easy to operate.
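A minimal sketch of the kind of linear click model described above may help illustrate the retraining problem. It assumes logistic regression trained by stochastic gradient descent, with feature values encoded as hypothetical strings such as `ip=...` and `ad=...`; the functions `predict` and `train` and all parameter values are illustrative, not the specific prior-art system. Each feature value has its own weight, the weights of the active feature values are summed into a score, and every weight is updated jointly on each pass.

```python
import math
from collections import defaultdict


def predict(weights, features):
    """Click probability: sigmoid of the summed weights of the active feature values."""
    score = sum(weights[f] for f in features)
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=50, lr=0.1):
    """Fit one weight per feature value (hypothetical SGD setup).

    All weights are updated together on every pass, so each learned
    weight depends on the entire training set.
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, clicked in examples:
            # Gradient of the log loss with respect to the score is (p - y).
            error = predict(weights, features) - clicked
            for f in features:
                weights[f] -= lr * error
    return weights


# Hypothetical training data: (active feature values, clicked?)
examples = [
    (["ip=10.0.0.1", "ad=shoes"], 1),
    (["ip=10.0.0.1", "ad=hats"], 0),
    (["ip=10.0.0.2", "ad=shoes"], 1),
]
weights = train(examples)

# Dropping one example (e.g., an outlier) cannot be done by patching a single
# weight; the whole model is retrained on the filtered data.
retrained = train([examples[0], examples[2]])
```

Because every weight is fitted jointly against all examples, there is no local way to "undo" one training example, which is why removing an outlier forces a full retraining run.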