1. Field of the Invention
The present invention relates generally to efficiently optimizing multivariate functions created from large data sets and, more particularly, to systems and methods for efficiently optimizing very large logistic regression models employed in a ranking function that ranks documents.
2. Description of Related Art
Generally speaking, search engines attempt to return hyperlinks to relevant web documents in which a user may be interested. Search engines may base their determination of the documents' relevancy on search terms (called a search query) entered by the user, as well as on additional non-query-related features such as geographical location, language, etc. The goal of the search engine is to provide links to high-quality, relevant results to the user based on the search query and the additional information. Typically, the search engine accomplishes this by matching the terms in the search query to a corpus of pre-stored web documents. Web documents that contain the user's search terms are “hits” and are returned to the user. The search engine often ranks the documents using a ranking function based on the documents' perceived relevance to the user's search terms. Optimization techniques may be employed in determining this ranking function.
Efficiently optimizing models of large amounts of information, however, such as data on the World Wide Web (“web”), is a challenging task. One requirement for such optimizations is that the optimization converge rather than diverge. Unfortunately, it has been found that, for certain optimization tasks, the variables to be optimized share some relationship or interaction with one or more additional variables. Accordingly, convergence of such tasks may only be guaranteed when the variables are optimized one at a time, so as to eliminate the possibility of divergence.
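By way of illustration only, optimizing variables one at a time may be sketched in Python as follows. The sketch assumes a small, dense logistic regression problem with labels of 0 or 1, and the function names sigmoid, log_loss and coordinate_sweep are hypothetical. The per-variable step size, which relies on the fact that the curvature of the logistic loss never exceeds one quarter, is one conventional choice and is not asserted to be the update used by any particular system.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(X: List[List[float]], y: List[int], w: List[float]) -> float:
    # Total logistic loss over all examples (labels are 0 or 1).
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        m = z if yi == 1 else -z  # margin of the example
        # log(1 + exp(-m)), evaluated without overflow
        total += math.log1p(math.exp(-m)) if m > 0 else -m + math.log1p(math.exp(m))
    return total


def coordinate_sweep(X: List[List[float]], y: List[int], w: List[float]) -> None:
    # Update each weight in turn, holding all other weights fixed.
    for j in range(len(w)):
        grad = 0.0
        curv = 0.0
        for xi, yi in zip(X, y):
            if xi[j] == 0:
                continue  # example does not involve variable j
            z = sum(wk * xk for wk, xk in zip(w, xi))
            grad += (sigmoid(z) - yi) * xi[j]
            curv += 0.25 * xi[j] ** 2  # upper bound on the curvature
        if curv > 0:
            w[j] -= grad / curv  # assumed step; cannot increase the loss


# Hypothetical toy data: four examples, three binary features.
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
y = [1, 0, 1, 0]
w = [0.0, 0.0, 0.0]
for _ in range(20):
    coordinate_sweep(X, y, w)
```

Because each update changes a single variable and uses an upper bound on the curvature along that variable, no update in this sketch can increase the loss, which is the sense in which one-at-a-time optimization avoids divergence.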
For very sparse problems, one can optimize non-interacting variables concurrently. However, this approach does not work well when the optimization is distributed. Additionally, naive implementations may optimize a small number of weights at once, with the number controlled by a parameter. This approach can work for specific settings of the parameter controlling the number of rules. Unfortunately, it is not possible to predict which value is right, and future data may cause divergence. Additionally, because efficiency hinges on the parameter, it tends to be set as high as possible, making the system more likely to fail.
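The notion of non-interacting variables in a sparse problem may be illustrated with a minimal Python sketch. The sketch assumes, for illustration only, that two features interact when both are active in at least one training example. The function name group_non_interacting and the greedy grouping strategy are hypothetical and are not taken from the approaches described above.

```python
from typing import Dict, List, Set


def group_non_interacting(examples: List[Set[str]]) -> List[List[str]]:
    # Map each feature to the set of features it co-occurs with.
    neighbors: Dict[str, Set[str]] = {}
    for active in examples:
        for feature in active:
            neighbors.setdefault(feature, set()).update(active - {feature})

    # Greedily place each feature in the first group that contains
    # none of its neighbors; otherwise open a new group.
    groups: List[List[str]] = []
    for feature in sorted(neighbors):
        for group in groups:
            if not neighbors[feature].intersection(group):
                group.append(feature)
                break
        else:
            groups.append([feature])
    return groups


# Hypothetical sparse data: each set lists the features active in one example.
examples = [{"a", "b"}, {"c"}, {"b", "c"}]
groups = group_non_interacting(examples)
```

Under the stated assumption, the features within one group never appear together in an example, so their updates do not affect one another and could be computed concurrently. The groups themselves would still be processed one after another.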