Information retrieval systems, such as internet search systems, use ranking functions to generate document scores which are then sorted to produce a ranking. Typically these functions have had only a small number of free parameters (e.g. two free parameters in BM25) and as a result they are easy to tune for a given collection of documents (or other search objects), requiring few training queries and little computation to find reasonable parameter settings.
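As an illustration of such a low-parameter ranking function, the following is a minimal sketch of BM25 scoring followed by a sort. The two free parameters are conventionally named k1 and b. The function names, the toy documents, the default parameter values and the particular smoothed inverse document frequency formula are assumptions chosen for the example and are not taken from any specific system.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document against a query; k1 and b are the two free parameters."""
    term_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        f = term_counts.get(term, 0)
        if f == 0:
            continue  # a term absent from the document contributes nothing
        n = doc_freqs.get(term, 0)
        # Smoothed IDF (an assumed variant that always stays positive).
        idf = math.log(1.0 + (num_docs - n + 0.5) / (n + 0.5))
        # k1 controls term-frequency saturation, b controls length normalization.
        length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
        score += idf * f * (k1 + 1.0) / (f + length_norm)
    return score

def rank_documents(query_terms, docs, k1=1.2, b=0.75):
    """Return document ids sorted by descending BM25 score."""
    num_docs = len(docs)
    avg_doc_len = sum(len(terms) for terms in docs.values()) / num_docs
    doc_freqs = Counter()
    for terms in docs.values():
        doc_freqs.update(set(terms))  # count each term once per document
    scores = {
        doc_id: bm25_score(query_terms, terms, doc_freqs, num_docs,
                           avg_doc_len, k1, b)
        for doc_id, terms in docs.items()
    }
    return sorted(scores, key=lambda doc_id: -scores[doc_id])

# Hypothetical toy collection.
DOCS = {
    "d1": ["cat", "sat", "mat"],
    "d2": ["dog", "ran"],
    "d3": ["cat", "cat", "dog"],
}
ranking = rank_documents(["cat"], DOCS)
```

With only k1 and b to set, a coarse grid search over a handful of training queries is usually enough to find reasonable values, which is the ease of tuning referred to above.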
These functions typically rank a document based on the occurrence of search terms within the document. More complex functions are, however, required in order to take more features into account when ranking documents, such as where search terms occur in a document (e.g. in a title or in the body of text), link-graph features and usage features. As the number of features is increased, so is the number of parameters which are required. This increases the complexity of learning the parameters considerably.
Machine learning may be used to learn the parameters within a ranking function (which may also be referred to as a ranking model). The machine learning takes an objective function and optimizes it. There are many known metrics which are used to evaluate information retrieval systems, such as Normalized Discounted Cumulative Gain (NDCG), Mean Average Precision (MAP) and RPrec (Precision at rank R, where R is the total number of relevant documents), all of which depend only on the ranks of documents and as a result are not suitable for use as training objectives. This is because the metrics are not smooth with respect to the parameters within the ranking function (or model): if small changes are made to the model parameters, the document scores will change smoothly; however, this will typically not affect the ranking of the documents until one document's score passes another's, at which point the information retrieval metric will make a discontinuous change.
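The discontinuity described above can be reproduced with a small sketch. The example below assumes a hypothetical one-parameter linear ranking model, three made-up documents with two features each, and one common formulation of NDCG (gain 2^rel - 1 with a log2 position discount); none of these specifics come from a particular system.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevance labels listed in ranked order."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))

def ndcg(scores, relevances):
    """NDCG of the ranking obtained by sorting documents by descending score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked = [relevances[i] for i in order]
    ideal = sorted(relevances, reverse=True)
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0

# Hypothetical data: two features per document and a graded relevance label.
FEATURES = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.2)]
RELEVANCE = [2, 0, 1]

def scores_for(w):
    """Assumed model: score = w * feature1 + (1 - w) * feature2."""
    return [w * f1 + (1.0 - w) * f2 for f1, f2 in FEATURES]

def ndcg_at(w):
    return ndcg(scores_for(w), RELEVANCE)

# Scores vary smoothly with w, but NDCG only moves when two scores cross.
for w in (0.40, 0.45, 0.49, 0.51, 0.60, 0.70):
    print(f"w={w:.2f}  scores={[round(s, 3) for s in scores_for(w)]}  "
          f"NDCG={ndcg_at(w):.4f}")
```

In this toy setup NDCG is flat for w between 0.40 and 0.49, jumps when the first document's score overtakes the second's at w = 0.5, and jumps again once the third document passes the second. Within each flat region the gradient of NDCG with respect to w is zero, so a gradient-based learner receives no signal from the metric itself.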
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known information retrieval systems.