In information retrieval, ranking is of central importance. Ranking is usually done by applying a ranking function (a ranker) to a set of objects (e.g., documents) to compute a score for each object and then sorting the objects according to the scores. Depending on the application, the scores may represent degrees of relevance, preference, or importance. Traditionally, only a small number of strong features (e.g., BM25 and language model) were used to represent relevance (or preference and importance) to rank documents. In recent years, with the development of supervised learning algorithms such as Ranking SVM and RankNet, it has become possible to incorporate more features (strong or weak) into ranking models. In this situation, feature selection has become an important issue, particularly from the following viewpoints.
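As an illustrative, non-limiting sketch, the score-and-sort step described above may be written as follows. The feature names (bm25, lm_score), the weights, and the helper names rank and linear_ranker are hypothetical and are chosen only for illustration; an actual ranker such as a trained RankNet model would supply its own scoring function.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical feature vector: feature name -> value (names are illustrative only).
Features = Dict[str, float]


def rank(objects: Dict[str, Features],
         ranker: Callable[[Features], float]) -> List[Tuple[str, float]]:
    """Score every object with the ranker, then sort by descending score."""
    scored = [(obj_id, ranker(feats)) for obj_id, feats in objects.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def linear_ranker(feats: Features) -> float:
    """Toy ranker: a weighted sum of two strong features."""
    # Assumed weights, not taken from any real system.
    weights = {"bm25": 0.7, "lm_score": 0.3}
    return sum(w * feats.get(name, 0.0) for name, w in weights.items())


docs = {
    "doc_a": {"bm25": 1.0, "lm_score": 2.0},
    "doc_b": {"bm25": 3.0, "lm_score": 0.0},
    "doc_c": {"bm25": 0.5, "lm_score": 0.5},
}
ranking = rank(docs, linear_ranker)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

In this sketch the ranker is passed in as a plain callable, so the same sorting step can be reused with any scoring model.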
Learning to rank for web search relevance largely depends on the document feature set that is used as training input. First, the trained model is bound to be biased by the choice of features, so feature selection may significantly affect the accuracy of the ranking. For example, although the generalization ability of Support Vector Machines (SVM) depends on the margin, which does not change with the addition of irrelevant features, it also depends on the radius of the training data points, which can increase when the number of features increases. Moreover, the probability of over-fitting also increases as the dimension of the feature space increases, and feature selection is a powerful means to avoid over-fitting. Second, the dimension of the feature set also determines the computational cost of producing the model. In the case where not all features in the set are carefully hand-designed, it is even more important to select a feature set of manageable size that can produce a ranking with good performance.
For example, MSN Live Search employs RankNet for ranking, with document features as input. The more features it employs, the more time-consuming it is to train a ranking model. In addition, the presence of weak features may cause the model to over-fit. Over-fitting is especially likely when the feature set includes a large number of low-level features, as is presently the case. Therefore, it is very important to select a good set of features for RankNet training.
FIG. 1 is a block diagram showing an example of an existing feature selection procedure. Currently, the feature selection is done manually, as represented in manual feature selection 110. A training data set 102 is used for manual feature selection 110. Through human decisions (112), a set of features (114) is chosen and passed through RankNet training process 116. The resultant RankNet model 118 is then fed to an automated evaluation tool (120) to determine its performance. Typically, NDCG (Normalized Discounted Cumulative Gain) is used as the performance measure. Based on the performance, a decision (122) is made to either further tune the feature set or output a satisfactory selected feature set 130. To further tune the feature set, the process returns to block 112 to repeat the decision process, again manually.
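For illustration only, the NDCG measure mentioned above may be computed as in the following sketch. It assumes one commonly used formulation, with gain 2^rel - 1 and a base-2 logarithmic position discount; the evaluation tool 120 may use a different variant. The function names dcg and ndcg and the cutoff parameter k are hypothetical.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results.

    Assumed formulation: gain 2**rel - 1, discount log2(position + 1),
    with positions starting at 1.
    """
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of the results, listed in the order the model ranked them.
perfect = ndcg([3, 2, 1, 0], k=4)   # already in ideal order
swapped = ndcg([0, 1], k=2)         # the relevant result is ranked second
```

A value of 1.0 indicates that the model ordered the results ideally, and lower values indicate that relevant results were ranked below less relevant ones. A query with no relevant results is assigned 0.0 in this sketch.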
The output selected feature set 130 is input to a RankNet training process 140, which also uses training data 102. Input transformation block 142 transforms the selected feature set 130 into input features 144 for RankNet training engine 146, which outputs a RankNet model 148 to be used as a ranking function to rank objects (e.g., documents).
The above manual feature selection 110 is a tedious, time-consuming process that requires a lot of intuition and experience. Even an experienced trainer might spend several weeks tuning a feature set and still not be sure whether the tuning is successful. It becomes an even greater problem as training data are constantly updated, often adding new features to be evaluated.