In information retrieval, ranking is generally performed by calculating a score for each object in a set (e.g., documents) and sorting the objects according to their scores. Depending on the type of application, the scores may represent degrees of relevance, preference, or importance. Traditionally, only a small number of strong features were used to represent relevance and to rank the documents. With the development of supervised learning algorithms such as Ranking Support Vector Machines (SVM) and RankNet, it has become possible to incorporate more features (either strong or weak) into ranking models.
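The score-and-sort procedure described above can be sketched as follows. This is a minimal illustration only: the feature names, the weights, and the linear scoring function are hypothetical and do not come from any particular ranking model.

```python
from typing import Dict, List

# Hypothetical feature weights; a learned ranking model would supply these.
WEIGHTS = {"term_match": 0.7, "link_score": 0.3}

def score(features: Dict[str, float]) -> float:
    """Linear score: weighted sum of the features of one object."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

def rank(docs: Dict[str, Dict[str, float]]) -> List[str]:
    """Return document ids sorted by descending score."""
    return sorted(docs, key=lambda doc_id: score(docs[doc_id]), reverse=True)

# Illustrative documents, each described by the two hypothetical features.
docs = {
    "d1": {"term_match": 0.2, "link_score": 0.9},
    "d2": {"term_match": 0.8, "link_score": 0.1},
    "d3": {"term_match": 0.5, "link_score": 0.5},
}
ranking = rank(docs)
```

With these assumed weights the term-match feature dominates, so the document with the highest term match is placed first.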
Incorporating SVM into ranking models creates problems, however. The generalization ability of SVM depends on the margin, which does not change when irrelevant features are added, and on the radius of the training data points, which can increase as the number of features increases. Consequently, the probability of over-fitting increases as the dimension of the feature space increases. As a result, over-fitting occurs, and there remains a need for improved accuracy in machine learning. Furthermore, when Ranking SVM is applied to web search, training of the ranking models may not be completed in a timely manner or within an acceptable time period.
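The effect of irrelevant features on the radius of the training data can be illustrated with a small sketch. It rests on several assumptions: the data are synthetic, the added features are pure Gaussian noise, and the radius is approximated by the largest distance from the centroid rather than by the minimum enclosing sphere.

```python
import math
import random

def radius(points):
    """Largest distance from the centroid to any point.

    This is a simple proxy for the radius of the data, not the radius of
    the minimum enclosing sphere.
    """
    dim = len(points[0])
    centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    return max(math.dist(p, centroid) for p in points)

random.seed(0)
# Synthetic training points with two features each.
base = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(50)]
# Append three irrelevant (pure noise) features to every point.
noisy = [p + [random.gauss(0, 1) for _ in range(3)] for p in base]

r_base = radius(base)
r_noisy = radius(noisy)
```

Under this proxy, appending features can never shrink the radius, because every point's distance to the centroid gains a non-negative term for each new coordinate. The noise features carry no ranking information, so they cannot be expected to improve the separation between instances in return.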
Other problems that commonly occur in information retrieval, especially in web search, are that the data size is very large and that training the ranking models is very expensive. Additional problems include noisy features that are not relevant and the amount of time required to train and test the ranking models.
Attempts to apply feature selection to ranking have been troublesome. First, there is a significant gap between classification and ranking. In ranking, a number of ordered categories are used, representing the ranking relationship between instances, while in classification the categories are “flat”. Consequently, existing feature selection methods for classification are not directly suitable for ranking. Second, the evaluation measures used in ranking problems (e.g., mean average precision (MAP) and normalized discounted cumulative gain (NDCG)) are different from the measures used in classification. One difference is that precision is more important than recall in ranking, while in classification both precision and recall are factors to consider. Another is that, in ranking, correctly ranking the top-n instances is more important, while in classification, making a correct classification decision is of equal significance for most instances. These problems indicate that there is a need for feature selection designed for ranking.
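The emphasis that ranking measures place on the top of the list can be seen in a short sketch of average precision (the per-query quantity that MAP averages over queries) and NDCG. The sketch assumes the common gain 2^g - 1 and a log2 position discount for NDCG; other variants exist, and the label lists are made up for illustration.

```python
import math
from typing import List

def average_precision(labels: List[int]) -> float:
    """AP for one ranked list of binary labels (1 = relevant), top first.

    Assumes every relevant item for the query appears in the list.
    """
    hits, total = 0, 0.0
    for position, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / position  # precision at this relevant position
    return total / hits if hits else 0.0

def dcg(gains: List[int]) -> float:
    """Discounted cumulative gain with gain 2^g - 1 and a log2 discount."""
    return sum((2 ** g - 1) / math.log2(pos + 1)
               for pos, g in enumerate(gains, start=1))

def ndcg(gains: List[int], n: int) -> float:
    """NDCG at cutoff n: DCG of the list divided by DCG of the ideal order."""
    ideal = dcg(sorted(gains, reverse=True)[:n])
    return dcg(gains[:n]) / ideal if ideal else 0.0

# The same single mistake (one adjacent swap) made at the top and at the bottom.
top_swap = [2, 3, 1, 0]      # ideal order is [3, 2, 1, 0]
bottom_swap = [3, 2, 0, 1]
ndcg_top = ndcg(top_swap, 4)
ndcg_bottom = ndcg(bottom_swap, 4)
```

In this example the swap near the top lowers NDCG more than the same swap near the bottom, whereas a classification measure such as accuracy would treat the two errors alike.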