The amount of data available to information seekers has grown astronomically, whether as a result of the proliferation of information sources on the Internet, private efforts to organize business information within a company, or any of a variety of other causes. As the amount of available data has grown, so has the need to be able to sort and locate relevant data. A related problem is the need to rank data that has been identified as relevant.
When users search data collections for specific data, they typically desire more than a listing of results that simply have some relation to the search query they entered. Users generally want to be able to quickly locate the best or most relevant results from within the listing, and ranking the results of the search can assist them in doing so. Generally, a high ranking indicates to users that there is a high probability that the information for which they searched is present in the search result.
One approach is to use machine learning systems to locate, sort, rank or otherwise process the data. Machine learning systems include such systems as neural networks, support vector machines (“SVMs”) and perceptrons, among others. These systems can be used for a variety of data processing or analysis tasks, including, but not limited to, optical pattern and object recognition, control and feedback systems and text categorization. Other potential uses for machine learning systems include any application that can benefit from data classification or regression. Typically, the machine learning system is trained to improve performance and generate optimal search, sort or ranking results.
Such machine learning systems are usually trained using a cost function, which the learning process attempts to minimize. Often, however, the cost functions of interest are not minimized directly, since doing so has presented too difficult a problem to solve. For example, in document retrieval problems, one measure of the quality of the trained system is the area under the Receiver Operating Characteristic (ROC) curve. The ROC curve is a graphical plot of the number of true positives (e.g., relevant documents retrieved) versus the number of false positives (e.g., irrelevant documents retrieved). Such cost functions are not differentiable functions of the outputs of the machine learning systems used, and this lack of smoothness presents difficulties for training using such functions directly.
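The lack of smoothness can be illustrated with a minimal sketch. It assumes the standard pairwise formulation of the area under the ROC curve, namely the fraction of (relevant, irrelevant) pairs in which the relevant item receives the higher score. The function name `roc_auc` and the example scores and labels are hypothetical and chosen only for illustration; they do not come from any particular system.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (relevant, irrelevant) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # relevant items
    neg = [s for s, y in zip(scores, labels) if y == 0]  # irrelevant items
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant item")
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


# Hypothetical outputs of a trained system for four documents.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]

base = roc_auc(scores, labels)

# A small change in one output leaves the ordering, and so the measure, unchanged.
nudged = roc_auc([0.9, 0.8, 0.701, 0.1], labels)

# A larger change that swaps two documents makes the measure jump.
swapped = roc_auc([0.9, 0.8, 0.81, 0.1], labels)

print(base, nudged, swapped)
```

Because the measure depends on the outputs only through their ordering, it is flat almost everywhere and changes in discrete jumps when two outputs cross. Its gradient with respect to the outputs is therefore zero or undefined, which gives gradient-based training nothing to follow.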