The amount of data available to information seekers has grown astronomically, whether as the result of the proliferation of information sources on the Internet, or as the result of private efforts to organize business information within a company, or as the result of a variety of other cases. As the amount of available data has grown, so has the need to be able to sort and locate relevant data. A related problem is the need to rank data that has been identified as relevant.
When users search data collections for specific data, they typically desire more than a listing of results that simply have some relation to the search query entered. Users generally want to be able to quickly locate the best or most relevant results from within the listing. Ranking the results of the search can facilitate locating the most relevant data. Generally, a high ranking should indicate that there is a high probability that the desired information is present in the search result.
One approach is to use machine learning systems to locate, sort, rank or otherwise process the data. Machine learning systems include such systems as neural networks, support vector machines (“SVMs”) and perceptrons, among others. These systems can be used for a variety of data processing or analysis tasks, including, but not limited to, optical pattern and object recognition, control and feedback systems and text categorization. Other potential uses for machine learning systems include any application that can benefit from data classification or regression.
In general, machine learning systems go through a training phase to improve performance and generate optimal search, sort or ranking results. During a typical training phase, training data is input into a machine learning system and internal system parameters are adjusted based upon the output of the machine learning system and the desired results. The training phase continues until the machine learning system reaches an acceptable level of performance. Generally, increasing the size of the training data set improves system performance, but also increases the time required to train the machine learning system.