The present invention relates to the field of information retrieval. More particularly, the invention relates to an apparatus and method for score normalization for information retrieval applications.
Information retrieval (IR) systems have been developed that allow users to identify particular documents of interest from among a larger number of documents. IR systems are useful for finding an article in a digital library, a news document in a broadcast repository, or a particular web site on the World Wide Web. To use such systems, the user specifies a query containing several words or phrases specifying areas of interest, and the system then retrieves documents it determines may satisfy the query.
An IR system typically ranks documents by some measure (e.g., a score) of their likelihood of relevance to a query. The ranking order is useful in determining whether one document is more relevant than another. For most applications, however, the final goal is the selection of relevant documents. A ranking order by itself does not indicate whether a document is actually relevant to the query. A large number of documents that are low in the ranking order are invariably returned as a result of the query, even though these documents are probably not very relevant.
In order to make a decision on the selection of documents that are relevant to the query, a threshold on the scores may be utilized. Scores above the threshold are designated as relevant, and scores below the threshold are designated as not relevant. Previous systems generally use an ad-hoc approach to picking the threshold, such as looking at the top few documents in the ranking order and then setting an arbitrary score to be the threshold.
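The threshold decision described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `designate` and the (identifier, score) pair layout are hypothetical, and treating a score exactly equal to the threshold as not relevant is an assumption, since only scores above and below the threshold are addressed.

```python
def designate(scored_documents, threshold):
    """Split (doc_id, score) pairs into relevant and not-relevant lists.

    Assumption: a score exactly equal to the threshold is treated as
    not relevant; only the strictly-above and below cases are specified.
    """
    relevant, not_relevant = [], []
    for doc_id, score in scored_documents:
        if score > threshold:
            relevant.append(doc_id)
        else:
            not_relevant.append(doc_id)
    return relevant, not_relevant


if __name__ == "__main__":
    ranked = [("doc-a", 0.91), ("doc-b", 0.40), ("doc-c", 0.12)]
    print(designate(ranked, threshold=0.5))
```

The sketch makes no attempt to choose the threshold itself, which is the part that previous systems handle in an ad-hoc way.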
This method of choosing thresholds, however, makes it difficult to arrive at a consistent decision threshold across queries, because the scores assigned to documents for one query do not generally relate to the scores assigned to documents for a different query. This results in a degradation of system performance for the task. The alternative is to set the threshold separately for each query, but this is impracticable. Accordingly, there is presently a need for a system that normalizes scores so that a decision threshold is consistent across different queries.
A method consistent with the present invention normalizes a score associated with a document. Statistics relating to scores assigned to a set of training documents not relevant to a topic are determined. Scores represent a measure of relevance to the topic. After the various statistics have been collected, a score assigned to a testing document is normalized based on those statistics. The normalized score is then compared to a threshold score. Subsequently, the testing document is designated as relevant or not relevant to the topic based on the comparison.
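The steps of this method can be illustrated with a minimal Python sketch. The particular statistics and the normalization formula are assumptions made for illustration only: here the statistics are taken to be the mean and sample standard deviation of the off-topic training scores, and normalization is taken to be a z-score. All function names are hypothetical.

```python
from statistics import mean, stdev


def collect_statistics(off_topic_scores):
    """Statistics of scores assigned to training documents NOT relevant to the topic.

    Assumption: mean and sample standard deviation are the statistics used.
    """
    return mean(off_topic_scores), stdev(off_topic_scores)


def normalize(score, stats):
    """Assumed normalization: distance from the off-topic mean in standard deviations."""
    mu, sigma = stats
    if sigma == 0:
        raise ValueError("off-topic scores have zero spread; cannot normalize")
    return (score - mu) / sigma


def is_relevant(score, stats, threshold):
    """Compare the normalized score to the threshold and designate the document."""
    return normalize(score, stats) > threshold


if __name__ == "__main__":
    # Two topics whose raw scores live on very different scales.
    stats_a = collect_statistics([1.0, 2.0, 3.0])
    stats_b = collect_statistics([100.0, 200.0, 300.0])
    # The same threshold can be applied to both after normalization.
    print(is_relevant(4.5, stats_a, threshold=2.0))
    print(is_relevant(450.0, stats_b, threshold=2.0))
```

Under these assumptions, a single threshold expressed in standard deviations above the off-topic mean can be reused across topics, even when the raw scores for each topic are on unrelated scales.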
Another method consistent with the present invention normalizes a score associated with a document. A query that includes a topic is received. Next, statistics relating to scores assigned to a set of training documents not relevant to the topic are determined. Scores represent a measure of relevance to the topic. After the various statistics have been collected, a score assigned to a testing document is normalized based on those statistics.
Another method consistent with the present invention searches for documents relevant to a topic. A query including a topic is sent to a processor. The processor determines statistics relating to scores assigned to a set of training documents not relevant to the topic, normalizes a score assigned to a testing document based on the statistics, and designates the testing document as relevant or not relevant to the topic based on the normalized score. Results are then received from the processor indicating a document relevant to the topic.
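The search flow described above, from query to returned results, might be sketched as follows. Everything here is hypothetical and for illustration only: `toy_score` is a stand-in word-count scorer rather than any particular IR scoring function, `search` plays the role of the processor, and the z-score normalization is an assumed choice of statistics.

```python
from statistics import mean, stdev


def toy_score(topic, document):
    """Hypothetical scorer: number of times the topic's words occur in the document."""
    words = document.lower().split()
    return float(sum(words.count(term) for term in topic.lower().split()))


def search(topic, off_topic_training_docs, testing_docs, threshold, score=toy_score):
    """Return (document, normalized score) pairs designated relevant to the topic."""
    # Statistics of scores for training documents not relevant to the topic.
    training_scores = [score(topic, doc) for doc in off_topic_training_docs]
    mu, sigma = mean(training_scores), stdev(training_scores)
    if sigma == 0:
        raise ValueError("off-topic scores have zero spread; cannot normalize")

    results = []
    for doc in testing_docs:
        # Assumed normalization: z-score against the off-topic statistics.
        normalized = (score(topic, doc) - mu) / sigma
        if normalized > threshold:
            results.append((doc, normalized))
    return results


if __name__ == "__main__":
    training = [
        "the cat sat",
        "a score was kept",
        "rain fell all day",
        "normalization of score data is rare",
    ]
    testing = [
        "score normalization makes score thresholds consistent",
        "weather report for today",
    ]
    for doc, z in search("score normalization", training, testing, threshold=2.0):
        print(f"{z:.2f}  {doc}")
```

Passing the scorer as a parameter keeps the sketch independent of how raw scores are produced; only the normalization and designation steps are illustrated.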
An apparatus consistent with the present invention normalizes a score associated with a document. The apparatus includes a memory having program instructions and a processor responsive to the program instructions. The processor determines statistics relating to scores assigned to a set of training documents not relevant to a topic, the scores representing a measure of relevance to the topic; normalizes a score assigned to a testing document based on the statistics; compares the normalized score to a threshold score; and designates the testing document as relevant or not relevant to the topic based on the comparison.