The effectiveness of an information identification system is measured by how well the system identifies relevant documents within a corpus. Relevance is a property derived from a user and an information need; in other words, a document is deemed relevant by a user if it satisfies that user's information need.
In conventional information identification systems, the definition of what makes a document relevant or non-relevant exists independently of the system itself. Such systems assume that the user has a preexisting, well-defined, and unchanging notion of relevance, and that the purpose of the system is to identify any documents that are relevant according to that fixed notion.
For certain types of information needs, the assumption of fixed relevance may be reasonable. For example, in a known-item search, the user is attempting to find an item that he or she knows to exist, for instance by querying a library's search engine with a specific book's title to locate that book within the library.
For more complex types of information needs, the idea of fixed relevance breaks down. For example, a user may approach a search task seeking to resolve an anomalous state of knowledge. In such an example, the user often cannot precisely specify what information is needed to resolve his or her anomalous knowledge-state. In these situations, an exploratory information need exists, in which certain aspects of the information need are assumed to be initially undefined and to be further refined through interaction with an information identification system. Even if the user does have a well-defined notion of relevance at the outset, that notion may change as the user reviews certain documents. For example, documents returned by the information identification system may contain information of which the user was not previously aware, which may, in turn, refine or change the user's notion of relevance. However, conventional information identification systems lack a method of refining a user's notion of relevance in response to information contained in the documents being reviewed or in documents that have been previously reviewed.
Conventional information identification systems also operate under the assumption that the user is interested only in a subset of highly relevant documents. For certain information needs, such as the above-mentioned known-item search, a precision-oriented approach is appropriate. In such a case, the relevant set usually consists of one document, and therefore a limited search may be effective. However, for more complex information identification tasks, there is a need for the ability to expand the scope of a search. By not expanding the scope of a search, conventional information identification systems fail to identify relevant documents within a corpus.
As a result, there is a need in the art for a method and system to assist in information identification that allows a user's notion of relevance to change and expand in response to information contained in documents being reviewed.