1. Field of the Invention
The present invention relates to systems and methods involving techniques for review and analysis of content data (in paper or electronic form) such as a collection of documents. More particularly, the present invention relates to a system and method for improved keyword searching of a collection of documents.
2. Background
Search engine technology has improved the ability to quickly find results within a collection of documents, as compared to a human needing to review each document within the collection. However, the quality and completeness of the search results produced by such conventional search engine techniques are often indeterminate and, therefore, unreliable. For example, one does not know with any certainty whether the search engine used has indeed found every relevant document.
One search engine technique currently used is a keyword search of a collection of documents. A user enters a search query consisting of one or more keywords and the search system uncovers all of the documents that have one or more words of the search query. However, in many cases, such a search technique only marginally reduces the number of documents to be reviewed, and a user cannot usefully examine the large quantities of documents returned.
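By way of illustration only, the conventional keyword search described above might be sketched in Python as follows. The function names `tokenize` and `keyword_search`, the simple tokenizer, and the sample documents are assumptions made for this sketch and do not describe any particular search system.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(documents, query):
    """Return ids of documents that contain one or more words of the query."""
    keywords = tokenize(query)
    return [doc_id for doc_id, text in documents.items()
            if keywords & tokenize(text)]


# Hypothetical sample collection.
documents = {
    1: "The pitcher threw a fast pitch.",
    2: "The roof has a steep pitch.",
    3: "The sales team rehearsed its pitch.",
    4: "The meeting was postponed.",
}

hits = keyword_search(documents, "pitch")
```

In this sketch, documents 1, 2, and 3 are all returned for the query "pitch", regardless of the sense in which each document uses the word.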
Many of the documents retrieved in a standard search are typically irrelevant because these documents use the searched-for terms in a way or context different from that intended by the user. Words have multiple meanings. One dictionary, for example, lists more than 50 definitions for the word “pitch.” In ordinary usage by skilled humans, such ambiguities are not a significant problem because skilled humans effortlessly know the appropriate word for any situation. One way to address this issue is to include synonyms of the search terms. For example, “elderly,” “aged,” “retired,” “senior citizens,” “old people,” “golden-agers,” and other terms are used to refer to the same group of people and can be included in a search query to increase the probability of finding the desired result.
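The synonym approach described above might be sketched in Python as follows. The synonym table, the function names `expand_query` and `matches`, and the whole-word matching rule are hypothetical choices made for illustration only.

```python
import re

# Hypothetical synonym table built from the example terms above.
SYNONYMS = {
    "elderly": ["aged", "retired", "senior citizens",
                "old people", "golden-agers"],
}


def expand_query(terms, synonyms=SYNONYMS):
    """Return the query terms followed by their synonyms, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + synonyms.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def matches(text, terms):
    """Return True if any term or phrase occurs in the text as whole words."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(term) + r"\b", lowered)
               for term in terms)


expanded = expand_query(["elderly"])
found = matches("Housing programs for senior citizens", expanded)
```

Here a document that never uses the word "elderly" is still found, because the expanded query also contains the phrase "senior citizens".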
However, such a process is useful where a user is primarily concerned with finding any document that contains the precise information the user is seeking. Some applications of keyword searching, for example, discovery in litigation, require both a high degree of precision and high recall. To address this issue, techniques have been developed that use lists of synonyms and phrases that encompass every imaginable word usage combination. However, in practice, the total number of documents retrieved by these queries is still quite large, and the results are computationally expensive to generate and analyze. So, a user is faced with two competing issues: identifying enough keywords so that the search may find the document or documents one is looking for, while at the same time limiting the keyword search so that the synonym list that must accompany the search query does not generate a large number of irrelevant documents.
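The tension between precision and recall noted above can be made concrete using the standard definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved. The following Python sketch applies those definitions; the function name `precision_recall` and the document identifiers are hypothetical values chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection in which documents 1-4 are the relevant ones.
relevant = [1, 2, 3, 4]

# A narrow query returns only relevant documents but misses half of them.
narrow = precision_recall(retrieved=[1, 2], relevant=relevant)

# A broad, synonym-heavy query finds every relevant document
# but also returns as many irrelevant ones.
broad = precision_recall(retrieved=[1, 2, 3, 4, 5, 6, 7, 8], relevant=relevant)
```

With these assumed values, the narrow query yields a precision of 1.0 and a recall of 0.5, while the broad query yields a precision of 0.5 and a recall of 1.0.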
There is a need for improved keyword search techniques that balance the need for the right number of search terms with the need to eliminate search terms that will not add to the effectiveness of the results.