Document review is frequently undertaken in the legal field during the discovery phase of litigation or regulatory investigations, and requires reviewers to assess the relevance of documents to a particular topic or discovery request. Based on the relevance of each document, a classification code can be assigned. The classification codes can include “responsive,” “non-responsive,” and “privileged” codes, as well as codes for specific substantive issues. A “responsive” document includes text that is related to or responsive to the particular topic or issue, while a “non-responsive” document fails to include such text. Meanwhile, a “privileged” document contains information that is protected by a privilege, meaning that the document may be withheld from an opposing party. Disclosing a “privileged” document can result in a waiver of privilege as to the specific document or its subject matter.
As the amount of electronically stored information (ESI) increases, the time and expense of conducting a document review also increase. Typically, document review is undertaken manually. However, with the increasingly widespread movement to ESI, manual document review is no longer practicable, since reviewers are unable to review, analyze, and assign a classification code to each individual document when the amount of information is large.
Conventional methods exist for enhancing efficiency by identifying relevant documents. For example, in U.S. Patent Application Publication No. 2007/0288445, to Kraftsow, a search for relevant documents is performed using a plurality of query terms. Once the search query is applied, those documents that satisfy the query are identified as relevant and a probability of relevancy is determined for each. A threshold is applied to the probabilities, and those documents associated with probabilities that do not satisfy the threshold are removed from the subset of responsive documents. However, the responsive results fail to consider responsive documents that are the same as or similar to a particular issue.
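By way of illustration only, the thresholding step described above can be sketched as follows. The sketch is not taken from the cited publication; the function name, the document identifiers, the probability values, and the reading of "satisfy the threshold" as "greater than or equal to" are all assumptions made for the example.

```python
from typing import Dict, List


def filter_by_threshold(relevance: Dict[str, float], threshold: float) -> List[str]:
    """Return the IDs of documents whose relevance probability meets the threshold.

    Assumption: a probability "satisfies" the threshold when it is >= threshold.
    """
    return [doc_id for doc_id, prob in relevance.items() if prob >= threshold]


# Hypothetical relevance probabilities for three documents matched by a query.
scores = {"doc-1": 0.92, "doc-2": 0.35, "doc-3": 0.70}

# Documents below the threshold are dropped from the responsive subset.
responsive = filter_by_threshold(scores, threshold=0.70)
```

Under these assumed values, only the document with the lowest probability is removed. The result is a single flat subset, with no grouping of documents by their similarity to one another.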
Further, in U.S. Patent Application Publication No. 2008/0189273, to Kraftsow, a query for conducting a document relevance search is automatically generated. A reviewer highlights relevant language in one or more documents. The highlighted language is analyzed to identify idioms, which are removed prior to query generation. Also, known phrases and parts of speech are identified for use in generating the query. Once generated, the query is submitted to a Boolean search engine for conducting the search. However, only a single result set is identified, rather than different levels of search results.
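A simplified sketch of this kind of query generation is given below for illustration. It is an assumption-laden approximation rather than the method of the cited publication: the idiom list is hypothetical, a small stop-word set stands in for part-of-speech analysis, known-phrase detection is omitted, and joining terms with AND is only one possible Boolean form.

```python
import re
from typing import Iterable, List

# Hypothetical idiom list; a real system would rely on a far larger lexicon.
IDIOMS = ("at the end of the day", "by and large")

# Stand-in for part-of-speech filtering: drop common function words.
STOPWORDS = {"a", "an", "and", "by", "in", "is", "of", "the", "to", "was"}


def build_boolean_query(highlights: Iterable[str]) -> str:
    """Build an AND query from reviewer-highlighted text with idioms removed."""
    terms: List[str] = []
    for text in highlights:
        lowered = text.lower()
        # Remove idioms before any terms are extracted.
        for idiom in IDIOMS:
            lowered = lowered.replace(idiom, " ")
        # Keep each remaining content word once, in order of first appearance.
        for word in re.findall(r"[a-z0-9]+", lowered):
            if word not in STOPWORDS and word not in terms:
                terms.append(word)
    return " AND ".join(terms)


# Hypothetical highlighted passage.
query = build_boolean_query(["At the end of the day, the merger was approved"])
```

The resulting string would then be passed to a Boolean search engine, which returns one set of matching documents.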
Additionally, in U.S. Patent Application Publication No. 2010/0198802, to Kraftsow, search queries are automatically generated. A first set of terms is created from which any matching terms are removed. Next, a second set of terms is created from the first set of terms by removing idioms from the first set. Subsequently, a third set of terms is created from the second set by identifying parts of speech in the second set. Finally, the search query is generated from the third set of terms and a search is conducted using the query. However, the search identifies only a single set of documents that satisfy the query, rather than multiple sets of results that vary in similarity to a previously classified document.
Thus, there remains a need for efficiently and accurately decreasing the time and expense needed to conduct information retrieval, such as during a document review, by propagating information, including marking decisions, from one or more documents to related documents.