Document review is an activity frequently undertaken in the legal field during the discovery phase of litigation. Typically, document classification requires reviewers to assess the relevance of documents to a particular topic as an initial step. Document reviews can be conducted manually by human reviewers, automatically by a machine, or by a combination of human reviewers and a machine.
Generally, trained reviewers analyze documents and provide a recommendation for classifying each document with regard to the particular legal issue being litigated. A set of exemplar documents is provided to the reviewer as a guide for classifying the documents. Each exemplar document has previously been classified with a particular code relevant to the legal issue, such as “responsive,” “non-responsive,” or “privileged.” Based on the exemplar documents, a human reviewer or a machine can identify uncoded documents that are similar to one or more of the exemplar documents and assign the code of the similar exemplar document to those uncoded documents.
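The machine-assisted case described above, in which an uncoded document receives the code of the exemplar it most resembles, can be illustrated with a minimal sketch. The text does not specify how similarity is measured, so cosine similarity over raw word counts is used here purely as an assumption, and the names `term_counts`, `cosine_similarity`, `assign_code`, and `EXEMPLARS` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_code(uncoded_text, exemplars):
    """Return (code, score) of the exemplar most similar to the uncoded text.

    `exemplars` is a list of (text, code) pairs that were coded beforehand.
    """
    doc = term_counts(uncoded_text)
    best_code, best_score = None, -1.0
    for text, code in exemplars:
        score = cosine_similarity(doc, term_counts(text))
        if score > best_score:
            best_code, best_score = code, score
    return best_code, best_score


# Hypothetical exemplar set; the texts and codes are invented for illustration.
EXEMPLARS = [
    ("attorney client legal advice regarding the merger", "privileged"),
    ("quarterly sales figures for the disputed product line", "responsive"),
    ("office holiday party lunch menu", "non-responsive"),
]

code, score = assign_code(
    "legal advice from our attorney about the merger", EXEMPLARS
)
```

Because every uncoded document inherits the code of its closest exemplar, a poorly chosen or miscoded exemplar propagates its error to each document that resembles it.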
The set of exemplar documents selected for document review can dictate the results of the review. A cohesive, representative exemplar set can produce accurately coded documents, while the effects of inaccurately coded documents can be detrimental to a legal proceeding. For example, a “privileged” document contains information that is protected by a privilege, meaning that the document should not be disclosed to an opposing party. Disclosing a “privileged” document can result in an unintentional waiver of privilege as to the subject matter.
The prior art focuses on document classification and generally assumes that exemplar documents are already defined and exist as a reference set for use in classifying documents. Such classification can benefit from better reference sets, which increase the accuracy of the classified documents.
Thus, there remains a need for a system and method for generating a set of exemplar documents that is cohesive and that can serve as an accurate and efficient example for use in classifying documents.