The exemplary embodiment relates to document review and classification. It finds particular application in connection with the review of litigation documents and will be described with particular reference thereto. However, it is to be appreciated that it is also applicable to other supervised document review applications.
In the litigation context, large numbers of documents are often reviewed in the course of discovery. These documents may be in electronic form or, if not already in electronic form, may be scanned and then reviewed by legal counsel to identify responsive documents (documents which are responsive to a particular discovery request from an opposing party). Counsel may also review the responsive documents to identify privileged documents (documents for which a privilege can be asserted, such as the attorney-client privilege or attorney work product privilege). The privileged documents are flagged and are not initially provided to the opposing party.
Many computer-implemented tools have been developed to provide support for electronic discovery. Contextual review is an example: this technology uses the context within a document and/or between different documents to help reviewers determine the relevancy of the textual content. A very basic approach has also been to apply keyword search to databases of imaged data (e.g., TIFF files) and text files. However, this approach, even with advanced Boolean rules and constraints, is known to have its limitations. For example, important documents may not contain the selected keywords, while many other documents of no interest may contain several. The search is sometimes improved by using “ontologies” or “word communities.” An ontology can describe a particular industry-specific vocabulary by capturing information about the words and phrases that model a particular area of knowledge.
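The keyword-search limitation noted above can be illustrated with a minimal sketch. The function name `keyword_search`, its parameters, and the two sample documents are hypothetical and are not taken from any particular discovery tool; the sketch assumes a simple Boolean rule in which a document matches if it contains any of the selected keywords and none of the excluded ones.

```python
import re


def keyword_search(docs, any_of, none_of=()):
    """Return indices of documents that contain at least one keyword in
    `any_of` and no keyword in `none_of` (a simple Boolean rule)."""
    wanted = {w.lower() for w in any_of}
    excluded = {w.lower() for w in none_of}
    hits = []
    for index, text in enumerate(docs):
        # Split the text into lowercase alphabetic words.
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & wanted and not words & excluded:
            hits.append(index)
    return hits


# Hypothetical documents, invented for illustration only.
docs = [
    "Please move the funds offshore before the auditors arrive",  # of interest, no keyword
    "Annual fraud awareness training is scheduled for Monday",    # of no interest, has keyword
]
hits = keyword_search(docs, any_of={"fraud"})
```

In this illustrative case the search for "fraud" returns only the second document, so the document of interest is missed and the document of no interest is retrieved.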
More effective approaches use statistical techniques to determine which documents are “similar” according to specified criteria (rule-based approach) or exemplars (machine learning approach) and to group them together.
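By way of illustration only, one common statistical measure of document similarity is the cosine similarity between TF-IDF weighted word vectors. The sketch below is an assumption made for explanatory purposes, not a description of any specific existing tool: the function names, the smoothed IDF formula, the greedy grouping strategy, and the threshold value are all hypothetical choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict of word -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Smoothed IDF (an assumed variant) keeps every weight positive.
        vectors.append({
            word: count * (math.log((1 + n) / (1 + doc_freq[word])) + 1)
            for word, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similar(docs, threshold=0.5):
    """Greedily group documents: each document joins the first group whose
    exemplar (the group's first member) is at least `threshold` similar."""
    vectors = tfidf_vectors(docs)
    groups = []
    for index, vector in enumerate(vectors):
        for group in groups:
            if cosine(vectors[group[0]], vector) >= threshold:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


# Hypothetical documents, invented for illustration only.
docs = [
    "The merger agreement was signed by both parties",
    "Both parties signed the merger agreement yesterday",
    "Lunch menu for the cafeteria on Friday",
]
groups = group_similar(docs, threshold=0.5)
```

Here the first two documents share most of their vocabulary and fall into one group, while the third forms a group of its own. In an exemplar-based (machine learning) setting, the exemplar of each group would typically be a document already labeled by a reviewer rather than simply the first document encountered.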
Because of the large number of documents which may need to be reviewed in a litigation matter, a team of reviewers may be enlisted to review the documents. Each reviewer is provided with a set of the documents and reviews each of them on screen to determine whether the document is responsive/privileged or not. The process is subject to errors, in that some reviewers may not have the skills required to make a decision in all cases. Additionally, different reviewers may apply somewhat different standards of review. Further, even the most competent reviewers are prone to occasional accidental errors. Existing methods may provide for more than one reviewer to review each document. However, this adds to the cost and time required for completing the review process. Accordingly, a random sample of the documents may be subjected to a double review and if no errors or inconsistencies in the labeling are found, the rest of the documents are assumed to be correctly labeled.
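The sampled double review described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the sampling fraction, the fixed random seed, and the label strings are hypothetical, and the acceptance rule simply mirrors the practice of treating the remaining documents as correctly labeled when the sample shows no disagreement.

```python
import random


def sample_for_double_review(doc_ids, fraction, seed=0):
    """Pick a random subset of document IDs to be reviewed a second time."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    ids = list(doc_ids)
    # Review at least one document, but never more than are available.
    k = min(len(ids), max(1, round(len(ids) * fraction)))
    return sorted(random.Random(seed).sample(ids, k))


def find_disagreements(first_pass, second_pass):
    """Return the IDs whose second-review label differs from the first."""
    return sorted(
        doc_id
        for doc_id, label in second_pass.items()
        if first_pass[doc_id] != label
    )


def sample_passes(first_pass, second_pass):
    """Accept the first-pass labels only if the sample shows no disagreement."""
    return not find_disagreements(first_pass, second_pass)


# Hypothetical labels, invented for illustration only.
first_pass = {"DOC-001": "responsive", "DOC-002": "privileged", "DOC-003": "not responsive"}
second_pass = {"DOC-001": "responsive", "DOC-003": "responsive"}
disputed = find_disagreements(first_pass, second_pass)
```

In this example the second reviewer disagrees on one sampled document, so the sample does not pass and the remaining labels cannot simply be assumed correct. The sketch also makes the weakness of the approach visible: errors in documents outside the sample are never examined.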
Some tools for electronic discovery give review coordinators increased control over the review. However, they are more focused on ensuring that the review is completed on time and within budget than on providing means for automating the quality assessment and evaluating the reviewers' results. These tools often provide information about how much of the collection has been reviewed and how much remains to be reviewed. Review coordinators can then adjust the allocation of resources in order to complete the review within the time available. It is also possible to monitor the speed of individual reviewers by tracking how many documents each reviewer has processed. However, with these tools, double review through sample checking is still needed for assessing the accuracy of the reviewers.
The work of the reviewers is generally monotonous, and this, when combined with the often poor working conditions of the review teams, means that the work is likely to be highly error prone. However, people remain central to this work, as no technology exists today, or is likely to exist in the foreseeable future, that can understand the semantic content of documents fully, automatically, and accurately. Unfortunately, the very reason that people are hired (their intelligence and ability to understand the semantic content of the documents and the contingencies of the legal case) is somewhat muted by the conditions in which they work and the technology that they use. Thus, there is a need for a system that capitalizes on the intelligence of the reviewers while supporting their work with document content analysis capabilities and new interface technology, giving them more freedom and more elements with which to organize their work and drive their annotation process.