This application addresses an invention that substantially improves the complex effort of responding to a discovery request and the demands of performing an investigation. These two halves, which are often performed in parallel, we will call review and investigation, respectively.
Common legal practice in responding to a discovery request often requires that data pertinent to a matter be reviewed for relevance and privilege. In a common review method, reviewers annotate items with one or more tags indicating how the content should be categorized. Based on these reviewer categorizations, each item is either produced to the counterparty, or noted in a privilege log but (generally) not produced, or not produced because of irrelevance to the discovery request. The traditional process of handling a discovery request is time and labor intensive, and as a result has a high cost. Furthermore, it is extremely difficult to obtain consistent and accurate results among reviewers, which is a significant problem in itself, and especially so when a large number of reviewers are working to meet a discovery request.
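The three outcomes described above can be sketched as a small function. This is an illustration only: the tag names "relevant" and "privileged", the Disposition type, and the rule that an irrelevant item is withheld regardless of privilege are assumptions made for the example, not details taken from this application.

```python
from enum import Enum


class Disposition(Enum):
    """The three outcomes for a reviewed item (names are hypothetical)."""
    PRODUCE = "produced to the counterparty"
    PRIVILEGE_LOG = "noted in the privilege log, generally not produced"
    WITHHOLD = "not produced because irrelevant to the request"


def disposition(tags):
    """Map one reviewer's tags on a single item to an outcome.

    Assumes two hypothetical tags, "relevant" and "privileged".
    """
    tags = set(tags)
    # Assumption: irrelevance is checked first, so an irrelevant item
    # is withheld whether or not it is also tagged as privileged.
    if "relevant" not in tags:
        return Disposition.WITHHOLD
    if "privileged" in tags:
        return Disposition.PRIVILEGE_LOG
    return Disposition.PRODUCE
```

Because the outcome depends entirely on the tags, two reviewers who tag the same item differently will send it to different outcomes, which is why reviewer consistency matters so much in practice.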
The continuing increase in the amount of corporate data that is necessary to reasonably meet a discovery request is creating an extra burden on the existing art. Therefore, it has become common practice to use “keyword culling” to reduce the number of items reviewed. However, keyword culling is extremely inaccurate, and other well-known automated categorization techniques have therefore been attempted. Unfortunately, these automated categorization methods are usually overly simplistic and can introduce real risks. Relevance to a discovery request cannot be judged only by the presence of keywords or simple analyses of the data. For example, consider the simple case of an email that in its entirety reads: “Yes, let's proceed”, which could be an authorization to commit fraud or something that is completely innocuous. Nor can relevance be adjudged accurately by statistical categorization methods, since very slight differences in content can determine whether an item is produced or not produced; matters hinging on jurisdictional issues are one of many excellent examples of this.
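The weakness of keyword culling on short items can be seen in a minimal sketch. The function name keyword_cull, the sample messages other than the quoted reply, and the keyword list are all hypothetical and chosen only to illustrate the point; this is a naive whole-word filter, not any method described in this application.

```python
import re


def keyword_cull(items, keywords):
    """Keep only the items containing at least one keyword.

    Matching is case-insensitive and on whole words. This is a
    deliberately naive filter used only to show its limitation.
    """
    patterns = [
        re.compile(r"\b" + re.escape(k) + r"\b", re.IGNORECASE)
        for k in keywords
    ]
    return [text for text in items if any(p.search(text) for p in patterns)]


# Hypothetical three-message exchange; the second message is the
# short reply quoted in the text.
emails = [
    "Please backdate the invoice before the audit.",
    "Yes, let's proceed",
    "Lunch at noon?",
]

kept = keyword_cull(emails, ["backdate", "audit"])
```

Here only the first message survives the cull. The short reply contains none of the keywords and is dropped, even though its meaning depends entirely on the message it answers.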
To improve upon the existing art in a realistic and comprehensive manner, many factors must be taken into account, including:
Requirements for accuracy and completeness are very strict. The consequences of failing to remove confidential or privileged material may be severe. The courts also frown upon “dumping” large numbers of documents that are non-responsive to the original request, and can even impose sanctions on this basis.
The categorization requirements are varied and can include “hard” constraints such as conformance to relevant date ranges or custodial ownership, as well as broad references to a general topic, and all points on the continuum in between.
Corpora very often contain multiple foreign languages.
It is very difficult, and sometimes nearly impossible, to quickly and effectively train large numbers of document reviewers on how to interpret detailed and often highly industry-specific data.
The task of document review is an extraordinarily tedious one, and reviewers can easily become bored and have their attention drift.
It is therefore necessary to have an objective and rapid means of assessing reviewer accuracy and providing feedback.
Large data files such as spreadsheets or dumps of database contents can confound most automated categorization techniques.
“Short format” items such as email responses or IMs can be sufficiently lacking in content that they require other related items, such as those identified by discussions, in order to accurately assign any meaning to them.
Large corpora are heterogeneous and distributed over items of many different types, from emails and different kinds of short message formats, to typical office and business documents, to very large data files.
The invention documented herein, and in the parent application, accounts for all of these factors in order to help users meet the stringent requirements of a discovery request as efficiently and effectively as possible.
A first step of handling a discovery request often involves an investigative effort in which the party served with the discovery request is interested in drawing its own conclusions about the matter at hand. It is often important for both review and investigation tasks to be done in parallel, for the simple reason that the investigation effort may in some instances dictate that a case should simply be dropped, or that an attempt should be made to settle it based on “bad fact patterns.” While review and categorization of individual items is necessary in order to determine which items must ultimately be produced, it is a much different task from trying to analyze the collective meaning of the data.
Analyzing corporate data for its meaning can quickly provide information about exactly what happened, and who might be important to an investigation effort. In order to support the investigative task, the present invention provides visualization, analysis, and a powerful query engine for many dimensions of actor behavior, with special attention given to how these different dimensions change over time and how they may be correlated with one another. In addition, factors such as the emotive tones present in communication, and the apparent avoidance of written communication media, are analyzed and visualized.