As part of legal discovery, the parties to a lawsuit must produce huge volumes of information. See Fed. R. Civ. P. 45(d) (requiring production of documents in response to a subpoena). Document review is a crucial, time-consuming part of litigation and is increasingly the most expensive part of the litigation process. KIKER, Dennis R. ‘How to Manage ESI to Rein In Runaway Costs’. In Law.com, Corporate Counsel [online]. Jul. 18, 2011 [retrieved on 2011-10-06]. Retrieved from the Internet: <URL:http://law.com/jsp/cc/PubArticleCC.jsp?id=1202503308698&src=EMC-Email&et=editoral&bu=Corporate%20Counsel&pt=Corporate%20Counsel%20In-House%20Tech%20Alert&cn=In_House_Tech_20110719&kw=How%20 to%20Manage%20ESI%20to%20Rein%20In%20Runaway%20Costs>. Each party typically makes broad requests for its opponent to produce documents it believes will contain information relevant to its claims and defenses. The rapid escalation in the amount of electronically stored information (“ESI”) being created and transmitted raises numerous issues, including problems with storage, searching, recall, and precision. CORTESE, Alfred W., Jr. ‘Skyrocketing Electronic Discovery Costs Require New Rules’. In ALEC (American Legislative Exchange Council) Policy Forum [online]. March 2009 [retrieved on 2011-10-06]. Retrieved from the Internet: <URL:http://www.alec.org/am/pdf/apf/electronicdiscovery.pdf>. Although computers can handle the bulk of the searching chores, significant human involvement remains necessary. As a result, the cost of discovery is often very high and is increasing. Id.
Because of the high cost of any legal proceeding involving ESI, which represents the majority of all civil and criminal litigation, see PASSARELLA, Gina, ‘‘E-Discovery Evolution’: Costs of Electronic Discovery Are Growing’, In post-gazette.com (Pittsburgh Post-Gazette) [online], Aug. 15, 2011 [retrieved on 2011-10-06], Retrieved from the Internet: <URL:http:post-gazette.com/pg/11227/1166927-499-0.stm>, litigants are more likely to engage in Early Case Assessment (“ECA”). ECA allows the litigants to determine what is contained in their ESI before a broader substantive review takes place. SILVA, Oliver, ‘Early Case Assessment (ECA)—Incorporating ECA into Your Discovery Strategy’. In e-LegalTechnology.org [online]. 2010 [retrieved on 2011-10-06]. Retrieved from the Internet: <URL:http://www.e-legaltechnology.org/member-articles/article-detail.php?id=39>. This is particularly important in determining whether to bring, or how to defend against, potential litigation, all while minimizing costly human review.
The currently available ECA processing tools reflect a traditional, almost paper-based, approach to document reproduction. In a typical paper filing cabinet, all documents may be organized into sequential or linear files based on a particular methodology. A user looking for a particular document may find the relevant file and then have to look through each document in that file, one after another, to find it. Typical ECA processing tools use the same conceptual approach, i.e., a sequential or linear methodology for reproducing and retrieving electronic information.
For example, an email database is treated as a paper filing cabinet. Each email represents a file, and any documents attached to that email (“attachments”) are included in the file. The ECA processing tool stores each email as a record and reproduces the email text and any attachments in sequential order, just as it would for paper files.
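The linear model described above can be illustrated with a minimal sketch. It is not taken from any actual ECA tool; the names `EmailRecord`, `reproduce_sequentially`, and `find_document`, and the sample data, are hypothetical and chosen only for illustration. The sketch assumes each email is a flat record whose body and attachments are emitted one after another, so that locating a document requires scanning the output in order.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class EmailRecord:
    """Hypothetical flat record: one email 'file' plus its attachments."""
    body: str
    attachments: List[str] = field(default_factory=list)


def reproduce_sequentially(records: List[EmailRecord]) -> Iterator[str]:
    """Emit each email body, then its attachments, in stored order."""
    for record in records:
        yield record.body
        for attachment in record.attachments:
            yield attachment


def find_document(records: List[EmailRecord], wanted: str) -> int:
    """Linear scan: zero-based position of the first match, or -1 if absent."""
    for position, text in enumerate(reproduce_sequentially(records)):
        if text == wanted:
            return position
    return -1


# Illustrative "filing cabinet" with two emails (made-up sample data).
cabinet = [
    EmailRecord("Q3 forecast", ["forecast.xls"]),
    EmailRecord("Re: contract", ["contract.doc", "exhibit_a.pdf"]),
]
flat_output = list(reproduce_sequentially(cabinet))
```

In this sketch the output is a single flat sequence, so nothing in it records which email a given attachment came from, and an attachment that itself contained another document would have no place in the structure.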
Unfortunately, electronic messages are no longer confined to such linear or sequential methods of storage. Individual electronic documents may not only be stored after other electronic documents but may also be embedded within, and linked to, other electronic documents through Object Linking and Embedding (“OLE”), a technology developed by Microsoft® that allows embedding and linking to documents and other objects.
Not only must every email or document be reviewed, but the context and relationships of that document must also be preserved. Without knowledge of the context in which a document was created, its meaning is often lost entirely. Even the context of the information within the documents must be carefully preserved so that advanced semantic and linguistic analytical tools can accurately evaluate and compare concepts across documents. Therefore, any proper retrieval of a document requires the precise and accurate retrieval of both the information in the document and the information about the document. Thus, there exists a growing need for methods and systems that can organize and search data in a way that preserves the context of the information and permits review of embedded objects while still maintaining the textual (or substantive), as well as conceptual, information in the proper context.
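One way to picture the need stated above is a tree in which every embedded or linked object keeps a reference chain back to its containing documents. The following is only a sketch under that assumption and does not represent the claimed method or any particular tool; `DocNode`, `walk_with_context`, and the sample file names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class DocNode:
    """Hypothetical node: a document plus objects embedded in or linked from it."""
    name: str
    text: str = ""
    children: List["DocNode"] = field(default_factory=list)


def walk_with_context(
    node: DocNode, path: Tuple[str, ...] = ()
) -> Iterator[Tuple[Tuple[str, ...], DocNode]]:
    """Depth-first traversal yielding each node with the names of its ancestors."""
    yield path, node
    for child in node.children:
        yield from walk_with_context(child, path + (node.name,))


# Made-up example: a spreadsheet embedded in a report attached to an email.
chart = DocNode("chart.xlsx", "revenue by quarter")
report = DocNode("report.docx", "see embedded chart", [chart])
email = DocNode("email-001", "report attached", [report])

# Map each object to the chain of documents that contain it.
contexts = {node.name: path for path, node in walk_with_context(email)}
```

Here `contexts["chart.xlsx"]` is `("email-001", "report.docx")`, so the text of the embedded spreadsheet can be retrieved together with information about where it sits, which a flat sequential listing would not record.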
The present invention provides such a method and system for extracting information or data from documents containing multiple embedded objects. The method and system preserve the overall relationships among documents and their embedded objects and allow for rapid and efficient data extraction and analysis for large quantities of data, i.e., terabytes to petabytes.