1. Field of the Invention
This invention relates to systems, apparatuses, methods, and computer program products for profiling and processing electronically stored document data. More particularly, the invention relates to data that may need to be produced by a party during the discovery phase of litigation, where the processing includes converting printable files into images, together with supporting meta-data and one or more searchable text files.
2. Discussion of the Background
Computer-based discovery in legal proceedings is becoming increasingly widespread as tools that provide cost-effective and legally sound discovery of electronic information are developed. An overview of computer-based discovery in federal civil litigation is provided in a Federal Courts Law Review article by Kenneth J. Withers, entitled Computer-Based Discovery in Civil Litigation and dated October 2000, the entire contents of which are incorporated herein by reference. The article notes how discovery is changing in response to the pervasive use of computers, with more and more cases involving e-mail, word-processed documents, spreadsheets, and records of Internet activity. It discusses the potential of computer-based discovery to reduce overall discovery costs and improve the administration of justice, and it also explores the problems unique to computer-based discovery. The article's appendix provides a checklist of computer-based discovery considerations for pretrial conferences under Rule 16(c) of the U.S. Federal Rules of Civil Procedure.
In conducting computer-based discovery, problems arise from the vast quantities of electronic documents that must be reviewed, whether for a party's document production in litigation against another party, for an internal investigation, or to satisfy government reporting requirements. Because such matters can be mission critical, a party's ability to manage each one depends on how quickly it can capture, identify, review, assess, and produce the relevant documents. Today, the volume of electronic documents far exceeds that of paper documents.
According to a 2000 University of California study by Lyman, P. and Varian, H., entitled “How Much Information,” (http://info.berkley.edu/how-much-info/) the entire contents of which are hereby incorporated by reference, over 90% of corporate documents are created electronically and an estimated 70% of those are never printed to paper. Additionally, the number of e-mail messages exchanged among U.S. employees is approaching 3 billion a day. This has dramatically increased the volume, complexity, and cost of electronic document discovery. Moreover, employees who use e-mail (custodians) often have multiple data sets stored in multiple messaging systems. Electronic documents, whether e-mail or other files stored on hard drives, backup tapes, or other media, come in numerous file types (e.g., MICROSOFT WORD, COREL WORD PERFECT, MICROSOFT EXCEL, LOTUS 123, MICROSOFT OUTLOOK, and SYMANTEC ACT) and in numerous versions of each. These documents are often encoded and may be infected with viruses. A party is often required to produce these vast amounts of electronic documents in paper form, a process that can be unjustifiably expensive unless the retrieval of documents is narrowed to the relevant issues.
FIG. 1 is a flow chart that illustrates the electronic document legal discovery process in common use today. This conventional process begins in step S1 with accessing one or more data archives, followed in step S2 by searching and filtering these archives to identify documents that may be of interest, and in step S3 by printing the selected files. In some conventional systems, files of interest are not first converted to images before printing. Typically, the searching and filtering is restricted to parameters such as file owner, date, destination, or other high-level file meta-data; files are typically not searched or filtered by size, duplicate content, version, encryption/encoding, corruption, or viruses. Files printed or converted to images via this process are then typically reviewed manually, at great expense, for relevancy, redundancy, and readability.
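By way of illustration only, the difference between the meta-data filtering of step S2 and content-based screening for duplicates can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation of any conventional system or of the invention: the `ArchivedFile` record, the `metadata_filter` and `drop_exact_duplicates` helpers, the sample files, and the use of SHA-256 as the content digest are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedFile:
    # Hypothetical record for one file pulled from an archive (step S1).
    path: str
    owner: str
    modified: date
    content: bytes


def metadata_filter(files, owner, start, end):
    """Conventional step S2: select by high-level meta-data only."""
    return [f for f in files if f.owner == owner and start <= f.modified <= end]


def drop_exact_duplicates(files):
    """Content-based screening: keep the first file seen for each SHA-256 digest."""
    seen = set()
    unique = []
    for f in files:
        digest = hashlib.sha256(f.content).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(f)
    return unique


archive = [
    ArchivedFile("a/memo.doc", "jsmith", date(2003, 5, 1), b"quarterly memo"),
    ArchivedFile("b/memo_copy.doc", "jsmith", date(2003, 5, 2), b"quarterly memo"),
    ArchivedFile("c/plan.xls", "jsmith", date(2003, 6, 1), b"budget plan"),
]
selected = metadata_filter(archive, "jsmith", date(2003, 1, 1), date(2003, 12, 31))
print(len(selected))                         # 3: both copies of the memo pass
print(len(drop_exact_duplicates(selected)))  # 2
```

Exact hashing of this kind catches only byte-identical copies; near-duplicates and differing versions of a document would require other techniques.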
As noted previously, many of the printed documents are eventually found to be redundant, encoded, or otherwise corrupted and thus illegible. Furthermore, conventional search and filtering processes are rudimentary and result in the printing of documents that are not relevant to the legal discovery process. Printing costs can be exorbitant, and they increase greatly once the review time of legal staff billed at high hourly rates is added. What is desired, as recognized by the present inventors, is a way to electronically screen, select, archive, search, retrieve, and view documents relevant to the legal discovery process without incurring the large expense of converting to images and/or printing unwieldy, largely useless, and/or redundant materials that must then be reviewed in an inefficient, costly, manual manner.
In addition, conventional systems require the entire contents of an archive to be copied and sent to a remote facility for the conventional file processing of FIG. 1 described above. The inventors have therefore also recognized the economic advantages, operational efficiencies, and enhanced privacy and security of an automated tool that (a) can be hosted at the facility where the archives are located and (b) can be operated by the people who are knowledgeable about the content of the local archives.
In addition, conventional systems are limited by their reliance on file extensions (e.g., .doc, .wpd, .pdf) to identify file type. Because an author can change a file's extension or create a file with an arbitrary one, the extension is not always an accurate indicator of the file type. What is desired is a way to identify file type without relying solely on the file extension. Also, once the file type is identified, conventional systems typically provide only a single, predetermined method of viewing the text associated with the file. Furthermore, no conventional system is known to be able to quickly convert a file to an image, let alone to a plurality of proprietary image file types.
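As an illustration of identifying a file by its content rather than its name, many formats begin with a fixed byte signature (a "magic number"). The following Python sketch checks a few well-known signatures and consults the extension only when no signature matches. It is a minimal sketch, not the invention's method: the `SIGNATURES` table, the `sniff_type` and `identify` helpers, and the labels they return are hypothetical, and the table is far from exhaustive (for example, an OLE2 header alone does not distinguish a Word document from an Excel workbook).

```python
import os

# Leading byte signatures ("magic numbers") for a few common formats.
# Illustrative only; a production table would be much larger.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # legacy Word, Excel, Outlook .msg
    (b"\xffWPC", "wordperfect"),
    (b"{\\rtf", "rtf"),
    (b"PK\x03\x04", "zip"),  # ZIP-based containers
]


def sniff_type(header: bytes) -> str:
    """Return a type label from the file's leading bytes, or 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify(path: str) -> str:
    """Prefer the content signature; consult the extension only as a fallback."""
    with open(path, "rb") as fh:
        label = sniff_type(fh.read(16))
    if label == "unknown":
        ext = os.path.splitext(path)[1].lower().lstrip(".")
        if ext:
            label = "ext:" + ext  # unverified guess taken from the file name
    return label


# A file named "report.doc" whose bytes start with "%PDF-" is still reported as a PDF.
print(sniff_type(b"%PDF-1.4 ..."))             # pdf
print(sniff_type(b"\xffWPC\x10\x00\x00\x00"))  # wordperfect
```

Checking content first and treating the extension as an unverified fallback reflects the point above: the extension is a hint supplied by whoever named the file, while the leading bytes are written by the application that produced it.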
Conventional systems include Daticon's Discovery OnDemand, Merrill Corporation's Discovery Navigator, LSI's Electronicode, Doculex's Discovery Cracker, Pacific Legal's Discover-e Web Repository Solution, Bowne's CaseSoft, Mobious' HardCopy Pro Plus and EDD Workstation, Image Capture Engineering's Z-Print, and Applied Discovery's online review product.
In addition, conventional systems cannot simultaneously conduct a text-based search and a structured-data query (e.g., SQL). This slows electronic discovery and the assimilation of search results. What is also desired, as discovered by the present inventors, is a tool that allows for simultaneous text-based and structured-data searching, data integration, and archiving.
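To illustrate the idea of combining the two kinds of search, a single SQL statement can apply structured predicates (custodian, date) and a text predicate together. The Python sketch below uses the standard-library `sqlite3` module with an in-memory database. It is a minimal sketch under stated assumptions: the `documents` schema, the sample rows, and the `combined_search` helper are hypothetical, and the `LIKE` substring match merely stands in for a true full-text index (such as SQLite's FTS5 extension, where the build includes it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE documents (
           id INTEGER PRIMARY KEY,
           custodian TEXT,
           sent TEXT,        -- ISO-8601 dates compare correctly as text
           file_type TEXT,
           body TEXT)"""
)
conn.executemany(
    "INSERT INTO documents (custodian, sent, file_type, body) VALUES (?, ?, ?, ?)",
    [
        ("jsmith", "2003-04-02", "msg", "Comments on the merger drafts"),
        ("jsmith", "2002-11-15", "doc", "Merger timeline draft"),
        ("alee", "2003-05-20", "xls", "Merger cost model"),
    ],
)


def combined_search(conn, custodian, since, term):
    """One statement applying structured predicates and a text match together."""
    rows = conn.execute(
        """SELECT id, custodian, sent, file_type FROM documents
           WHERE custodian = ? AND sent >= ? AND body LIKE ?
           ORDER BY sent""",
        (custodian, since, f"%{term}%"),
    )
    return rows.fetchall()


print(combined_search(conn, "jsmith", "2003-01-01", "merger"))
# [(1, 'jsmith', '2003-04-02', 'msg')]
```

Because both kinds of predicate are evaluated in one query, the result set already reflects the custodian, date, and content constraints, rather than requiring a text search and a separate structured query whose results are merged afterward.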