The present invention relates generally to producing and organizing electronically stored information, and more specifically to more efficiently producing and organizing electronically stored information in an e-discovery process.
E-discovery refers to a process in which electronic data is sought, located, secured, and searched with the intent of using it as evidence in litigation. E-discovery can be carried out offline on a particular computer, or it can be carried out online, where the electronic data is accessed through a network.
Due to the increasingly pervasive use of electronic documents in organizations and the relative ease with which electronic documents are handled, there has been a major push to enable e-discovery in standard litigation practices. The nature of modern digital data makes digital documents extremely well-suited to investigation. Compared to paper-based documents, digital data can be searched with relative ease. Digital data is also relatively difficult to destroy. This arises because electronic documents are typically scattered and stored throughout a network during their normal usage. Standard workflows usually dictate that electronic documents are routinely duplicated and spread throughout multiple hard drives and computer systems.
In general, electronic data of all types can serve as evidence in the e-discovery process. Standard discoverable electronic data include text, images, calendar and schedule data, audio files, spreadsheets, animation files, databases, web site archives, and even computer programs such as viruses and the signatures they may leave behind. In a modern corporate setting, electronic mail (e-mail) and recorded voicemails are becoming especially valuable sources of data.
A problem with processing electronic documents in an e-discovery process involves dealing with the large amount of data. The very characteristics that make electronic documents robust and durable, namely the extent to which they are routinely duplicated and distributed, are the same characteristics that make electronic documents difficult to process. In the electronic data arena, any one electronic document is almost always duplicated many times over and spread throughout various repositories. This duplication adds an additional layer of difficulty that would-be reviewers must sort through.
Some of the duplicates are exact-duplicates; others are near-duplicates. Examples of exact-duplicates include exact copies of a file kept in several locations by several users. Near-duplicate files range from almost identical files, to slightly altered files, to files in completely different formats. Examples of almost identical files include copies of files that are the same except for perhaps their metadata. For instance, when a document is attached to an e-mail and sent to a person, the document saved by the recipient is identical to the sender's document except that the files may specify different creation or modification dates. Examples of slightly altered files include two copies of an e-mail, one being the original and the other having been forwarded to another person. In this case, the forwarded e-mail may contain much of the same content as the original except for minor formatting changes and perhaps the addition of a new header and some descriptive text. An example of files in completely different formats is a document saved in Word format and the same document saved in PDF format.
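By way of illustration only, the distinction between exact-duplicates and near-duplicates can be sketched in Python as follows. The sketch does not describe any particular e-discovery system. The function names `content_hash` and `classify_pair`, the use of SHA-256 hashing and of `difflib.SequenceMatcher`, and the 0.8 similarity threshold are all assumptions made for the example, and each document is assumed to have already been reduced to plain text.

```python
import hashlib
from difflib import SequenceMatcher


def content_hash(text: str) -> str:
    """Return a SHA-256 digest of the document text (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify_pair(a: str, b: str, threshold: float = 0.8) -> str:
    """Label two documents as exact-duplicate, near-duplicate, or distinct.

    The threshold is an illustrative assumption, not a recommended value.
    """
    # Identical digests are treated as an exact-duplicate.
    if content_hash(a) == content_hash(b):
        return "exact-duplicate"
    # Otherwise fall back to a character-level similarity ratio in [0, 1].
    ratio = SequenceMatcher(None, a, b).ratio()
    return "near-duplicate" if ratio >= threshold else "distinct"


if __name__ == "__main__":
    original = "Please review the attached contract before Friday."
    # A forwarded copy: same body with a short header added in front.
    forwarded = "FW: see below\n" + original
    unrelated = "Lunch menu for the cafeteria next week."

    print(classify_pair(original, original))
    print(classify_pair(original, forwarded))
    print(classify_pair(original, unrelated))
```

A character-level comparison of this kind would not recognize the same document saved in two different file formats unless the text of each were first extracted, which is one reason near-duplicates are harder to handle than exact-duplicates.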
As can be seen from the above, because of the volume of electronic data and the duplication of many electronic documents, organizing and processing electronic data can be a time-intensive process. As the cost of litigation continues its dramatic increase, and as the cost related to the discovery of electronic documents remains a major component of litigation costs, there is a need for a method and system for more effectively organizing and processing electronically stored information.