1. Field
The methods and systems disclosed herein relate generally to e-discovery technology for electronically stored information (ESI), and particularly to methods and systems for analyzing electronic documents and detecting those that are similar, matching, or duplicated.
2. Description of the Related Art
For organizations around the world, electronic document analysis, retrieval, categorization, and storage are a labor-intensive and increasingly costly element of conducting business. For example, businesses involved in litigation are often called on to identify and produce information relevant to the litigation, a process that can be extremely time-consuming and expensive. The parties may be required to review millions of electronic documents to determine relevance, privilege, issue coding, and the like. This typically imposes a substantial expense on the parties because of the time and effort required to review these electronic documents.
The review may involve manually sifting through electronic documents and classifying them as, for example, relevant or non-relevant to an issue based on their content. Existing methods and systems automate parts of the review process through techniques such as keyword matching. While such techniques may assist in determining relevance, they typically do not work reliably or efficiently for detecting textually identical or similar electronic documents. Duplicate or near-duplicate electronic documents may make up between 25 and 50 percent of the documents in a typical business enterprise's electronic document corpus. The typical electronic document review process therefore involves significant duplication of effort, and the computational and analytic burden produced by such redundancy may slow the processing of an electronic discovery platform, resulting in unnecessary document review and higher costs.
Therefore, there exists a need for a system and method that enhance the efficiency of the review process by implementing reliable and effective techniques for identifying textually identical or similar electronic documents within an electronic discovery analytic platform.