Electronic discovery (hereinafter ediscovery) generally refers to an electronic process and system for identifying, collecting, and producing electronically stored information (ESI) in response to a request for production in a lawsuit or investigation. ESI includes, but is not limited to, emails, documents, presentations, databases, voicemail, audio and video files, social media, and web sites.
The processes and technologies involved in ediscovery are often complex because of the sheer volume of electronic data produced and stored. Additionally, electronic documents are more dynamic than hard-copy evidence and often contain metadata such as time-date stamps, author and recipient information, and file properties. Preserving the original content and metadata of electronically stored information is required in order to avoid claims of spoliation or tampering with evidence later in a litigation or investigative process.
During a legal or other investigative process, once data is identified by the parties on both sides of a matter, potentially relevant documents (including both electronic and hard-copy materials) are placed under a legal hold, and the relevant documents and data cannot be modified, deleted, erased, or otherwise destroyed. Potentially relevant data is collected and then extracted, indexed, and placed into a database. The potentially relevant data in the database is then further analyzed to cull or segregate the clearly non-relevant documents and emails. The data is then hosted in a secure environment and made accessible to reviewers who code the documents for their relevance to the legal or investigative matter.
In ediscovery, the relevant documents may be converted to a static format such as tagged image file format (TIFF) or portable document format (PDF), making redaction of privileged and non-relevant information possible. The use of computer assisted review (also known as “C.A.R.” or Technology Assisted Review, “T.A.R.”), predictive coding, and other analytic software for ediscovery reduces the number of documents required for review by attorneys, and allows the legal or investigative team to prioritize the documents it does review. The reduction in the number of documents cuts the number of hours spent reviewing documents and thus labor costs. The ultimate goal of ediscovery is to produce a core volume of evidence for litigation or investigations in a defensible manner.
Existing ediscovery processes typically require numerous people, computer systems, and software systems for the management and execution of data collection, data processing, preparation for attorney document review, and production of electronically stored information for investigations, audits, litigation, or other legal purposes. Disparate, loosely connected, complex software systems and human interactions are necessary to execute the ediscovery process. The complexity of existing ediscovery processes, together with the lack of workflow and of integration between systems and processes, is responsible for enormous delays, frustration, and quality issues.
The typical workflow in existing ediscovery processes is long-running, requiring humans to watch for stages of process completion and to manually shuttle data to the next step or software package until the data reaches its end point. The existing ediscovery process often requires hiring several shifts of technical employees so that this manual process can be performed without long interruptions. It is also typical for work on multiple projects to be performed concurrently for different parties of interest, with differing project requirements, priorities, and deadlines. Delays compound when data waits to be moved manually to the next step in the process while the human employee is busy with a competing project.
FIG. 1 is a flow diagram showing a graphical representation of an existing ediscovery process 10 with stopping points (S) requiring human (H) user interactions to manually restart the process between stage 1 and stage 8. At stage 1, a storage medium 12 with electronically stored information (ESI) is received from an end user to undergo ediscovery processing. The media 12 is logged into a chain of custody tool, and a report is then generated about the data received in the shipment. The report will be used to ascertain which files to copy to the network for ediscovery processing. Depending on the client's instructions with respect to the contents of the drive, a technician will copy data from the media to a file server 14. The data copied to the file server 14 is typically the data that the client has requested to be processed. Technicians typically use third-party copy technology to copy the data to the file server 14. The file copy process is a batch (B) process that requires the copy to fully complete before the next stage in the ediscovery process can be started. Therefore, this process is serial to data processing, and all data must finish copying before the processing stage can begin. The serial nature of the data transfer during the copying stage introduces a data processing delay, and a human intervention step once copying is complete, as shown by the stop sign (S). In addition, a record of this data transfer and its relationship to the media and the legal matter must be created in another manual step in yet another separate software system.
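The report generated at stage 1 can be illustrated with a minimal sketch. It assumes the received media is mounted as an ordinary directory tree; the function name `inventory_media` and the report fields are hypothetical and do not correspond to any particular chain of custody tool.

```python
import os
from collections import Counter
from pathlib import Path


def inventory_media(root):
    """Walk a mounted media directory and summarize the data received."""
    root = Path(root)
    by_extension = Counter()
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Group by lower-cased extension; files without one go to "(none)".
            by_extension[path.suffix.lower() or "(none)"] += 1
            total_bytes += path.stat().st_size
            file_count += 1
    return {
        "root": str(root),
        "file_count": file_count,
        "total_bytes": total_bytes,
        "by_extension": dict(by_extension),
    }
```

A technician could read such a summary to decide which folders or file types to copy to the file server, which is the decision the report in stage 1 supports.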
Following the completion of the data transfer/copying to the file server 14, the technician at stage 2 must perform another manual step to confirm that all of the data selected for copying was actually copied to the file server. The technician may conduct the confirmation process by looking at robocopy logs or by doing a full folder compare using additional software tools 16, illustratively including Beyond Compare by Scooter Software, Inc. Once verification of a successful copy process of the required data is obtained, all of the transferred data on the file server 14 can move to the next stage. The verification process is also serial to data processing and must be done before ediscovery processing can begin, introducing an additional data processing delay, and a human intervention step once verification is completed.
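A full folder compare of the kind performed at stage 2 can be sketched as follows. This is an illustration only, assuming both the media and the file server copy are reachable as directories; it compares SHA-256 digests by relative path and is not a description of how robocopy or Beyond Compare work internally. The names `hash_file`, `manifest`, and `verify_copy` are hypothetical.

```python
import hashlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root):
    """Map each file's path relative to root onto its content digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hash_file(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def verify_copy(source_root, dest_root):
    """Compare two directory trees and report every discrepancy."""
    src = manifest(source_root)
    dst = manifest(dest_root)
    return {
        "missing": sorted(set(src) - set(dst)),
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
        "unexpected": sorted(set(dst) - set(src)),
    }
```

An empty result in all three lists would correspond to the successful verification that allows the data to move to the next stage.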
With the data on the file server now ready for ediscovery processing, the technician at stage 3 sets up ediscovery processing jobs using ediscovery software tools. The technician selects individual folders on the file server, and these folders are mapped to jobs to begin data processing. Each job is a batch of data that is processed together. At stage 4, data processing occurs with a selected ediscovery tool 18. All files are broken down into individual documents. Files inside of archives are extracted to disk as individual files. Files inside of mail databases are extracted into individual files; this includes copying out attachments separately from the email. After all files are extracted to disk, metadata and text are extracted from the individual documents. This extracted information is loaded into the processing database for further formatting. Once all files have been processed, the batch of data moves on to the next step. These monolithic batches of data must be completed in their entirety before being manually exported from storage for transmittal to external document review systems. This creates a document review delay and introduces an additional human intervention step. At stage 5, a manual error remediation process is performed that requires the technician to remediate all errors before moving all of the data in a batch to the next stage. All of the data in the batch will wait for the technician to look for errors that occurred during processing, even though the error documents are a very small percentage of the overall document population. The technician will look at all extraction errors and attempt to remediate as many processing errors as possible. Once all errors have been remediated, the technician will send the entire batch of data on to the next stage.
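The extraction performed at stage 4 can be illustrated with a minimal sketch using only the Python standard library. It assumes a single raw RFC 5322 email message and a ZIP archive held in memory; real ediscovery tools handle mail databases and many other container formats, which this sketch does not attempt. The function names `explode_email` and `explode_zip` are hypothetical.

```python
import io
import zipfile
from email import message_from_bytes, policy


def explode_email(raw_bytes):
    """Split one raw email into metadata, body text, and separate attachments."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    attachments = [
        {"filename": part.get_filename(), "payload": part.get_payload(decode=True)}
        for part in msg.iter_attachments()
    ]
    return {
        "metadata": {
            # Missing headers are recorded as empty strings.
            name.lower(): str(msg.get(name, ""))
            for name in ("From", "To", "Subject", "Date")
        },
        "text": body.get_content() if body is not None else "",
        "attachments": attachments,
    }


def explode_zip(raw_bytes):
    """Return each file inside a ZIP archive as its own name-to-bytes entry."""
    with zipfile.ZipFile(io.BytesIO(raw_bytes)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()
        }
```

The dictionaries returned here stand in for the rows that would be loaded into the processing database; in this sketch a failure on one document raises an exception for that document only, whereas the batch process described above holds the whole batch until every error is remediated.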
Continuing with the ediscovery process of FIG. 1, once the entire batch (B) of data has been quality controlled (QC'ed) for data processing errors (stage 5), the batch (B) of data is ready for filtering at stage 6. The technician applies filters 20 to the batch (B) of data to create an export set 22 of documents. The filters 20 that are applied at this time include, but are not limited to, de-duplication, date, file type, file extension, and keyword searching. The application of filters 20 is a manual step that stops work until a user configures and executes the filters, prompting the software to continue to the export/transmittal to storage or review systems.
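The filters named at stage 6 can be sketched as a single pass over a batch. This is a simplified illustration: the `Document` record and the `apply_filters` function are hypothetical, de-duplication is assumed to be a hash of the extracted text, and keyword searching is assumed to be a case-insensitive substring match. Production tools typically use richer hashing and search rules.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from pathlib import PurePath


@dataclass
class Document:
    doc_id: str
    filename: str
    sent: date
    text: str


def apply_filters(docs, start, end, extensions, keywords):
    """Return the export set: unique, in-range, allowed-type, keyword-hit documents."""
    seen_hashes = set()
    export_set = []
    wanted = [k.lower() for k in keywords]
    for doc in docs:
        # De-duplication: keep only the first document with a given text hash.
        digest = hashlib.sha256(doc.text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        # Date filter: inclusive range.
        if not (start <= doc.sent <= end):
            continue
        # File extension filter: compared case-insensitively.
        if PurePath(doc.filename).suffix.lower() not in extensions:
            continue
        # Keyword filter: at least one keyword must appear in the text.
        lowered = doc.text.lower()
        if not any(k in lowered for k in wanted):
            continue
        export_set.append(doc)
    return export_set
```

For example, calling `apply_filters(batch, date(2020, 1, 1), date(2020, 12, 31), {".docx", ".txt"}, ["merger"])` would keep only unique 2020 documents of those two types that mention the keyword.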
At stage 7, after all errors have been remediated for the entire batch and filters have been applied, the technician exports the data for loading into a review product. The export places each document 22 on the file server so that the documents 22 may be viewed in the review product. The ediscovery software also produces a metadata load file and the extracted text for each file it outputs to disk. The technician must QC the exported data 22 after completion to ensure that no errors were encountered. If errors occurred, the export will need to be redone in its entirety. At stage 8, data 22 is manually loaded (imported) into the review platform using load files. The entire filtered batch or many filtered batches will be loaded at one time. After load completion, the technician completes a QC of the data load to ensure it was done correctly.
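The export and QC steps of stage 7 can be illustrated with a minimal sketch. It assumes a plain CSV load file with hypothetical column names; review platforms commonly define their own delimited load file formats, which are not reproduced here. The functions `write_load_file` and `qc_export` are likewise hypothetical.

```python
import csv
from pathlib import Path

# Hypothetical load file columns, chosen for illustration only.
FIELDS = ["doc_id", "native_path", "text_path", "author", "date_sent"]


def write_load_file(records, export_dir):
    """Write one extracted-text file per document plus a metadata load file."""
    export_dir = Path(export_dir)
    (export_dir / "text").mkdir(parents=True, exist_ok=True)
    load_path = export_dir / "loadfile.csv"
    with open(load_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for rec in records:
            text_rel = f"text/{rec['doc_id']}.txt"
            (export_dir / text_rel).write_text(rec["text"], encoding="utf-8")
            writer.writerow({
                "doc_id": rec["doc_id"],
                "native_path": rec["native_path"],
                "text_path": text_rel,
                "author": rec["author"],
                "date_sent": rec["date_sent"],
            })
    return load_path


def qc_export(load_path, expected_count):
    """Return a list of problems found in the export; an empty list means it passed."""
    load_path = Path(load_path)
    with open(load_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    problems = []
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, found {len(rows)}")
    for row in rows:
        if not (load_path.parent / row["text_path"]).is_file():
            problems.append(f"{row['doc_id']}: missing text file")
    return problems
```

A non-empty problem list corresponds to the case described above in which the export must be redone before the data can be loaded at stage 8.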
Thus, while ediscovery can be an effective tool, there exists a need for a method and system that streamlines the ediscovery process and makes it less labor intensive and time consuming.