1. Technical Field
Present invention embodiments relate to electronic discovery, and more specifically, to the automated collection and preservation of information from electronic data repositories.
2. Discussion of the Related Art
Electronic discovery, or eDiscovery, refers to legal discovery in civil litigation in which the information to be “discovered” is in electronic form. Usually, the legal team designates the relevant data (e.g., emails, instant messages, documents, etc.). Information Technology (IT) administrators subsequently locate and connect any data repositories for which discovery is to be performed (e.g., by using digital forensics analysis, etc.).
Traditional techniques for eDiscovery use separate processes for eDiscovery Management Application (EMA) functions and certain search engine functions (e.g., database search engines that search electronically stored data). In order to begin the eDiscovery process, the legal team identifies the data repositories to be searched. Subsequently, as a separate administrative task, the IT administrator connects the search engine to the data repositories, and the search engine indexes the data repositories for use with the EMA. The indexing of the data repositories is meant to be performed well in advance of users performing any search tasks.
Typically, the IT administrator will direct or point the search engine to multiple data repositories, and the search engine performs a full indexing of each repository in its entirety. Existing integrations between EMAs and search engines are implemented with this design in mind. In other words, the EMA only provides a streamlined workflow if the search engines are already connected to the repositories and the content of the repositories is already indexed. If there is any deviation from this scenario, the legal team has to manage the connection process by requesting that the IT administrator locate the repository that was not available or indexed, connect the repository to the search engine, index the content, and perform any necessary steps to enable the EMA to perform automated collections from the newly added repository. The legal team then needs to check the completion status of the task. Once the newly added repository is indexed, the legal team can resume normal execution of a collection request. This process requires disparate teams to work together in a complicated workflow, which is troublesome for the legal team and prone to error. Further, this process results in large and out-of-date indexes, thereby increasing the cost of storage and adversely affecting the quality of eDiscovery data. In addition, EMAs use the notion of a logical data source, while search engines require a computer-addressable repository address and login credentials, further complicating the typical eDiscovery process.
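By way of illustration only, the mismatch between an EMA's logical data source and the connection details a search engine requires may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular EMA or search engine; the names RepositoryConnection, RepositoryNotReady, and resolve, and the registry structure, are hypothetical and do not come from any existing product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryConnection:
    """Hypothetical record of what a search engine needs to reach a repository."""
    address: str           # computer-addressable repository address
    username: str          # login credentials
    password: str
    indexed: bool = False  # True once the search engine has indexed the content


class RepositoryNotReady(Exception):
    """Raised when a logical data source cannot yet be collected from."""


def resolve(logical_source: str,
            registry: dict[str, RepositoryConnection]) -> RepositoryConnection:
    """Map an EMA logical data source name to a search engine connection.

    The lookup succeeds only if the repository was connected and indexed
    in advance; otherwise the manual IT workflow has to be started.
    """
    conn = registry.get(logical_source)
    if conn is None:
        raise RepositoryNotReady(
            f"{logical_source!r} is not connected to the search engine")
    if not conn.indexed:
        raise RepositoryNotReady(
            f"{logical_source!r} is connected but not yet indexed")
    return conn
```

In this sketch, any logical data source that is absent from the registry, or present but not yet indexed, raises RepositoryNotReady, which corresponds to the point at which the legal team must involve the IT administrator before a collection request can proceed.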