The present invention relates to data discovery, such as legal data discovery. Organizations today face various challenges related to data discovery. Increased digitized content, retention of data due to regulatory requirements, the prevalence of productivity tools, the availability of data on communication networks, and other factors have been driving rapid growth of data volumes in organizations. In response to this growth, many organizations have been expanding data storage with various data storage devices and have been implementing data discovery using tools from different suppliers, each tool performing one or more data discovery tasks. Typically, time scale differences and speed mismatches among the tools and the tasks they perform may result in issues such as missed data and latency in responding to data discovery requests.
In general, data discovery may involve tasks such as identification, collection, culling, processing, analysis, review, production, and preservation. Typically, the tasks may be performed by different tools provided by different suppliers. For example, the tasks of identification and collection may be performed by an identification-collection tool, and the task of processing may be performed by a separate processing tool coupled to the identification-collection tool. Since identification and collection may be performed substantially faster than processing, the identification-collection tool may collect more data than the processing tool is able to process in a timely manner. As a result, a substantial portion of the collected data may be dropped without being processed. Consequently, some critical data may not be appropriately analyzed and preserved. In addition, if the user of the tools expects the data discovery tools to respond to data discovery requests at a speed consistent with the data collection speed, the user may experience substantial latency caused by the delay at the processing tool.
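The speed mismatch described above may be illustrated with a minimal sketch. The function name simulate_pipeline, the fixed per-step rates, and the bounded buffer between the two tools are all assumptions made for illustration only; they are not taken from any particular data discovery tool.

```python
def simulate_pipeline(collect_rate, process_rate, buffer_capacity, steps):
    """Hypothetical model: a collection tool feeds a slower processing tool.

    Each step, the collection tool hands over `collect_rate` items. Items that
    do not fit in the buffer between the tools are dropped. The processing
    tool then handles up to `process_rate` buffered items.
    Returns (processed, dropped, backlog).
    """
    backlog = 0    # items collected but not yet processed
    processed = 0  # items the processing tool has handled
    dropped = 0    # items lost because the buffer was full

    for _ in range(steps):
        # Collection: accept only as many items as the buffer can still hold.
        free_space = buffer_capacity - backlog
        accepted = min(collect_rate, free_space)
        dropped += collect_rate - accepted
        backlog += accepted

        # Processing: handle at most `process_rate` items from the backlog.
        handled = min(process_rate, backlog)
        backlog -= handled
        processed += handled

    return processed, dropped, backlog


# Illustrative numbers only: collection runs 2.5x faster than processing.
print(simulate_pipeline(collect_rate=10, process_rate=4,
                        buffer_capacity=20, steps=10))  # (40, 44, 16)

# With matched rates nothing is dropped and no backlog builds up.
print(simulate_pipeline(collect_rate=4, process_rate=4,
                        buffer_capacity=20, steps=10))  # (40, 0, 0)
```

Under these assumed numbers, 44 of the 100 collected items are dropped and 16 remain waiting, which corresponds to the missed data and the latency noted above.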
In some arrangements, data may need to be manually transferred between some of the data discovery tools. The manual process may introduce a substantial number of errors into the tools and into the data discovery process.