Electronic data management, particularly in large enterprise computing environments, is increasingly complicated. For instance, the decreasing cost of electronic storage space, in combination with regulatory and legal obligations to retain data, has led to exponential growth in the data accumulated throughout organizations. Data is often stored at many sites, including local, remote, and centralized databases. Additionally, data is frequently stored on different systems, by different methods, and in multiple formats.
For example, a typical corporate legal department has a wealth of knowledge contained in stored data, such as documents, databases, and email, which can be leveraged to aid attorneys in preparing new work product. Further, an emphasis on cost-consciousness drives a desire to reduce the amount of time spent on legal matters. The volume and dispersed nature of the data make tracking, searching, and reusing such data difficult.
Currently, corporations use different data management tools to address their various needs. For instance, content management, electronic mail, accounting, and deadline tracking are handled by different solutions. Unfortunately, the need for multiple solutions leads to data segregated into many different information silos, each with its own storage format. Locating and searching content in each silo can require separate user login credentials and silo-specific search methodologies that return standalone, segregated, and customized search results.
Conventional content management and search tools have proven inadequate for efficiently detecting related documents. For example, BA-Insight LLC, a Delaware limited liability company, post-processes the results of document search queries. Documents matching a user search query are first identified. The identified documents are then grouped based on shared metadata information, such as author or date, and returned to the user. However, documents that may be relevant to the user's query, but lack the search query terms, are not considered.
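By way of illustration only, the limitation described above can be sketched in a few lines of Python. The sketch is a generic, simplified model of term matching followed by metadata grouping; it does not depict any particular vendor's product, and the corpus, field names, and the function `search_then_group` are hypothetical assumptions introduced for this example.

```python
from collections import defaultdict

# Hypothetical toy corpus; the documents and field names are assumptions.
DOCS = [
    {"id": 1, "author": "Lee", "text": "indemnification clause for supplier contract"},
    {"id": 2, "author": "Lee", "text": "supplier contract renewal checklist"},
    {"id": 3, "author": "Kim", "text": "hold harmless provision in vendor agreement"},
]


def search_then_group(docs, query, field="author"):
    """Keep documents containing every query term, then group the hits by a metadata field."""
    terms = query.lower().split()
    # Step 1: identify documents whose text contains all query terms.
    hits = [d for d in docs if all(t in d["text"].lower().split() for t in terms)]
    # Step 2: group the matching documents by shared metadata (here, author).
    groups = defaultdict(list)
    for d in hits:
        groups[d[field]].append(d["id"])
    return dict(groups)


result = search_then_group(DOCS, "supplier contract")
print(result)
```

In this sketch, document 3 never reaches the grouping step because it contains neither query term, even though its subject matter (a hold harmless provision in a vendor agreement) is closely related to that of document 1.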
Thus, there remains a need for a system and method for increasing the efficiency of document search by identifying content similarity across documents.