Individuals and organizations often process, index, and/or annotate records for archiving. For example, an organization may become subject to e-discovery requirements covering various records and documents. To prepare for e-discovery, the organization may process its records and documents using an e-discovery service. The e-discovery service may tag strings of text and other data within the records to facilitate search, recall, statistical analysis, and/or other processing.
Organizations also often deduplicate archived records (e.g., compress them by removing redundancies), for example to conserve storage space. However, deduplication operations may suffer from several inefficiencies. For example, conventional deduplication systems may attempt to deduplicate records that contain little or no redundant information. Conventional systems may also lack information about which records share redundant information, and obtaining that information may be costly in terms of time and performance. Such systems may also misallocate priorities: they may attempt to deduplicate records that have few commonalities before deduplicating records that have more commonalities, even when time limits prevent the system from deduplicating all records. Moreover, conventional deduplication systems may move records to different deduplication locations without first grouping the records to maximize the commonalities among the records at one or more of these locations. The instant disclosure, therefore, identifies a need for improved methods for deduplicating archive objects.