Many entities (e.g., businesses, research labs, and governmental agencies) are required by law to retain data (e.g., electronic documents, digital images, and audio/video files) for a certain period of time. In some cases, an entity may also be required to destroy data to adhere to privacy regulations. To comply with these laws and regulations, it is becoming increasingly important, particularly for businesses, to establish a document retention policy over their data storage systems or repositories. Such a document retention policy generally specifies how business documents should be managed and/or destroyed. For instance, a document retention policy may specify a period of time to retain a particular document, or it may require an administrative review by a user before a document can be destroyed. An entity's ability to understand and structure the life cycle of its documents as they are created, maintained, and ultimately destroyed can greatly impact the infrastructure and processes required to house its content.
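By way of illustration only, the two policy elements mentioned above (a retention period and an optional administrative review before destruction) might be modeled as in the following minimal Python sketch. The names `RetentionPolicy`, `retain_days`, `requires_review`, and `may_destroy` are hypothetical and chosen for this example; they do not describe any particular embodiment.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy record: how long to keep a document and
    whether a user must review it before it can be destroyed."""
    retain_days: int
    requires_review: bool = False


def may_destroy(created: date, today: date, policy: RetentionPolicy,
                review_approved: bool = False) -> bool:
    """Return True only if the retention period has elapsed and any
    required administrative review has been approved."""
    retention_over = today >= created + timedelta(days=policy.retain_days)
    if not retention_over:
        return False
    if policy.requires_review and not review_approved:
        return False
    return True
```

Under these assumptions, a document governed by `RetentionPolicy(retain_days=365, requires_review=True)` could not be destroyed before a year has passed, and even then only once `review_approved` is set.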
Business documents are often held in various forms (e.g., text files, graphic files, and emails), which adds to the complexity of controlling and managing them. Moreover, each of these business documents may have duplicates (i.e., identical copies) maintained or held in different locations (e.g., different directories on the same hard drive, different hard drives, or different servers) for various reasons. Such document duplication is an inefficient use of storage space and can add to the complexity of resolving problems associated with business document retention. Embodiments of the present invention provide a solution to address this problem and more.