Various solutions have been provided to manage a body of stored content. In one approach, a database is used to store metadata associated with the stored objects comprising a body of stored content. The database is used to perform such tasks as identifying and retrieving specific stored objects of interest. Such content management solutions have been used, e.g., in connection with other applications, appliances, etc., to create and manage data archives for file system data, email messages, and other content.
One challenge faced when archiving a large body of stored content is the sheer volume of data that must be stored. In some cases, and in particular for certain types of content, such as email, the same content or portion of content may appear many times in a body of content, and in a typical approach each instance is represented in the metadata database and/or archived separately. For example, a document may reside in a file system and then be sent as an email attachment to a first recipient, who may add other content and forward the attachment to a plurality of other recipients, etc. One or more recipients may save a copy of the attachment on their local system, rename it, and then forward the renamed copy on to yet other destinations. Still another email user might include the same content in the body of an email message or other object. For certain types of objects that require a relatively large amount of storage space, such as images and other multimedia objects, storing numerous copies of the same content can be inefficient and costly.
Therefore, there is a need for a way to store a body of managed content efficiently while avoiding unnecessary duplication in the storage of at least certain content.