Enterprise content exists in many forms, such as text documents, spreadsheets, images, and e-mail messages, as well as fixed content such as schematics, records, and scanned images. The need has arisen for an enterprise to treat this content as a resource: managing and leveraging content as an asset, reducing its risks as a liability, and reducing its cost of storage. Moreover, compliance regulations are making rapid and easy access to content a necessity. Companies in financial services and regulated industries, as well as governmental agencies, are faced with complying with new and existing government regulations, under which the ability to access and supply files and records is imperative to avoid fines or forced closures. In the wake of recent high-profile accounting scandals and the passage of the Sarbanes-Oxley Act, all publicly traded U.S. companies are required to manage and archive content.
With the need to be able to access documents at any time and from any location, many enterprises are using content management systems which employ storage servers for storing and archiving content. These content management systems allow for much more flexibility than traditional localized storage. Relationships between content can be established, allowing the same content to be used in multiple contexts and renditions. These systems also allow content to be published through multiple channels. For example, the same content can be easily faxed or published to a web site. More importantly, the content can be accessed by users who are either away from the office or in a regional office on the other side of the globe.
Typically, the storage servers employed by content management systems store content on traditional file system disk drives, optical storage, tape drives, or SAN or NAS systems. These systems do not offer much protection for the stored content, however, as they physically store content under a traditional file name hierarchy. Employees or hackers who wish to destroy content can locate the content by its file path and then simply delete it. This has led to the adoption of write once, read many (WORM) devices, which use non-magnetic, non-erasable media. However, with WORM devices, if it is no longer necessary to store content, the only way to destroy the content is to literally break the optical platter that is typically used for WORM storage.
U.S. Pat. No. 6,807,632 ('632 patent) proposes a solution to some of the shortcomings of the prior art storage systems. The '632 patent describes a method for content addressable storage, that is, storage that relies on a content address, instead of a file path, for describing the physical location of content. Content addressable storage takes a piece of content and saves it in a storage server, typically a node comprised of magnetic disks. When the content is saved, the storage server returns a claim check, or content address, that identifies not only where the content is stored, but also other properties of the content, called metadata. The metadata consists of descriptive properties of the content, such as the name, date created, date last accessed, author, permissions, etc. The returned content address is a cryptographic hash value, generally a string of characters, that is generated from the metadata and other assets of the content. The content address is then put into an XML document, which stores the content address as well as a locator for a descriptor file which holds the “keys” to deciphering the hash. Because the metadata is stored along with the content, it is easy to verify the content and to determine other properties of the content simply by accessing the metadata. Furthermore, because the location where the content is stored is part of the metadata, the content can always be located without the user or administrator having to track the physical location of stored content.
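The idea of a content address can be illustrated with a minimal sketch. This is not the scheme of the '632 patent: the class name ContentAddressableStore, the in-memory dictionaries, and the choice of a SHA-256 hash over the content bytes alone are all assumptions made for illustration only.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory store (hypothetical); content is located by its hash."""

    def __init__(self):
        self._blobs = {}     # content address -> content bytes
        self._metadata = {}  # content address -> metadata dictionary

    def put(self, content: bytes, metadata: dict) -> str:
        # Assumption: the address is a SHA-256 digest of the content bytes.
        address = hashlib.sha256(content).hexdigest()
        self._blobs[address] = content
        self._metadata[address] = dict(metadata)
        return address  # the "claim check" handed back to the caller

    def get(self, address: str) -> bytes:
        content = self._blobs[address]
        # Re-hashing on read verifies that the content is unchanged.
        if hashlib.sha256(content).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return content

    def metadata(self, address: str) -> dict:
        return dict(self._metadata[address])
```

In this sketch the caller never sees a file path. The address returned by put is the only handle needed to retrieve and verify the content, and storing identical content twice yields the same address.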
A drawback to the system of the '632 patent is that it provides no method for managing the retention of the content or metadata. Most content needs to be archived only for a set amount of time, after which the content is no longer needed.
It is accordingly a primary object of the invention to implement a method for storing content on a storage system, wherein an administrator can set certain properties, or metadata, of the content, which will be persisted with the content when it is stored. The metadata is also associated with the content and stored in a relational database, allowing retrieval of the content by means of the associated metadata. One of the properties that can be set by the user is a retention date. This retention date defines a point in time after which the content and all associated metadata may be deleted from the storage system.
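The role of the retention date can be sketched with a small relational table. The table name content_metadata, its columns, and the helper functions below are hypothetical and are not taken from the specification; the sketch assumes retention dates are timezone-aware and stored as epoch seconds in an in-memory SQLite database.

```python
import sqlite3
from datetime import datetime, timezone


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory metadata catalog (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content_metadata ("
        " content_id TEXT PRIMARY KEY,"
        " name TEXT,"
        " author TEXT,"
        " retention_epoch INTEGER NOT NULL)"
    )
    return conn


def register(conn, content_id, name, author, retention_date: datetime):
    # The retention date is stored as epoch seconds so comparisons are numeric.
    conn.execute(
        "INSERT INTO content_metadata VALUES (?, ?, ?, ?)",
        (content_id, name, author, int(retention_date.timestamp())),
    )


def find_by_author(conn, author):
    """Retrieve content identifiers by means of associated metadata."""
    rows = conn.execute(
        "SELECT content_id FROM content_metadata"
        " WHERE author = ? ORDER BY content_id",
        (author,),
    )
    return [row[0] for row in rows]


def deletable(conn, now: datetime):
    """Return identifiers whose retention date has already passed."""
    rows = conn.execute(
        "SELECT content_id FROM content_metadata"
        " WHERE retention_epoch < ? ORDER BY content_id",
        (int(now.timestamp()),),
    )
    return [row[0] for row in rows]
```

Under these assumptions, content becomes eligible for deletion only once the current time is later than its retention date; until then it is reported only by metadata queries such as find_by_author.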
This is achieved by using a storage object and an abstraction in the form of a plugin library, which is configured to pass the user-defined metadata, including the retention period, and the content to the storage system. The storage object and plugin library are configured to interface with a particular type of storage system, such that when a content management server identifies a storage object associated with a particular storage system, it loads the appropriate plugin library for passing the content and metadata to that storage system. This allows for storage on a variety of storage systems, including traditional disk storage, databases, and content addressable storage.
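The relationship between a storage object and its plugin library can be sketched as follows. All names here (StoragePlugin, DiskPlugin, CasPlugin, StorageObject, load_plugin, PLUGINS) are hypothetical, and a simple in-process registry stands in for the loading of an actual plugin library; this is an illustration under those assumptions, not the implementation of the invention.

```python
import hashlib
from abc import ABC, abstractmethod
from dataclasses import dataclass


class StoragePlugin(ABC):
    """Common interface every storage-specific plugin implements."""

    @abstractmethod
    def store(self, content: bytes, metadata: dict) -> str:
        """Pass content and metadata to the storage system; return a reference."""


class DiskPlugin(StoragePlugin):
    def __init__(self):
        self.written = {}

    def store(self, content, metadata):
        # Assumption: disk storage locates content by name.
        ref = f"disk:{metadata['name']}"
        self.written[ref] = (content, dict(metadata))
        return ref


class CasPlugin(StoragePlugin):
    def __init__(self):
        self.written = {}

    def store(self, content, metadata):
        # Assumption: content addressable storage locates content by hash.
        ref = "cas:" + hashlib.sha256(content).hexdigest()
        self.written[ref] = (content, dict(metadata))
        return ref


# Registry standing in for the set of installed plugin libraries.
PLUGINS = {"disk": DiskPlugin, "cas": CasPlugin}


@dataclass
class StorageObject:
    name: str
    storage_type: str  # identifies which plugin the server must load


def load_plugin(storage_object: StorageObject) -> StoragePlugin:
    """Select the plugin matching the storage object's storage type."""
    try:
        plugin_cls = PLUGINS[storage_object.storage_type]
    except KeyError:
        raise ValueError(
            f"no plugin for storage type {storage_object.storage_type!r}"
        ) from None
    return plugin_cls()
```

In this sketch the content management server only ever calls store on the common interface, so supporting another kind of storage system amounts to registering one more plugin class.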