Data storage systems are configured for storing, managing, and accessing large amounts of data. In a data archive system in particular, the data may be stored for long periods of time. The data can include documents that have been selected for permanent or long-term digital preservation. The retention period of the data may be defined according to the data's enduring value or regulatory requirements.
A core artifact maintained in an archive system is an archive object (AO). An AO includes a raw content data object (CDO) that is archived, along with metadata that describes that CDO. Some of the metadata is used to ensure that the CDO is uniquely identifiable, to describe its provenance and context, and to determine whether the data has been altered in an undocumented manner. Some of the metadata resides in storage models or structures that are suitable for fast access (e.g., a database or the extended attributes of a file system).
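As an illustration only, the relationship between a CDO and its metadata can be sketched in Python. The class name `ArchiveObject`, its fields, and the use of a SHA-256 digest to detect alteration are assumptions made for this sketch; the text does not prescribe any particular layout or integrity mechanism.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class ArchiveObject:
    """Hypothetical AO: a raw CDO plus the metadata that describes it."""

    cdo: bytes       # raw content data object
    provenance: str  # where the content came from (free-form here)
    # Unique identifier; a random UUID is an assumption of this sketch.
    ao_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Digest recorded at archive time; SHA-256 is an assumed choice.
    digest: str = ""

    def __post_init__(self) -> None:
        # Record the digest once, when the AO is first created.
        if not self.digest:
            self.digest = hashlib.sha256(self.cdo).hexdigest()

    def is_unaltered(self) -> bool:
        """Return True if the CDO still matches the recorded digest."""
        return hashlib.sha256(self.cdo).hexdigest() == self.digest


ao = ArchiveObject(cdo=b"quarterly report", provenance="mail-server-1")
```

Here `ao.is_unaltered()` holds until the CDO bytes change without the digest being updated, which is one simple way to flag an undocumented alteration.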
Associations and relationships often exist among AOs. For example, an email may contain threads and attachments. If each thread and each attachment is stored as a separate AO, the AOs that originated from the same email are associated. In general, associated (or related) AOs do not necessarily reside on the same storage medium or in the same storage container, which has disadvantages. For example, if related data lacks self-containment (i.e., the related data is not stored in the same storage container), access time is not optimized because the data must be read through several requests submitted to different storage media or devices. If one or more of those media are off-line, the access delay is exacerbated further.
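The access cost described above can be made concrete with a minimal Python sketch. The placement map, the container names, and the helper functions `group_by_container` and `blocked_aos` are hypothetical and exist only for illustration; the sketch assumes one read request per distinct container.

```python
from collections import defaultdict


def group_by_container(placement, related_ids):
    """Group related AO ids by the container that holds each one."""
    groups = defaultdict(list)
    for ao_id in related_ids:
        groups[placement[ao_id]].append(ao_id)
    return dict(groups)


def blocked_aos(groups, offline):
    """List the AO ids that cannot be read while some containers are off-line."""
    return [ao_id for container, ids in groups.items()
            if container in offline
            for ao_id in ids]


# Hypothetical placement: AOs from one email spread over two containers.
placement = {"thread-1": "tape-A", "attach-1": "disk-B", "attach-2": "tape-A"}
related = ["thread-1", "attach-1", "attach-2"]

groups = group_by_container(placement, related)
requests = len(groups)                    # one request per distinct container
blocked = blocked_aos(groups, {"disk-B"})  # AOs delayed if disk-B is off-line
```

With this placement, retrieving the email takes two requests instead of one, and a single off-line container is enough to delay the whole retrieval.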