Some storage systems receive and process access requests that identify a data unit or other content unit (also referred to as an object) using an object identifier, rather than an address that specifies where the data unit is physically or logically stored in the storage system. Such storage systems are referred to as object addressable storage (OAS) systems. In object addressable storage, a content unit may be identified (e.g., by host computers requesting access to the content unit) using its object identifier, and the object identifier may be independent of both the physical and logical location(s) at which the content unit is stored (although it is not required to be, because in some embodiments the storage system may use the object identifier to inform where a content unit is stored in the storage system). From the perspective of the host computer (or user) accessing a content unit on an OAS system, the object identifier does not control where the content unit is logically (or physically) stored. Thus, in an OAS system, if the physical or logical location at which the unit of content is stored changes, the identifier by which host computer(s) access the unit of content may remain the same. In contrast, in a block I/O storage system, if the location at which the unit of content is stored changes in a manner that impacts the logical volume and block address used to access it, any host computer accessing the unit of content must be made aware of the location change and then use the new location of the unit of content for future accesses.
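The location independence described above can be illustrated with a minimal Python sketch. It is not drawn from any particular OAS product; the class name `ObjectAddressableStore`, its methods, and the tuple-based locations are hypothetical and chosen only for illustration. The host reads by object identifier, while the mapping from identifier to location stays internal to the storage system.

```python
class ObjectAddressableStore:
    """Toy OAS model (hypothetical): hosts see only object identifiers."""

    def __init__(self):
        self._location_of = {}  # object identifier -> internal location
        self._media = {}        # internal location -> stored content

    def write(self, object_id, content, location):
        # The location is chosen by the storage system, not by the host.
        self._location_of[object_id] = location
        self._media[location] = content

    def read(self, object_id):
        # The host supplies only the object identifier.
        return self._media[self._location_of[object_id]]

    def relocate(self, object_id, new_location):
        # Move the content internally; the object identifier is unchanged.
        old_location = self._location_of[object_id]
        self._media[new_location] = self._media.pop(old_location)
        self._location_of[object_id] = new_location


store = ObjectAddressableStore()
store.write("object-1", b"payload", ("disk0", 42))
store.relocate("object-1", ("disk3", 7))
print(store.read("object-1"))  # the same identifier still works after the move
```

In a block I/O system, by contrast, the host would hold the equivalent of `("disk0", 42)` itself and would have to be told about the move.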
One example of an OAS system is a content addressable storage (CAS) system. In a CAS system, the object identifiers that identify content units are content addresses. A content address is an identifier that is computed, at least in part, from at least a portion of the content (which can be data and/or metadata) of its corresponding unit of content. For example, a content address for a unit of content may be computed by hashing the unit of content and using the resulting hash value as the content address. Storage systems that identify content by a content address are referred to as content addressable storage (CAS) systems.
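As a minimal sketch of the hashing example above, the following Python code computes a content address from the content itself and uses it as the key for storage and retrieval. The choice of SHA-256 and the names `content_address` and `ContentAddressableStore` are assumptions made for illustration; the text does not specify a hash function or an interface.

```python
import hashlib


def content_address(content: bytes) -> str:
    """Compute a content address by hashing the content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


class ContentAddressableStore:
    """Toy CAS model (hypothetical): content units are keyed by their content address."""

    def __init__(self):
        self._units = {}  # content address -> content

    def write(self, content: bytes) -> str:
        address = content_address(content)
        self._units[address] = content
        return address  # the host keeps this address for later access

    def read(self, address: str) -> bytes:
        return self._units[address]


store = ContentAddressableStore()
address = store.write(b"example content unit")
assert store.read(address) == b"example content unit"
```

Because the address is derived from the content, writing identical content twice yields the same address, and any change to the content yields a different one.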
The eXtensible Access Method (XAM) proposal is a proposed standard that employs content addressable storage techniques. It is being developed jointly by members of the storage industry and provides a specification for storing and accessing content and metadata associated with the content. In accordance with XAM, an “XSet” is a logical object that can be defined to include one or more pieces of content and metadata associated with the content, and the XSet can be accessed using a single object identifier (referred to as an XUID). As used herein, a logical object refers to any logical construct or logical unit of storage, and is not limited to a software object in the context of object-oriented systems.
As discussed above, an XSet can store one or more pieces of content. For example, an XSet can be created to store a photograph and the photograph itself can be provided as a first “stream” to the XSet. One or more files (e.g., text files) can be created to include metadata relating to the photograph, and the metadata file(s) can be provided to the XSet as one or more additional streams. Once the XSet has been created, an XUID is created for it so that the content (e.g., the photograph) and its associated metadata can thereafter be accessed using the single object identifier (e.g., its XUID). A diagram of an illustrative XSet 100 is shown in FIG. 1. As shown in FIG. 1, XSet 100 includes a number of streams for storing user provided content and metadata. The XSet may also include a number of additional fields 103 that store other types of metadata for the XSet, such as, for example, the creation time for the XSet, the last access time of the XSet, and/or any retention period for the XSet.
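The photograph example can be modeled with a small Python data structure. This is only a sketch of the logical layout described above and is not the XAM API; the class names `Stream` and `XSet`, the method names, and the attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Stream:
    """A named piece of user-provided content or metadata."""
    name: str
    content: bytes


@dataclass
class XSet:
    """Toy model of an XSet: streams plus additional metadata fields."""
    streams: List[Stream] = field(default_factory=list)
    creation_time: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    last_access_time: Optional[datetime] = None
    retention_period: Optional[timedelta] = None

    def add_stream(self, name: str, content: bytes) -> None:
        self.streams.append(Stream(name, content))

    def read_stream(self, name: str) -> bytes:
        # Reading a stream updates the last access time field.
        self.last_access_time = datetime.now(timezone.utc)
        for stream in self.streams:
            if stream.name == name:
                return stream.content
        raise KeyError(name)


xset = XSet(retention_period=timedelta(days=365))
xset.add_stream("photograph", b"<image bytes>")       # first stream: the content
xset.add_stream("caption.txt", b"Sunset over a bay")  # additional stream: metadata
```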
In XAM, each field or stream in an XSet may be designated as binding or non-binding. Binding fields and streams are used in computing the XUID for the XSet, while non-binding fields and streams are not. That is, the XUID for an XSet is computed based on the content of the binding fields and streams (e.g., by hashing the content of these fields and streams), but not based on the non-binding fields and streams. The designation of certain fields and/or streams as binding may change. Re-designating as binding a field or stream that had been previously designated as non-binding causes the XUID for the XSet to change. Similarly, re-designating a field or stream as non-binding that had previously been designated as binding causes the XUID for the XSet to change.
Because the XUID for an XSet is generated using the content of the binding fields and streams, the binding fields and streams of the XSet cannot be changed once they become binding (though these fields and streams can be re-designated as non-binding and then changed). A request to modify a binding field or stream will result in a new XSet with a different XUID being created.
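The relationship between binding designations and the XUID can be sketched in Python as follows. This is an illustration under stated assumptions, not the XUID algorithm defined by XAM: the use of SHA-256, the length-prefixed encoding, the dictionary representation of an XSet, and the function names `compute_xuid` and `modify` are all hypothetical.

```python
import hashlib


def compute_xuid(items: dict) -> str:
    """Hash only the binding fields and streams.

    `items` maps a field or stream name to a (content, is_binding) pair.
    """
    digest = hashlib.sha256()  # hash function is an assumption
    for name in sorted(items):
        content, is_binding = items[name]
        if not is_binding:
            continue  # non-binding items do not contribute to the XUID
        for part in (name.encode("utf-8"), content):
            digest.update(len(part).to_bytes(8, "big"))  # length prefix avoids ambiguity
            digest.update(part)
    return digest.hexdigest()


def modify(items: dict, name: str, new_content: bytes):
    """Apply a modification request and return (items, xuid).

    A non-binding item is changed in place and the XUID is unchanged.
    A binding item is never changed in place; a new XSet is created instead.
    """
    _, is_binding = items[name]
    if not is_binding:
        items[name] = (new_content, False)
        return items, compute_xuid(items)
    new_items = dict(items)  # the original XSet is left untouched
    new_items[name] = (new_content, True)
    return new_items, compute_xuid(new_items)


original = {"photograph": (b"v1", True), "note": (b"draft", False)}
xuid = compute_xuid(original)

_, xuid_after_note = modify(original, "note", b"final")
assert xuid_after_note == xuid           # non-binding change: same XUID

new_xset, new_xuid = modify(original, "photograph", b"v2")
assert new_xuid != xuid                  # binding change: new XSet, new XUID
assert original["photograph"] == (b"v1", True)
```

Re-designating `note` as binding in this sketch would add its content to the hash input, which is why a change in designation alone also changes the XUID.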
Some storage systems have “tiers” implemented by storage devices with different storage characteristics. One tier may provide fast access to data and may serve as a transactional storage tier. Such a tier, for example, may be implemented with memory in a server that accesses the data or in a network attached storage (NAS) device. Another tier may be implemented with fixed content storage. Such a tier, for example, may be implemented with a tape or other bulk storage that can store large amounts of data inexpensively, but requires more time to access. A content addressable storage (CAS) system also may be used to implement a fixed content storage tier.