1. Field of the Invention
This invention relates to computer systems in general, and more particularly to data storage.
2. Description of the Related Art
Modern storage environments may include many interconnected storage objects. The interconnection network is a physical-layer technology that provides the infrastructure to connect the various elements of a shared storage environment. Within the shared storage environment, file system abstractions may be built on top of volumes that may themselves be distributed across multiple storage devices. As the number of volumes and file system abstractions grows, the complexity of the entire storage environment grows dramatically.
To limit bottlenecks and improve data throughput, some storage management and access functions traditionally performed by the computer systems may be moved out to other systems or into the storage itself. Object-based storage devices (OBSDs) are one example of the type of storage device that may allow some or all low-level storage allocation to be relocated from its traditional placement in the servers into the OBSDs themselves. Another example of a storage system that off-loads storage management from the server is a Network Attached Storage (NAS) device, which is a specialized file server computer appliance, sometimes referred to as a “filer”.
Yet another approach is taken by distributed shared storage environments, which may separate the actual storage of data from the management of that data. Storage architectures that employ this technique may be referred to as out-of-band or asymmetric systems. A metadata server (MDS) may provide higher-level data management and control functions including, among others, file system mapping, data mirror synchronization, client authentication and access privileges. The data itself is generally stored on various storage devices attached to the network. Without the need to worry about providing file system abstractions, or other metadata, storage devices may focus on providing only data storage and retrieval. Object-based storage devices (OBSDs) are one example of the type of storage devices that may be employed in out-of-band or asymmetric systems.
When utilizing distributed, or asymmetric, storage, client nodes may initially contact the MDS to request access to a specific dataset. The MDS, after authenticating the client node and applying whatever access policies are in place, may generally provide the requesting client node with information about where that particular dataset is stored (metadata), and an access token to present to the storage device. Client nodes may then communicate directly with storage devices, presenting the access token for reading and writing of data. The access token tells the storage device what data the client node is allowed to access, and also whether that client node is allowed read/write access or merely read-only access.
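The exchange described above may be illustrated with a minimal sketch. It is not drawn from any particular system: the class names, the `SHARED_KEY` secret, and the use of an HMAC signature to make the token verifiable by the storage device are all assumptions made for illustration only.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Hypothetical secret known to both the MDS and the storage device.
SHARED_KEY = b"example-shared-key"


@dataclass(frozen=True)
class AccessToken:
    client_id: str
    dataset: str
    mode: str        # "ro" (read-only) or "rw" (read/write)
    signature: str


def _sign(client_id: str, dataset: str, mode: str) -> str:
    """Sign the token fields so the storage device can detect tampering."""
    message = f"{client_id}|{dataset}|{mode}".encode()
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()


class MetadataServer:
    """Applies access policy and hands out (location, token) pairs."""

    def __init__(self, policies, locations):
        self.policies = policies    # {(client_id, dataset): mode}
        self.locations = locations  # {dataset: storage device name}

    def request_access(self, client_id, dataset):
        mode = self.policies.get((client_id, dataset))
        if mode is None:
            raise PermissionError("access denied by policy")
        signature = _sign(client_id, dataset, mode)
        token = AccessToken(client_id, dataset, mode, signature)
        return self.locations[dataset], token


class StorageDevice:
    """Stores data only; it trusts the token instead of its own policy."""

    def __init__(self):
        self.data = {}

    def _check(self, token, dataset, writing):
        expected = _sign(token.client_id, token.dataset, token.mode)
        if not hmac.compare_digest(expected, token.signature):
            raise PermissionError("invalid token")
        if token.dataset != dataset:
            raise PermissionError("token does not cover this dataset")
        if writing and token.mode != "rw":
            raise PermissionError("token is read-only")

    def read(self, token, dataset):
        self._check(token, dataset, writing=False)
        return self.data.get(dataset, b"")

    def write(self, token, dataset, payload):
        self._check(token, dataset, writing=True)
        self.data[dataset] = payload
```

In this sketch the MDS never handles the data payload itself; a client node calls `request_access` once and then issues `read` or `write` calls directly against the storage device, which only needs to validate the token it is shown.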
Separating data from its associated metadata can allow the actual data traffic to be routed directly to the storage device, thus preventing the MDS from becoming a bottleneck that limits data throughput. This also may allow the MDS to be optimized for metadata lookups, which usually involve smaller reads and writes, while allowing the storage devices to be optimized for bulk data transfer of block reads and writes.
One type of storage device that may be used in shared storage environments is the Network Attached Storage (NAS) device. NAS provides clients with access to file objects, each comprising a logical collection of bytes on the storage devices, together with certain metadata that stores the file's attributes. Files are a storage abstraction that can represent application-specific structures such as documents, database tables, images or other media. Meta file systems built using NAS filers may use multiple files distributed among multiple NAS filers to store their files.
Another type of storage device that may be used in shared storage environments is the object-based storage device (OBSD). OBSDs provide clients with access to variable size data objects, comprising a logical collection of bytes on the storage device, similar to files provided by NAS filers, but with interfaces and characteristics that are more like those of traditional disk devices. File systems built around OBSDs may use multiple objects per data file.
File systems may need to store multiple versions of the same file, preserving old file versions for backup, regulatory compliance or parallel access to the data by multiple clients. For example, in distributed file systems where multiple client nodes may simultaneously access the same data, files may be fixed into specific versions to ensure data integrity among client sessions. In other deployments, a specific data image of a file as of a specific point in time may be preserved for backup, while allowing applications to access and modify a more current version of the file. In yet other deployments, the file system may preserve all versions of a data file over its lifetime in order to meet certain legal requirements for data tracking and preservation. These dataset versions are typically referred to as data images or snapshots. Snapshots may include one or more logical data layers that comprise a specific version of the data image. Snapshots may be stored on sparse data objects or file objects, where each object may store changes or additions compared to the data in other objects of the snapshot. Thus, any individual sparse data or file object may contain holes in its data representation. Storage devices may be configured to expose the presence of these holes. Locating a specific piece of data within a sparse data image generally involves attempting to read the data from each sparse data or file object iteratively, usually starting with the most current version, until the correct version of the data is found.
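The iterative lookup described above may be sketched as follows. The representation is an assumption chosen for brevity: each sparse object is modeled as a dictionary from block index to block contents, a missing key stands for a hole, and the name `read_block` is hypothetical.

```python
from typing import Dict, List, Optional

# A sparse object holds only the blocks that were written at that version.
# A block index that is absent from the dictionary represents a hole.
SparseObject = Dict[int, bytes]


def read_block(layers: List[SparseObject], block: int) -> Optional[bytes]:
    """Return the contents of `block` from a layered sparse data image.

    `layers` is ordered from the most current version to the oldest.
    Each layer is probed in turn; the first layer that actually holds
    the block supplies the data. None means every layer had a hole.
    """
    for layer in layers:
        if block in layer:
            return layer[block]
    return None
```

For example, with `layers = [{1: b"new"}, {0: b"base", 1: b"old"}]`, block 1 is satisfied by the first (most current) layer, while block 0 falls through the hole in that layer and is read from the older one. The cost of a lookup in this sketch grows with the number of layers that must be probed before the data is found.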