A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
The storage operating system of the storage system may implement a high-level module, such as a file system, to logically organize the information stored on volumes as a hierarchical structure of data containers, such as files and logical units. For example, each “on-disk” file may be implemented as a set of data structures, i.e., disk blocks, configured to store information, such as the actual data for the file. These data blocks are organized within a volume block number (vbn) space that is maintained by the file system. The file system may also assign each data block in the file a corresponding “file offset” or file block number (fbn). The file system typically assigns sequences of fbns on a per-file basis, whereas vbns are assigned over a larger volume address space. The file system organizes the data blocks within the vbn space as a “logical volume”; each logical volume may be, although is not necessarily, associated with its own file system.
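The difference between per-file fbns and volume-wide vbns can be illustrated with a toy model. The sketch below is hypothetical (the `Volume` class, its fields, and the bump allocator are assumptions, not part of any real file system): each file keeps a list whose index is the fbn and whose value is the vbn, while vbns are handed out from one shared volume space.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy logical volume with one vbn space shared by all files (hypothetical model)."""
    blocks: dict = field(default_factory=dict)  # vbn -> block contents
    next_vbn: int = 0                            # next unallocated volume block number
    files: dict = field(default_factory=dict)   # file name -> list (index = fbn, value = vbn)

    def append_block(self, name: str, data: bytes) -> tuple[int, int]:
        """Append a block to a file and return its (fbn, vbn) pair."""
        fbn_to_vbn = self.files.setdefault(name, [])
        vbn = self.next_vbn          # vbns are assigned across the whole volume
        self.next_vbn += 1
        self.blocks[vbn] = data
        fbn_to_vbn.append(vbn)       # fbns are assigned per file, starting at 0
        return len(fbn_to_vbn) - 1, vbn

    def read_block(self, name: str, fbn: int) -> bytes:
        """Translate a file's fbn to its vbn and return the block contents."""
        return self.blocks[self.files[name][fbn]]
```

Appending blocks to two files in turn yields fbns that restart at 0 for each file, while the vbns keep increasing across the volume: `("a", 0, 0)`, `("b", 0, 1)`, `("a", 1, 2)`.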
A known type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block is retrieved (read) from disk into a memory of the storage system and “dirtied” (i.e., updated or modified) with new data, the data block is thereafter stored (written) to a new location on disk to optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. An example of a write-anywhere file system that is configured to operate on a storage system is the Write Anywhere File Layout (WAFL®) file system available from Network Appliance, Inc., Sunnyvale, Calif.
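A minimal sketch of the write-anywhere rule, assuming a single file and a simple bump allocator (the `WriteAnywhereFile` class is hypothetical and does not reflect WAFL internals): a dirtied block is never written in place. Instead, it is written to a fresh vbn and the file's fbn is redirected to it, leaving the old block untouched.

```python
class WriteAnywhereFile:
    """Toy write-anywhere layout for one file (hypothetical model)."""

    def __init__(self) -> None:
        self.disk: dict[int, bytes] = {}      # vbn -> contents; existing entries are never overwritten
        self.fbn_to_vbn: dict[int, int] = {}  # current on-disk location of each file block
        self.next_vbn = 0                     # bump allocator keeps successive writes contiguous

    def write(self, fbn: int, data: bytes) -> int:
        """Write (or rewrite) a file block to a new location and return its vbn."""
        vbn = self.next_vbn
        self.next_vbn += 1
        self.disk[vbn] = data         # write the dirtied block to a new location ...
        self.fbn_to_vbn[fbn] = vbn    # ... and point the file block at that location
        return vbn

    def read(self, fbn: int) -> bytes:
        return self.disk[self.fbn_to_vbn[fbn]]
```

Rewriting fbn 0 moves it from vbn 0 to vbn 1, while the old contents at vbn 0 remain on disk. A real file system would eventually reclaim the superseded block; that step is omitted here.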
The storage system may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access data containers stored on the system. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the storage system by issuing file-based and block-based protocol messages (in the form of packets) to the system over the network.
A plurality of storage systems may be interconnected to provide a storage system environment configured to service many clients. Each storage system may be configured to service one or more volumes, wherein each volume stores one or more data containers. Yet often a large number of data access requests issued by the clients may be directed to a small number of data containers serviced by a particular storage system of the environment, overloading that system. One solution to this problem is to distribute the volumes serviced by the particular storage system among all of the storage systems of the environment. This, in turn, distributes the data access requests, along with the processing resources needed to service such requests, among all of the storage systems, thereby reducing the individual processing load on each storage system. However, a noted disadvantage arises when only a single data container, such as a file, is heavily accessed by clients of the storage system environment. As a result, the storage system attempting to service the requests directed to that file may exceed its processing resources and become overburdened, with a concomitant degradation of speed and performance.
One technique for overcoming the disadvantages of having a single file that is heavily utilized is to stripe the file across a plurality of volumes configured as a striped volume set (SVS), where each volume, such as a data volume (DV), is serviced by a different storage system, thereby distributing the load for the single file among a plurality of storage systems. A technique for data container (such as a file) striping is described in the above-referenced U.S. patent application Ser. No. 11/119,278, entitled STORAGE SYSTEM ARCHITECTURE FOR STRIPING DATA CONTAINER CONTENT ACROSS VOLUMES OF A CLUSTER. According to the data container striping arrangement, each storage system may service access requests (i.e., file operations) from clients directed to the same file. File operations, such as read and write operations, are forwarded directly to the storage systems that are responsible for their portions of the data for that file.
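To make the forwarding step concrete, the sketch below assumes a simple round-robin placement with a fixed stripe width across the DVs of an SVS. This placement policy and the function names `dv_for_offset` and `split_request` are assumptions for illustration; the referenced striping technique may place data differently. A request that crosses a stripe boundary is split into pieces, each forwarded to the DV responsible for that stripe.

```python
def dv_for_offset(offset: int, stripe_width: int, num_volumes: int) -> int:
    """Return the index of the DV holding the byte at `offset`.

    Assumes round-robin placement of fixed-width stripes (hypothetical policy).
    """
    if offset < 0 or stripe_width <= 0 or num_volumes <= 0:
        raise ValueError("offset must be >= 0; stripe_width and num_volumes must be > 0")
    return (offset // stripe_width) % num_volumes


def split_request(offset: int, length: int, stripe_width: int, num_volumes: int):
    """Split a byte range into (dv, offset, length) pieces, one per stripe touched."""
    pieces = []
    end = offset + length
    while offset < end:
        # First byte of the next stripe bounds the current piece.
        stripe_end = (offset // stripe_width + 1) * stripe_width
        chunk = min(end, stripe_end) - offset
        pieces.append((dv_for_offset(offset, stripe_width, num_volumes), offset, chunk))
        offset += chunk
    return pieces
```

With a 4096-byte stripe width and three DVs, a 200-byte request at offset 4000 is split into `(0, 4000, 96)` for the first DV and `(1, 4096, 104)` for the second.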
In addition to the file data, the file has associated meta-data, such as timestamps and length. A timestamp is a file attribute that provides an indication of the last time the file was modified, i.e., the modification time (mtime) for the file. The mtime is typically consulted on every operation directed to the file and, in the case of a write operation, is changed. For example, in response to a read operation issued by a client, the storage system returns the data and the current mtime on the file, whereas in response to a write operation, the storage system returns an incremented mtime. Effectively, every successive write operation is accorded a greater mtime than the one before it.
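The mtime rule (reads report it, writes increase it) can be sketched as follows. The class name `MtimeAttribute`, the use of integer nanoseconds, and the `max(clock, last + 1)` update are assumptions chosen to guarantee that each write receives a strictly greater mtime even when two writes fall within the same clock tick.

```python
import time


class MtimeAttribute:
    """Per-file mtime that strictly increases on every write (hypothetical sketch)."""

    def __init__(self, clock=time.time_ns):
        self._clock = clock       # injectable clock returning integer nanoseconds
        self.mtime = clock()

    def on_read(self) -> int:
        """Reads return, but do not change, the current mtime."""
        return self.mtime

    def on_write(self) -> int:
        """Writes advance the mtime past both the clock and the previous value."""
        self.mtime = max(self._clock(), self.mtime + 1)
        return self.mtime
```

With a frozen clock, successive writes still receive increasing mtimes (101, then 102), while reads simply report the latest value.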
Many client protocols, such as the Network File System (NFS) protocol, allow use of client-side “caching” of data retrieved from a storage system. In response to a read operation issued by a client for a file, the storage system returns the requested data along with the current mtime of the file. The client stores the information in a cache memory so that future read operations directed to that file data may be serviced locally at the client (from the cache) instead of remotely over the network. For client-side caching to operate properly, there must be guarantees that the data subsequently retrieved from the cache is consistent with the actual file system and not “stale”, i.e., that the file data has not changed since it was cached at the client. To that end, the NFS protocol enables periodic “pinging” (polling) of the state of the file by the client through requests for the current mtime of the file from the storage system. If the mtime has not increased since the data was cached, the client-side cache is maintained “fresh” and the client continues to use the cached data. If the mtime has changed, then the client discards its cached data and reissues a read operation to the storage system for the file data.
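The validation loop can be sketched as a small client-side cache entry. Here `read_from_server` and `get_mtime_from_server` are hypothetical callables standing in for the protocol's read and attribute requests. For simplicity, the sketch polls the mtime on every local read, whereas a real NFS client polls only periodically.

```python
from typing import Callable


class CachedFile:
    """Client-side cache entry validated by comparing mtimes (hypothetical sketch)."""

    def __init__(self,
                 read_from_server: Callable[[], tuple[bytes, int]],
                 get_mtime_from_server: Callable[[], int]):
        self._read = read_from_server
        self._get_mtime = get_mtime_from_server
        # Cache the data together with the mtime it was read at.
        self.data, self.mtime = self._read()

    def read(self) -> bytes:
        current = self._get_mtime()       # "ping" the server for the current mtime
        if current != self.mtime:         # file changed since caching: cache is stale
            self.data, self.mtime = self._read()  # discard and re-read from the server
        return self.data                  # otherwise serve locally from the cache
```

As long as the server's mtime is unchanged, `read()` is served from the cache. Once a write bumps the mtime, the next `read()` fetches fresh data.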
Note that, as used herein, file operations are “serializable” if they can be replayed in a reported order and the result is identical to the actual file system state. File operations are “causally connected” if they affect the same meta-data or the same region of the same file. Some client protocols (like NFSv2) require “strong serialization semantics”; that is, mtimes must always increase for operations that complete with increasing wall-clock time, even if they are not causally connected. “Weak serialization semantics”, on the other hand, only require that mtimes always increase for operations that complete with increasing wall-clock time if the operations are causally connected.
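The two definitions can be expressed as a check over a history of completed operations. In the sketch below, the `CompletedOp` record and the function names are hypothetical. As a stated simplification, causal connection is modeled only as overlapping byte ranges of the same file; shared meta-data is not tracked. Strong semantics checks every pair ordered by completion time, while weak semantics checks only causally connected pairs.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class CompletedOp:
    completed_at: float  # wall-clock completion time
    mtime: int           # mtime reported to the client
    file_id: str
    start: int           # byte range [start, end) touched by the operation
    end: int


def causally_connected(a: CompletedOp, b: CompletedOp) -> bool:
    # Simplification (assumption): overlapping ranges of the same file only.
    return a.file_id == b.file_id and a.start < b.end and b.start < a.end


def satisfies(ops: list[CompletedOp], strong: bool) -> bool:
    """True if mtimes increase with completion time for every pair that must be ordered."""
    for a, b in combinations(ops, 2):
        if a.completed_at == b.completed_at:
            continue  # no wall-clock order to respect
        first, second = (a, b) if a.completed_at < b.completed_at else (b, a)
        if (strong or causally_connected(first, second)) and second.mtime <= first.mtime:
            return False
    return True
```

Two operations on different files whose mtimes go "backwards" violate strong semantics but satisfy weak semantics, because they are not causally connected.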
Certain file system protocols, such as the Common Internet File System (CIFS) protocol, support weak serialization semantics because of the nature of soft locks, such as opportunistic locks (op-locks). An op-lock is an automatically revocable soft lock that allows a client to operate on a range of file data until such time as a server (e.g., the storage system) instructs the client to stop. That is, the client can cache the data and perform read and write operations on the cached data until the storage system instructs it to return that data to the system. The client can cache the results of write operations since it knows that no other access is allowed to that same region of the file as long as it has an op-lock on the region. As soon as a second client attempts a conflicting operation on that region of the file, the storage system blocks the conflicting operation and revokes the op-lock. In particular, the storage system instructs the client to return (“flush”) any write modifications to the system and then discard the entire content of its client-side cache. Once that happens, the storage system unblocks the second client and grants it an op-lock on the conflicting region.
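The revocation sequence (block, flush, discard, then grant) can be sketched with two toy classes. `Client` and `OplockServer` are hypothetical, and the sketch treats the whole file as a single op-locked region. Because the model is sequential, "blocking" the second client simply means its `request` call does not return until the first client has flushed and discarded its cache.

```python
from typing import Optional


class Client:
    def __init__(self, name: str):
        self.name = name
        self.cache: dict[int, bytes] = {}  # offset -> cached (possibly dirty) data

    def flush_and_discard(self, server: "OplockServer") -> None:
        for offset, data in self.cache.items():
            server.store[offset] = data    # return write modifications to the server
        self.cache.clear()                 # then drop the entire client-side cache


class OplockServer:
    """Toy op-lock grant/revocation for a single region (hypothetical model)."""

    def __init__(self) -> None:
        self.store: dict[int, bytes] = {}
        self.holder: Optional[Client] = None  # client currently holding the op-lock

    def request(self, client: Client) -> None:
        if self.holder is not None and self.holder is not client:
            # Conflicting access: revoke the existing op-lock before proceeding.
            self.holder.flush_and_discard(self)
        self.holder = client               # unblock and grant the op-lock to the requester
```

While the first client holds the op-lock, its writes live only in its cache. A request from a second client forces those writes to the server before the lock changes hands.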
The NFSv2 and NFSv3 protocols do not utilize op-locks and, thus, do not employ the above revocation mechanism. For these protocols, the storage system must rely on strong serialization semantics. Other protocols, such as the NFSv4 protocol, use a type of soft lock called delegations that allows the storage system to use weak serialization semantics. Because CIFS and NFSv4 clients rely on such a “rough” lock-based protocol for guaranteeing consistency of cached data, they are not concerned with the mtimes associated with read and write operations. This, in turn, enables the storage system to service such operation requests with weak serialization semantics.
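The protocol-to-semantics relationship described above can be written as a lookup table. The string keys and the `semantics_for` helper are illustrative only. The rule that a file reachable over several protocols must honor the strongest requirement among them is an assumption (a conservative choice), not something stated in the text.

```python
from enum import Enum


class Serialization(Enum):
    STRONG = "strong"  # mtimes increase for all operations ordered by completion time
    WEAK = "weak"      # only causally connected operations need increasing mtimes


# Requirements as described in the text; keys are illustrative strings.
REQUIRED_SEMANTICS = {
    "NFSv2": Serialization.STRONG,  # no op-locks or delegations
    "NFSv3": Serialization.STRONG,
    "NFSv4": Serialization.WEAK,    # delegations
    "CIFS": Serialization.WEAK,     # op-locks
}


def semantics_for(protocols: set[str]) -> Serialization:
    """Assumed rule: honor the strongest requirement of any protocol in use."""
    if any(REQUIRED_SEMANTICS[p] is Serialization.STRONG for p in protocols):
        return Serialization.STRONG
    return Serialization.WEAK
```

Under this rule, a file accessed only by CIFS and NFSv4 clients can be served with weak semantics, but adding a single NFSv3 client forces strong semantics.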
In the data container striping arrangement described above, there is one volume, i.e., the container attribute volume (CAV), which is responsible for all the timestamps of a particular file stored on the SVS. As a result, for each file operation, the DV accesses the CAV to determine the mtime for the file. In response, the CAV updates the mtime on disk and returns the updated mtime to the DV which, in turn, returns the mtime and any associated data to the client. This arrangement places a substantial load on the storage system serving the CAV with a concomitant decrease in system performance. Moreover, depending on the load of the SVS, the meta-data requests to/from the CAV may become a bottleneck that adversely impacts performance of the system by, e.g., causing certain storage systems to stall (wait) until their meta-data requests have been processed before servicing client data access requests.
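The bottleneck can be illustrated by counting meta-data requests. In this toy model (the `ContainerAttributeVolume` and `DataVolume` classes are hypothetical), client operations are spread evenly across several DVs, but every one of them still costs a round trip to the single CAV that owns the file's mtime. Following the distinction drawn earlier, the sketch advances the mtime only for writes, while counting every request.

```python
class ContainerAttributeVolume:
    """Single CAV that owns the mtime of a striped file (toy model)."""

    def __init__(self) -> None:
        self.mtime = 0
        self.requests = 0  # meta-data requests received from all DVs

    def update_mtime(self, is_write: bool) -> int:
        self.requests += 1
        if is_write:
            self.mtime += 1  # stands in for updating the mtime on disk
        return self.mtime


class DataVolume:
    def __init__(self, cav: ContainerAttributeVolume) -> None:
        self.cav = cav
        self.ops = 0  # client operations handled by this DV

    def handle(self, is_write: bool) -> int:
        self.ops += 1
        # Every client operation, read or write, waits on a CAV round trip
        # before the DV can return the data and the mtime to the client.
        return self.cav.update_mtime(is_write)
```

Spreading 1000 operations over four DVs leaves each DV with 250 operations, yet the CAV still receives all 1000 meta-data requests. This is the concentration of load that the striping arrangement was meant to avoid.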