A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
The storage operating system of the storage system may implement a high-level module, such as a file system, to logically organize the information stored on volumes as a hierarchical structure of data containers, such as files and logical units. For example, each “on-disk” file may be implemented as a set of data structures, i.e., disk blocks, configured to store information, such as the actual data for the file. These data blocks are organized within a volume block number (vbn) space that is maintained by the file system. The file system may also assign each data block in the file a corresponding “file offset” or file block number (fbn). The file system typically assigns sequences of fbns on a per-file basis, whereas vbns are assigned over a larger volume address space. The file system organizes the data blocks within the vbn space as a “logical volume”; each logical volume may be, although is not necessarily, associated with its own file system.
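By way of illustration only, the relationship between per-file fbns and volume-wide vbns may be sketched as follows. The class `Volume`, its method `append_block`, and the simple sequential allocation policy are hypothetical names and assumptions introduced for this sketch; they are not taken from any particular file system.

```python
# Illustrative sketch only: per-file fbn sequences mapped into one shared vbn space.
# "Volume" and "append_block" are hypothetical names, not an actual file system API.
class Volume:
    def __init__(self):
        self.next_vbn = 0   # next free block in the volume-wide vbn space (assumed sequential)
        self.files = {}     # file name -> list whose index is the fbn and whose value is the vbn

    def append_block(self, name):
        """Give the file its next fbn and back it with a newly allocated vbn."""
        block_map = self.files.setdefault(name, [])
        block_map.append(self.next_vbn)
        self.next_vbn += 1
        return len(block_map) - 1, block_map[-1]   # (fbn, vbn)


vol = Volume()
a0 = vol.append_block("a")   # first block of file "a"
b0 = vol.append_block("b")   # first block of file "b"
a1 = vol.append_block("a")   # second block of file "a"
```

In this sketch, both files begin at fbn 0, while their blocks occupy distinct positions in the single vbn space of the volume.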
A known type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block is retrieved (read) from disk into a memory of the storage system and “dirtied” (i.e., updated or modified) with new data, the data block is thereafter stored (written) to a new location on disk to optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. An example of a write-anywhere file system that is configured to operate on a storage system is the Write Anywhere File Layout (WAFL®) file system available from Network Appliance, Inc., Sunnyvale, Calif.
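The write-anywhere behavior described above may be sketched, under simplifying assumptions, as follows. The class `WriteAnywhereVolume` and its methods are hypothetical and are not intended to represent the WAFL® implementation; the sketch shows only that a dirtied block is stored at a new location rather than overwriting the old one.

```python
# Illustrative sketch only: a modified block is written to a new location
# instead of overwriting the block at its previous location.
class WriteAnywhereVolume:
    def __init__(self):
        self.blocks = {}      # vbn -> data held at that location
        self.block_map = {}   # (file, fbn) -> vbn currently backing that file block
        self.next_vbn = 0     # assumed allocation policy: next unused vbn

    def write(self, file, fbn, data):
        """Store data at a fresh vbn and repoint the file block to it."""
        old_vbn = self.block_map.get((file, fbn))
        new_vbn = self.next_vbn
        self.next_vbn += 1
        self.blocks[new_vbn] = data
        self.block_map[(file, fbn)] = new_vbn
        return old_vbn, new_vbn

    def read(self, file, fbn):
        return self.blocks[self.block_map[(file, fbn)]]


v = WriteAnywhereVolume()
first = v.write("f", 0, b"old")    # initial write of the block
second = v.write("f", 0, b"new")   # the dirtied block lands at a new vbn
```

After the second write, the data at the original location is left untouched and the file block simply refers to the new location.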
The storage system may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access data containers stored on the system. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the storage system by issuing file-based and block-based protocol messages (in the form of packets) to the system over the network.
A plurality of storage systems may be interconnected to provide a storage system environment configured to service many clients. Each storage system may be configured to service one or more volumes, wherein each volume stores one or more data containers. Yet often a large number of data access requests issued by the clients may be directed to a small number of data containers serviced by a particular storage system of the environment. A solution to such a problem is to distribute the volumes serviced by the particular storage system among all of the storage systems of the environment. This, in turn, distributes the data access requests, along with the processing resources needed to service such requests, among all of the storage systems, thereby reducing the individual processing load on each storage system. However, a noted disadvantage arises when only a single data container, such as a file, is heavily accessed by clients of the storage system environment. As a result, the storage system attempting to service the requests directed to that data container may exceed its processing resources and become overburdened, with a concomitant degradation of speed and performance.
One technique for overcoming the disadvantages of having a single data container that is heavily utilized is to stripe the data container across a plurality of volumes configured as a striped volume set (SVS), where each volume is serviced by a different storage system, thereby distributing the load for the single data container among a plurality of storage systems. A technique for data container striping is described in the above-incorporated U.S. patent application Ser. No. 11/119,278 of Kazar et al., entitled STORAGE SYSTEM ARCHITECTURE FOR STRIPING DATA CONTAINER CONTENT ACROSS VOLUMES OF A CLUSTER. In such an environment, a SVS comprises one or more data volumes (DV) and a meta-data volume (MDV). Each DV and the MDV is typically served by a separate node of the distributed storage system environment. In the environment described in the above-incorporated U.S. Patent Application, the node may comprise a network element (N-module) and a disk element (D-module) that includes a file system. As used herein a D-module serving (hosting) a DV is referred to as a “DV node,” while a D-module hosting the MDV for a SVS is referred to as a “MDV node.”
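For purposes of illustration, one simple way to distribute a data container across the data volumes of a SVS is round-robin striping with a fixed stripe width. The striping rule, the function `dv_for_offset` and its parameters are assumptions made for this sketch; the actual striping algorithm is described in the incorporated application and may differ.

```python
# Illustrative sketch only: round-robin striping with a fixed stripe width.
# The rule and all names here are assumptions, not the incorporated algorithm.
def dv_for_offset(offset, stripe_width, num_dvs):
    """Return (dv_index, stripe_number) for a byte offset within a data container."""
    stripe = offset // stripe_width
    return stripe % num_dvs, stripe


first_stripe = dv_for_offset(0, 4096, 3)
fifth_stripe = dv_for_offset(4 * 4096, 4096, 3)
```

Under this assumed rule, consecutive stripes of a single heavily accessed file fall on different data volumes, so requests for that file are spread over the nodes serving those volumes.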
Most file access protocols include locking capabilities. A lock is a mechanism that enables a client or system administrator to prevent access to a shared resource. An entity later attempting to access this shared resource will be notified of the lock, which may prevent others from accessing the resource. The types of locks can vary. For example, in some instances, there may be a write lock placed on a resource by the owner, yet other entities may be able to simultaneously obtain read access to the resource. The type of lock, and/or the absence of a lock, over a particular data container, such as a file, or a portion thereof is referred to herein as a “lock state.”
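The notion of a lock state over a portion of a data container may be sketched as follows. The `RangeLock` record, the half-open byte ranges, and the three state labels are hypothetical choices made for this sketch; actual file access protocols define their own lock types and semantics.

```python
# Illustrative sketch only: summarizing the lock state of a byte range of a file.
# The record layout and the state labels are assumptions made for this example.
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeLock:
    owner: str
    start: int        # first byte covered by the lock
    end: int          # one past the last byte covered by the lock
    exclusive: bool   # True for a write lock, False for a read (shared) lock


def overlaps(lock, start, end):
    """True if the lock covers any byte in the half-open range [start, end)."""
    return lock.start < end and start < lock.end


def lock_state(locks, start, end):
    """Return 'unlocked', 'read-locked' or 'write-locked' for the range."""
    hits = [lock for lock in locks if overlaps(lock, start, end)]
    if not hits:
        return "unlocked"
    return "write-locked" if any(lock.exclusive for lock in hits) else "read-locked"


locks = [RangeLock("client-1", 0, 100, True), RangeLock("client-2", 200, 300, False)]
```

With these two locks, a query over bytes 100 through 199 reports the absence of a lock, which is itself a lock state in the sense used herein.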
Various challenges arise with respect to managing lock state information regarding a distributed storage system. The volume of lock state information can be potentially large and subject to constant change as information is updated and edited.
Known techniques for managing lock state information include an approach in which lock state information is distributed directly to end clients. In this case, clients and/or end users utilize specific lock state management software applications and special protocols that allow the users to create, edit and manage lock state information.
Another approach stores lock state information in a central repository for the entire system; however, this centralized approach can result in a bottleneck for file access in a large, distributed system.
The approach described in the previously incorporated parent application Ser. No. 11/264,831 discloses a method and system in which a lock state manager configures a MDV as the authoritative source for lock state information for data containers on the SVS. Client requests for access to a particular data container or a portion of a container are directed to the MDV node, which searches its lock state database and returns the resulting lock state information to respective DV nodes associated with the data containers that store the requested data. The lock state information for each data request is returned by the MDV node to the DV node for storage in a local lock cache on the DV node.
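The division of responsibility described above may be sketched, in greatly simplified form, as follows. The classes `MdvNode` and `DvNode`, the tuple layout of a lock, and the request flow in which the DV node queries the MDV node on a cache miss are assumptions made for this sketch; the sketch omits cache invalidation and does not represent the method of the parent application.

```python
# Illustrative sketch only: the MDV node holds the authoritative lock state
# database, and a DV node keeps what it has been given in a local lock cache.
# All names and the request flow are assumptions; invalidation is omitted.
class MdvNode:
    def __init__(self):
        self.lock_db = {}   # container id -> list of (start, end, owner) tuples

    def lookup(self, container, start, end):
        """Return the locks overlapping the half-open range [start, end)."""
        return [lock for lock in self.lock_db.get(container, [])
                if lock[0] < end and start < lock[1]]


class DvNode:
    def __init__(self, mdv):
        self.mdv = mdv
        self.lock_cache = {}   # (container, start, end) -> locks returned by the MDV node

    def lock_state(self, container, start, end):
        key = (container, start, end)
        if key not in self.lock_cache:   # cache miss: consult the authoritative source
            self.lock_cache[key] = self.mdv.lookup(container, start, end)
        return self.lock_cache[key]


mdv = MdvNode()
mdv.lock_db["file1"] = [(0, 100, "client-a")]
dv = DvNode(mdv)
locked = dv.lock_state("file1", 50, 60)
unlocked = dv.lock_state("file1", 100, 200)
```

Note that the local lock cache in this sketch grows with every distinct range queried, so its memory consumption on the DV node is unbounded.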
Many DV nodes, however, have limited memory resources and thus limited space. In such limited-memory situations, the DV node does not have adequate memory capacity to store all lock state information provided to it by the MDV node. Consequently, the DV node may be unable to render a decision about whether to process a read or write request, e.g., directed to a file, because it may not have all of the respective lock state information for that file. In other words, if the DV node has consumed all of its memory capacity prior to receiving all of the lock state information, then it cannot retain all of the information locally and thus may be unable to make a processing decision. Moreover, the DV node may not even be capable of returning a response to the MDV node, which may cause the system to hang or encounter a similar error condition.
The parent application further discloses the use of permissive areas, which are similar to locks that are pre-assigned to designate one or more areas in a file that contain no locks. Permissive area information is sent to a DV node upon a request to the MDV node for lock state information about a file range that includes a respective permissive area. However, the request for permissive area information also requires additional memory capacity on the part of the DV nodes in order to locally store and maintain the information at the node.
Thus, there remains a need for a system that provides lock state information between an MDV node and its associated DV nodes, which requires only limited memory resource space, i.e., a small memory footprint, particularly in a local lock cache of the node. In addition, there remains a need for a technique for controlling lock state information while maintaining the ability to rapidly update this information on the distributed storage system without requiring the use of specialized software programs, and without requiring large memory resource capacity on each DV node.