A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage environment, a storage area network and a disk assembly directly attached to a client or host computer. The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
The storage operating system of the storage system may implement a high-level module, such as a file system, to logically organize the information stored on volumes as a hierarchical structure of data containers, such as files and logical units (LUs). For example, each “on-disk” file may be implemented as a set of data structures, i.e., disk blocks, configured to store information, such as the actual data for the file. These data blocks are organized within a volume block number (vbn) space that is maintained by the file system. The file system may also assign each data block in the file a corresponding “file offset” or file block number (fbn). The file system typically assigns sequences of fbns on a per-file basis, whereas vbns are assigned over a larger volume address space. The file system organizes the data blocks within the vbn space as a “logical volume”; each logical volume may be, although is not necessarily, associated with its own file system.
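The split between per-file fbns and volume-wide vbns can be pictured as one small lookup table per file. The following is a minimal sketch under stated assumptions: the `Volume` and `File` classes and the `append_block`/`read_block` helpers are hypothetical names for illustration, and vbns are simply handed out in increasing order.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Toy vbn space shared by every file in the volume (hypothetical model)."""
    blocks: dict[int, bytes] = field(default_factory=dict)  # vbn -> block contents
    next_free_vbn: int = 0

    def allocate_vbn(self) -> int:
        # Hand out volume block numbers in increasing order.
        vbn = self.next_free_vbn
        self.next_free_vbn += 1
        return vbn


@dataclass
class File:
    """Toy file: fbns are per-file offsets 0, 1, 2, ... (hypothetical model)."""
    fbn_to_vbn: dict[int, int] = field(default_factory=dict)


def append_block(vol: Volume, f: File, data: bytes) -> int:
    """Give the file's next fbn a newly allocated vbn and store the data there."""
    fbn = len(f.fbn_to_vbn)
    vbn = vol.allocate_vbn()
    vol.blocks[vbn] = data
    f.fbn_to_vbn[fbn] = vbn
    return fbn


def read_block(vol: Volume, f: File, fbn: int) -> bytes:
    """Translate a per-file fbn into its volume-wide vbn, then read the block."""
    return vol.blocks[f.fbn_to_vbn[fbn]]


# Two files interleave their writes: each numbers its own blocks from fbn 0,
# while vbns are drawn from the single, larger volume address space.
vol = Volume()
file_a, file_b = File(), File()
append_block(vol, file_a, b"a0")  # file_a fbn 0 -> vbn 0
append_block(vol, file_b, b"b0")  # file_b fbn 0 -> vbn 1
append_block(vol, file_a, b"a1")  # file_a fbn 1 -> vbn 2
```

Both files start at fbn 0, but their blocks occupy distinct vbns in the shared volume space.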
A known type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block is retrieved (read) from disk into a memory of the storage system and “dirtied” (i.e., updated or modified) with new data, the data block is thereafter stored (written) to a new location on disk to optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations directed to the disks, particularly sequential read operations. An example of a write-anywhere file system that is configured to operate on a storage system is the Write Anywhere File Layout (WAFL®) file system available from NetApp, Inc., Sunnyvale, Calif.
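The no-overwrite rule can be illustrated with a toy allocator. This is a hypothetical sketch, not how WAFL is actually implemented. The `WriteAnywhereVolume` class and its `write` method are illustrative names, and we assume a simple sequential allocator: every write, including a write to a dirtied block, goes to a fresh vbn, and the file's fbn is re-pointed there.

```python
from dataclasses import dataclass, field


@dataclass
class WriteAnywhereVolume:
    """Toy write-anywhere layout (hypothetical; not how WAFL is implemented)."""
    blocks: dict[int, bytes] = field(default_factory=dict)  # vbn -> data on "disk"
    next_free_vbn: int = 0

    def write(self, fbn_map: dict[int, int], fbn: int, data: bytes) -> int:
        """Write one file block without overwriting any existing vbn.

        New or dirtied data always goes to a freshly allocated vbn and the
        file's fbn is re-pointed there. The old vbn is left untouched; a real
        file system would reclaim it later.
        """
        new_vbn = self.next_free_vbn
        self.next_free_vbn += 1  # sequential allocation keeps new writes contiguous
        self.blocks[new_vbn] = data
        fbn_map[fbn] = new_vbn
        return new_vbn


vol = WriteAnywhereVolume()
file_map: dict[int, int] = {}   # the file's fbn -> vbn map
vol.write(file_map, 0, b"v1")   # fbn 0 -> vbn 0
vol.write(file_map, 1, b"w1")   # fbn 1 -> vbn 1
vol.write(file_map, 0, b"v2")   # dirtied fbn 0 is written to vbn 2, not vbn 0
```

Because the dirtied block lands at the next free vbn rather than its old location, writes stay sequential on disk. The previous contents remain intact until they are reclaimed.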
The storage system may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access data containers stored on the system. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the storage system by issuing access requests (read/write requests) as file-based and block-based protocol messages (in the form of packets) to the system over the network.
It is advantageous for the services and data provided by a storage system, such as a storage node, to be available for access to the greatest degree possible. Accordingly, some storage systems provide storage system nodes interconnected as a cluster, with a first storage system node being clustered with a second storage system node to provide high availability of data access. Each node of the cluster may include (i) a storage server (referred to as a “D-module”) adapted to service particular aggregate(s) or volume(s) and (ii) a multi-protocol engine (referred to as an “N-module”) adapted to redirect the data access requests to any storage server of the cluster. In the illustrative embodiment, the storage server of each node is embodied as a disk element (D-module) and the multi-protocol engine is embodied as a network element (N-module). The N-module receives a multi-protocol data access request from a client, converts that access request into a cluster fabric (CF) message and redirects the message to an appropriate D-module of the cluster.
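The N-module's job of turning a client request into a CF message for the right D-module amounts to a lookup followed by a repackaging step. The sketch below rests on stated assumptions. `ClientRequest`, `CFMessage`, `NModule`, and the volume-to-D-module table are hypothetical, and we assume routing is keyed on the target volume. Real CF message formats are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientRequest:
    """Hypothetical request already decoded from a file- or block-based protocol."""
    op: str          # "read" or "write"
    volume: str      # target volume name
    offset: int
    payload: bytes = b""


@dataclass(frozen=True)
class CFMessage:
    """Hypothetical protocol-neutral cluster fabric (CF) message."""
    dest_d_module: str
    op: str
    volume: str
    offset: int
    payload: bytes


class NModule:
    def __init__(self, volume_owner: dict[str, str]):
        # Assumed routing table: volume name -> D-module that services it.
        self.volume_owner = volume_owner

    def to_cf_message(self, req: ClientRequest) -> CFMessage:
        """Convert a client request into a CF message for the owning D-module."""
        dest = self.volume_owner[req.volume]  # KeyError if no D-module serves it
        return CFMessage(dest, req.op, req.volume, req.offset, req.payload)


n_module = NModule({"vol1": "D-1", "vol2": "D-2"})
msg = n_module.to_cf_message(ClientRequest("write", "vol2", 4096, b"data"))
```

Whichever node's N-module receives the request, the message is addressed to the D-module that services `vol2`. This is what lets any node accept a request for data stored anywhere in the cluster.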
The nodes of the cluster may be configured to communicate with one another to act collectively to increase performance or to offset any single node failure within the cluster. Each node in the cluster may have a predetermined failover “partner” node. When a node failure occurs (i.e., the failed node is no longer capable of processing access requests for clients), the partner node of the failed node may “take over” the data services of the failed node. In doing so, access requests sent to the failed node may be redirected to the partner node for processing. As such, the cluster may be configured such that a partner node may take over the workload of a failed node. The node currently being discussed may be referred to as a local/primary node, whereas a remote/partner node refers to a predetermined failover partner node of the local/primary node. As used herein, components residing on the primary node may likewise be referred to as local/primary components (e.g., local memory, local write-log layer, etc.), and components residing on a remote node may likewise be referred to as remote components (e.g., remote memory, remote write-log layer, etc.).
The shared storage may comprise a plurality of aggregates, where each aggregate may be configured to contain one or more volumes. The volumes may be configured to store content of data containers, such as files and logical units, served by the cluster in response to multi-protocol data access requests issued by clients. Each node of a cluster may “own” an assigned predetermined set of aggregates (aggregate set) within the shared storage, whereby only the assigned node is configured to service data for the predetermined aggregate set during normal operating conditions (when no node has failed). However, upon failure of a node, “ownership” of the entire aggregate set of the failed node may be transferred to the partner node, so that the partner node takes over servicing of data for that entire aggregate set. As such, a cluster may be configured such that a partner node may take over the workload of a failed primary node, whereby the partner node assumes the tasks of processing and handling any data access requests normally processed by the failed primary node.
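Takeover as described above reduces to rewriting an ownership table: every aggregate owned by the failed node is reassigned to its partner, and later requests for those aggregates are routed to the partner. The following minimal sketch assumes a single in-memory table. The `Cluster` class, its `takeover` and `route` methods, and the node and aggregate names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical ownership table for HA takeover (names are illustrative)."""
    owner: dict[str, str] = field(default_factory=dict)    # aggregate -> owning node
    partner: dict[str, str] = field(default_factory=dict)  # node -> failover partner

    def aggregate_set(self, node: str) -> set[str]:
        """All aggregates currently owned by `node`."""
        return {agg for agg, n in self.owner.items() if n == node}

    def takeover(self, failed: str) -> set[str]:
        """Transfer the entire aggregate set of `failed` to its partner."""
        partner = self.partner[failed]
        moved = self.aggregate_set(failed)
        for agg in moved:
            self.owner[agg] = partner
        return moved

    def route(self, aggregate: str) -> str:
        """Node that should service requests for `aggregate` right now."""
        return self.owner[aggregate]


cluster = Cluster(
    owner={"aggr1": "A", "aggr2": "A", "aggr3": "B"},
    partner={"A": "B", "B": "A"},
)
moved = cluster.takeover("A")  # node A fails; B takes over aggr1 and aggr2
```

After the takeover, node B services its own aggregate plus the whole aggregate set of node A, matching the all-or-nothing transfer described above.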
Each node of a cluster provides data-access service to clients by providing access to shared storage (comprising a set of storage devices). Typically, clients will connect with a node of the cluster for data-access sessions with the node. During a data-access session with a node, a client may submit access requests (read/write requests) that are received and performed by the node. For the received write requests, the node may produce write logs that represent the write requests and store the write logs on a local memory device (from which the node may later perform the write logs on the storage devices). To ensure data consistency, the write logs of a primary node may also be periodically transmitted to the partner nodes in the cluster for remote storage at the partner nodes. As such, if the local/primary node fails, a remote/partner node will have a copy of the write logs and will still be able to perform the write logs on the shared storage.
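The write-log lifecycle (log locally, mirror to partners, replay after a failure) can be sketched in a few lines. The `Node` class, the `handle_write`, `mirror_to`, and `replay` helpers, and the `(block, data)` log-entry format are all hypothetical. Shared storage is modeled as a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node holding its own write log and copies of partners' logs."""
    name: str
    local_log: list[tuple[int, bytes]] = field(default_factory=list)  # (block, data)
    partner_logs: dict[str, list[tuple[int, bytes]]] = field(default_factory=dict)

    def handle_write(self, block: int, data: bytes) -> None:
        # Record the write request in the local write log; it is performed
        # on the shared storage at a later time.
        self.local_log.append((block, data))

    def mirror_to(self, partners: list["Node"]) -> None:
        # Periodically send a copy (not a reference) of the local log to each partner.
        for p in partners:
            p.partner_logs[self.name] = list(self.local_log)


def replay(log: list[tuple[int, bytes]], storage: dict[int, bytes]) -> None:
    """Perform logged writes, in order, on the shared storage."""
    for block, data in log:
        storage[block] = data


node_a, node_b = Node("A"), Node("B")
node_a.handle_write(7, b"x")
node_a.handle_write(8, b"y")
node_a.mirror_to([node_b])

# Node A fails before performing its log; partner B replays A's copy instead.
shared_storage: dict[int, bytes] = {}
replay(node_b.partner_logs["A"], shared_storage)
```

The partner's copy is only as current as the last mirroring round, which is why logs are transmitted periodically rather than once.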
However, as cluster storage systems become larger and contain more nodes, whereby each primary node in the cluster has more partner nodes, the transmission and remote storage of write logs may consume valuable storage space and I/O resources of the nodes in the cluster. For example, if a cluster has four nodes (whereby each primary node has three failover partner nodes), each node may locally store its own write logs and the write logs of each of the three partner nodes. As such, as the number of nodes in the cluster increases, the amount of storage space at each node required to store the write logs of the partner nodes increases as well. Also, since write logs are periodically sent to each partner node, the number of write log data exchanges (sending and receiving) between the nodes of a cluster increases with each node added to the cluster. This may consume significant I/O resources of the nodes in the cluster. As such, an improved method for managing write logs of a cluster storage system is needed.
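The scaling concern can be made concrete with a back-of-the-envelope model. Suppose, as in the four-node example, that every node is a failover partner of every other node and that each node's write log has the same size. Each node then stores N logs, and one mirroring round requires N(N-1) transfers. The function name and the 512 MB log size below are hypothetical.

```python
def write_log_cost(num_nodes: int, log_size_mb: float) -> dict[str, float]:
    """Rough cost model, assuming every node mirrors its log to all other nodes."""
    partners = num_nodes - 1
    return {
        "logs_stored_per_node": num_nodes,              # own log + one per partner
        "storage_per_node_mb": num_nodes * log_size_mb,
        "transfers_per_round": num_nodes * partners,    # each node sends to each partner
    }


# n=4: 4 logs and 2048 MB per node, 12 transfers per round
for n in (2, 4, 8):
    print(n, write_log_cost(n, 512))
```

Under these assumptions, per-node storage grows linearly with cluster size. The number of write-log exchanges grows quadratically: doubling from four to eight nodes raises transfers per round from 12 to 56.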