A data storage system typically includes one or more storage devices into which information may be entered, and from which information may be obtained. The storage system may include a storage operating system that functionally organizes the storage system by invoking storage operations in support of a storage service implemented by the storage system. The storage system may be implemented with a variety of storage architectures including, but not limited to, a network-attached storage (NAS) environment, a storage area network (SAN), a direct-attached storage environment, and any combination thereof. The storage devices are typically disk drives organized as a storage array, although other storage devices (e.g., flash memory) may also constitute the array.
The storage operating system may implement a high-level abstraction layer to logically organize information as a hierarchical structure of storage objects, such as file systems, volumes, directories, and files. For example, each "on-disk" file may be implemented as a set of data structures, e.g., blocks, configured to store information, such as the actual data for the file. These blocks may be organized within a volume implementing a volume block number (vbn) space that is maintained by the file system, whereby each volume may be (but need not be) associated with its own file system. In certain cases, one or more volumes may additionally be organized to form a higher-level storage object of the storage system, such as an aggregate.
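As an illustration of this hierarchy, the following is a minimal sketch, assuming a fixed block size and a simple sequential allocator, of an aggregate holding volumes, each volume maintaining its own vbn space, and each file recorded as a list of the vbns that hold its data. The names `BLOCK_SIZE`, `File`, `Volume`, `Aggregate`, `write_file`, and `read_file` are hypothetical and do not correspond to any particular storage operating system.

```python
from __future__ import annotations

from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # hypothetical fixed block size, in bytes


@dataclass
class File:
    name: str
    vbns: list[int] = field(default_factory=list)  # vbns holding this file's data, in order


@dataclass
class Volume:
    name: str
    blocks: dict[int, bytes] = field(default_factory=dict)  # the volume's vbn space: vbn -> block
    files: dict[str, File] = field(default_factory=dict)
    next_vbn: int = 0  # hypothetical allocator: hand out vbns sequentially

    def write_file(self, name: str, data: bytes) -> File:
        """Split data into blocks and record which vbns the file occupies."""
        f = File(name)
        for off in range(0, len(data), BLOCK_SIZE):
            vbn = self.next_vbn
            self.next_vbn += 1
            self.blocks[vbn] = data[off:off + BLOCK_SIZE]
            f.vbns.append(vbn)
        self.files[name] = f
        return f

    def read_file(self, name: str) -> bytes:
        """Reassemble a file by reading its blocks in vbn-list order."""
        return b"".join(self.blocks[v] for v in self.files[name].vbns)


@dataclass
class Aggregate:
    volumes: dict[str, Volume] = field(default_factory=dict)  # higher-level object grouping volumes


# Usage: one aggregate containing one volume with a two-block file.
agg = Aggregate()
vol = Volume("vol1")
agg.volumes[vol.name] = vol
vol.write_file("a.txt", b"x" * 5000)  # 5000 bytes spans two 4096-byte blocks
```

Keeping the vbn space per volume mirrors the text's point that each volume may carry its own file system; a real allocator would also reuse freed blocks, which this sketch omits.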
The storage system may further be configured to operate according to a client/server model of information delivery that allows many clients to access information stored on the storage system. In this model, a client may constitute an application, such as a database application, executing on a computer that "connects" to the storage system over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the storage system by issuing access requests (e.g., read or write requests) as object-based (e.g., file- or block-based) protocol messages to the storage system over the network.
Multiple storage systems may be interconnected to provide a clustered storage system (cluster) configured to service access requests using the combined resources of the cluster, where each storage system may be a "node" of the cluster. In some cases, the cluster may implement aggregates that are distributed across the nodes of the cluster. Each such aggregate may include one or more volumes, which the cluster may serve in response to client requests for information organized within the aggregate.
Each node may include functional components that cooperate to provide a distributed architecture for the cluster. Such components may include a network element (N-blade or N-module), a storage element (D-blade or D-module), and a management element (M-host). The N-module may enable the node to connect to clients over the network, while the D-module may enable the node to connect to storage devices for storing data to and retrieving data from storage objects. The M-host, in turn, may perform information-sharing operations to present a distributed file system image for the cluster.
Generally, the cluster may provide access to the totality of storage provided by the nodes ("cluster storage"), such that a client may connect to any node and submit an access request targeted at a storage object on the cluster storage. An N-module may be configured to receive the request and forward it to the target D-module in the cluster that manages the requested storage object. The target D-module may be identified, for example, via M-host functionality that maintains a mapping between storage objects and the D-modules in the cluster that manage them. In the case of a read request, the target D-module may forward the retrieved data to the N-module, which may in turn forward the data to the client in response to the read request.
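The request path described above can be sketched as follows. This is a minimal, in-process illustration under stated assumptions: the M-host mapping is modeled as a plain dictionary from volume name to owning D-module, storage is an in-memory dictionary, and network hops are ordinary method calls. The classes `DModule`, `MHost`, and `NModule` and their methods are hypothetical.

```python
from __future__ import annotations


class DModule:
    """Hypothetical storage element: serves reads for the volumes it manages."""

    def __init__(self, name: str, volumes: dict[str, dict[str, bytes]]):
        self.name = name
        self.volumes = volumes  # volume name -> {path: data}

    def read(self, volume: str, path: str) -> bytes:
        return self.volumes[volume][path]


class MHost:
    """Hypothetical cluster-wide mapping of storage objects to owning D-modules."""

    def __init__(self):
        self.owner: dict[str, DModule] = {}

    def register(self, volume: str, dmodule: DModule) -> None:
        self.owner[volume] = dmodule

    def lookup(self, volume: str) -> DModule:
        return self.owner[volume]


class NModule:
    """Hypothetical network element: receives client requests on one node."""

    def __init__(self, mhost: MHost):
        self.mhost = mhost

    def handle_read(self, volume: str, path: str) -> bytes:
        target = self.mhost.lookup(volume)  # identify the D-module managing the volume
        data = target.read(volume, path)    # forward the request to that D-module
        return data                         # relay the retrieved data back to the client


# Usage: two nodes' D-modules; a client connected to node 1 reads from node 2's volume.
mhost = MHost()
d1 = DModule("node1-D", {"vol1": {"/a": b"hello"}})
d2 = DModule("node2-D", {"vol2": {"/b": b"world"}})
mhost.register("vol1", d1)
mhost.register("vol2", d2)
n1 = NModule(mhost)
```

Here `n1.handle_read("vol2", "/b")` returns `b"world"` even though the client reached node 1, which is the sense in which any node presents the whole cluster storage.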
One technique for improved servicing of requests by a node involves accelerating access to remote data (e.g., a volume) by caching the volume at the node receiving the request. A cached volume may accelerate access by avoiding the need for the node to retrieve such data from a remote node. In one example, the node may periodically request a pre-defined amount of data from the remote node prior to a client request for such data. In other examples, known techniques may be implemented to determine the amount and type of data to cache based on an access request history for the volume. By implementing volume caching techniques in a cluster, processing overhead at the nodes may be reduced while also conserving cluster bandwidth.
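A minimal sketch of such a cached volume follows, assuming block-granular caching keyed by vbn and a prefetch step that pulls a fixed, pre-defined number of blocks ahead of client demand. The periodic scheduling of prefetches and any history-based policy are omitted. `OriginVolume`, `CachedVolume`, `prefetch_count`, and `remote_reads` are hypothetical names; `remote_reads` simply counts round trips to the remote node so that the savings are visible.

```python
from __future__ import annotations

from typing import Iterable


class OriginVolume:
    """Hypothetical remote volume; each fetch models one round trip to the remote node."""

    def __init__(self, blocks: dict[int, bytes]):
        self.blocks = blocks
        self.remote_reads = 0

    def fetch(self, vbns: Iterable[int]) -> dict[int, bytes]:
        self.remote_reads += 1
        return {v: self.blocks[v] for v in vbns if v in self.blocks}


class CachedVolume:
    """Hypothetical local cache of a remote volume on the node receiving requests."""

    def __init__(self, origin: OriginVolume, prefetch_count: int = 4):
        self.origin = origin
        self.cache: dict[int, bytes] = {}
        self.prefetch_count = prefetch_count  # pre-defined amount of data per prefetch
        self.next_prefetch = 0

    def prefetch(self) -> None:
        """Pull the next prefetch_count blocks before any client asks for them."""
        vbns = range(self.next_prefetch, self.next_prefetch + self.prefetch_count)
        self.cache.update(self.origin.fetch(vbns))
        self.next_prefetch += self.prefetch_count

    def read(self, vbn: int) -> bytes:
        """Serve from the local cache; go to the remote node only on a miss."""
        if vbn not in self.cache:
            self.cache.update(self.origin.fetch([vbn]))
        return self.cache[vbn]  # raises KeyError if the block does not exist remotely


# Usage: one prefetch covers vbns 0-3; a later miss on vbn 5 costs one more round trip.
origin = OriginVolume({v: bytes([v]) for v in range(8)})
cv = CachedVolume(origin, prefetch_count=4)
cv.prefetch()
hits = [cv.read(v) for v in range(4)]  # served locally, no remote reads
miss = cv.read(5)                      # cache miss: fetched from the origin
again = cv.read(5)                     # now cached locally
```

After this sequence `origin.remote_reads` is 2: five client reads of blocks 0 to 3 and 5 cost only two round trips to the remote node.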
Challenges may arise, however, when seeking to optimize overall cluster performance using known volume caching techniques. For instance, known techniques may implement cached volumes at all nodes in the cluster to provide acceleration at every client access point of the cluster. However, these techniques fail to account for node-specific features (e.g., workload limits, storage space limits), and therefore do not allow processing resources to be conserved at selected nodes.
To account for node-specific features, a user (administrator) of the cluster must possess detailed knowledge of the cluster layout (topology), including the specific features of each node. Based on this knowledge, the user may determine the nodes on which to implement volume caching and manually implement cached volumes on those nodes. As storage demands grow and the number of nodes in the cluster increases to meet them, this manual approach becomes increasingly difficult, and there is a need for an improved method of caching volumes to optimize cluster performance.