A storage controller is a physical processing device that is used to store and retrieve data on behalf of one or more hosts. A network storage controller can be configured (e.g., by “hardwiring”, software, firmware, or any combination thereof) to operate as a storage server that serves one or more clients on a network, to store and manage data in a set of mass storage devices, such as magnetic or optical storage-based disks, tapes, or flash memory. Some storage servers are designed to service file-level requests from hosts, as is commonly the case with file servers used in a network attached storage (NAS) environment. Other storage servers are designed to service block-level requests from hosts, as with storage servers used in a storage area network (SAN) environment. Still other storage servers are capable of servicing both file-level requests and block-level requests, as is the case with certain storage servers made by NetApp®, Inc. of Sunnyvale, Calif., employing the Data ONTAP® storage operating system.
As storage systems become larger to accommodate the need for more capacity, several problems arise. In particular, the efficient use of storage space becomes more difficult. One notable problem is duplicate data. A typical data volume can contain thousands or even millions of duplicate data objects. As data is created, distributed, backed up, and archived, many duplicate data objects are commonly created as an incidental result of these operations. The end result is inefficient utilization of data storage resources. Deduplication operations identify and eliminate the undesired duplicate data objects. Commonly, this is done by deleting all but one copy of a given data object and replacing all duplicates of that data object with a reference to the single remaining data object. Compression operations reduce the amount of physical storage space used to store a particular data segment. Storage efficiency operations, such as deduplication and compression, provide a benefit in storage space efficiency. The result can be reduced operating cost due to longer intervals between storage capacity upgrades and more efficient management of stored data.
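The deduplication and compression operations described above can be illustrated with a minimal sketch. It assumes that duplicates are identified by a SHA-256 content hash and that compression uses zlib; the text does not specify either technique, and the function name `deduplicate` and the sample object names are hypothetical.

```python
import hashlib
import zlib


def deduplicate(objects):
    """Keep one copy of each distinct payload; replace every object with a reference.

    Assumption: duplicates are detected by content hash (SHA-256).
    """
    store = {}  # digest -> payload (the single remaining copy)
    refs = {}   # object name -> digest (the reference that replaces the object)
    for name, payload in objects.items():
        digest = hashlib.sha256(payload).hexdigest()
        store.setdefault(digest, payload)  # first copy wins; later duplicates are dropped
        refs[name] = digest
    return store, refs


# Hypothetical data set: two of the three objects are duplicates.
objects = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world"}
store, refs = deduplicate(objects)

# Compression: the same data segment occupies less physical space.
raw = b"abc" * 1000
compressed = zlib.compress(raw)
```

In this sketch, reading an object back means following its reference into `store`, so both duplicate names resolve to the same stored copy.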
A network storage system can have a simple architecture; for example, an individual storage server can provide one or more clients on a network with access to data stored in a mass storage subsystem. Recently, however, with storage capacity demands increasing rapidly in almost every business sector, there has been a trend towards the use of clustered network storage systems, to improve scalability.
In a clustered storage system, two or more storage server “nodes” are connected in a distributed architecture. The nodes are generally implemented by two or more storage controllers. Each storage server “node” is in fact a storage server, although it is implemented with a distributed architecture. For example, a storage server node can be designed to include a network module (“N-module”) to provide network connectivity and a separate data module (e.g., “D-module”) to provide data storage and data access functionality, where the N-module and D-module communicate with each other over some type of physical interconnect. Two or more such storage server nodes are typically connected to form a storage “cluster”, where each of the N-modules in the cluster can communicate with each of the D-modules in the cluster.
A clustered architecture allows convenient scaling through the addition of more N-modules and D-modules, all capable of communicating with each other. Further, a storage cluster may present a single system image of stored data to clients and administrators, such that the actual location of data can be made transparent to clients and administrators. An example of a storage controller that is designed for use in a clustered system such as this is a storage controller employing NetApp's Data ONTAP® GX storage operating system.
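The N-module and D-module split, and the single system image it can present, can be sketched as follows. This is only an illustration under stated assumptions: the class names, the in-memory dictionary standing in for mass storage, and the byte-sum placement rule are all hypothetical and are not taken from any actual storage operating system.

```python
class DModule:
    """Data module: owns storage and serves reads and writes for it."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # stand-in for the mass storage this module controls

    def write(self, key, data):
        self.objects[key] = data

    def read(self, key):
        return self.objects[key]


class NModule:
    """Network module: client-facing entry point that forwards to D-modules."""

    def __init__(self, name, d_modules):
        self.name = name
        self.d_modules = d_modules  # every N-module can reach every D-module

    def _owner(self, key):
        # Hypothetical placement rule (an assumption): byte sum modulo module count.
        return self.d_modules[sum(key.encode("utf-8")) % len(self.d_modules)]

    def write(self, key, data):
        self._owner(key).write(key, data)

    def read(self, key):
        return self._owner(key).read(key)


d_modules = [DModule("d1"), DModule("d2")]
n1 = NModule("n1", d_modules)
n2 = NModule("n2", d_modules)

# A client writes through one N-module and reads through another;
# the location of the data stays transparent to the client.
n1.write("/vol/a/file1", b"payload")
result = n2.read("/vol/a/file1")
```

Adding capacity in this sketch amounts to appending another `DModule` or `NModule` instance, although a real system would also need to rebalance or track existing data when the placement changes.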
Efficient use of storage space is also a concern in a clustered storage system, and in fact, the problem can even be magnified due to the distributed architecture of the clustered storage system. A large cluster can have dozens or even hundreds of nodes, containing tens of thousands of volumes. Because of the distributed architecture, the storage that a client accesses may not all be controlled by the same D-module. Further, a single D-module can control storage accessed by multiple clients and managed by administrators in multiple locations. Storage efficiency operations, e.g., deduplication and compression, can be performed by the D-module to improve the way storage space is used. An administrator may request storage efficiency operations to be performed by a number of D-modules which are responsible for maintaining the storage devices associated with a client.
Configuring storage efficiency operations for a volume (an abstraction of physical storage devices) typically involves manually assigning a large number of attributes to the volume. For example, a deduplication option (whether data on the volume should be deduplicated), a compression option (whether data on the volume should be compressed), a compression and/or deduplication schedule, a duration of the compression and/or deduplication operation, an operation type (background vs. foreground), etc. can be set for each volume. These attributes can be set for a particular volume depending on various factors, such as the expected type of workload, performance requirements, characteristics of the data set, availability of CPU power, backup schedules, etc.
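The per-volume attributes listed above can be sketched as a simple record that is filled in separately for each volume. The class name, field names, schedule string format, and volume names below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class VolumeEfficiencyConfig:
    """Storage efficiency attributes for one volume (field names are assumptions)."""
    deduplication: bool      # whether data on the volume should be deduplicated
    compression: bool        # whether data on the volume should be compressed
    schedule: str            # hypothetical schedule string
    max_duration_hours: int  # duration allowed for the operation
    operation_type: str      # "background" or "foreground"


volumes = [f"vol{i}" for i in range(5)]  # hypothetical volume names

# Each volume is configured individually, even when the settings are identical.
configs = {}
for volume in volumes:
    configs[volume] = VolumeEfficiencyConfig(
        deduplication=True,
        compression=True,
        schedule="daily@02:00",
        max_duration_hours=4,
        operation_type="background",
    )

# Number of individual attribute assignments grows with the number of volumes.
manual_settings = len(volumes) * len(fields(VolumeEfficiencyConfig))
```

With five attributes per volume, the count of individual settings is five times the number of volumes, which gives a rough sense of the administrative effort involved as the volume count grows.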
With conventional technology, all of these attributes are assigned individually for each volume in the storage system. This is true even in cases where the same configuration is valid for multiple volumes. Thus, scalability challenges arise in administering storage efficiency operations in a clustered network storage environment. A large cluster can include tens of thousands of volumes. Individually configuring the attributes of storage efficiency operations for such a large number of volumes can be very time-consuming and burdensome.