Various forms of network storage systems are known today. These forms include network attached storage (NAS), storage area networks (SANs), and others. Network storage systems are commonly used for a variety of purposes, such as providing multiple users with access to shared data, backing up critical data (e.g., by data mirroring), etc.
A network storage system includes at least one storage server, which is a processing system configured to store and retrieve data on behalf of one or more client processing systems (“clients”). In the context of NAS, a storage server may be a file server, which is sometimes called a “filer”. A filer operates on behalf of one or more clients to store and manage shared files in a set of mass storage devices, such as magnetic or optical disks or tapes. The mass storage devices may be organized into one or more volumes of a Redundant Array of Independent Disks (RAID). Enterprise-level filers are made by Network Appliance, Inc. of Sunnyvale, Calif. (NetApp®). In a SAN context, the storage server provides clients with block-level access to stored data, rather than file-level access. Some storage servers, such as certain filers made by NetApp, are capable of providing clients with both file-level access and block-level access.
One of the primary jobs of a storage system administrator is to monitor how space is used in the storage system, predict when various storage pools will be exhausted, and react when operations fail for lack of storage space. In the days of simple disk drives and file systems, this task was easy. Modern filers, however, are much more complicated, especially when they are used for storing Logical Unit Numbers (LUNs).
A filer may have a number of aggregates. An “aggregate” is a logical container for a pool of storage, combining one or more physical mass storage devices (e.g., disks) or parts thereof into a single logical storage object, which contains or provides storage for one or more other logical data sets at a higher level of abstraction (e.g., volumes). A “volume” is a set of stored data associated with a collection of mass storage devices, such as disks, which obtains its storage from (i.e., is contained within) an aggregate, and which is managed as an independent administrative unit, such as a complete file system. A “file system” is an independently managed, self-contained, hierarchical set of data units (e.g., files, blocks or LUNs). A file system may be a volume, for example. Although a volume or file system (as those terms are used herein) may store data in the form of files, that is not necessarily the case. That is, a volume or file system may store data in the form of other units, such as blocks or LUNs.
A traditional volume has a fixed, one-to-one relationship with its containing aggregate (i.e., it is exactly coextensive with one aggregate). Consequently, there is a fixed relationship between each traditional volume and the disks that are associated with it. This fixed relationship means that each volume has exclusive control over its associated disks: only that volume can read from or write to them, and unused space on those disks cannot be used by another volume. Thus, even if a volume is using only a fraction of the space on its associated disks, the unused space is reserved for the exclusive use of that volume. A traditional volume is therefore a space-guaranteed volume, in that every byte of the volume is already physically allocated from the underlying aggregate. In this configuration, the system administrator needs only to check how much space is available in the volume. If there is free space, there is little risk of a write failure. If there is too little free space, the storage administrator may need to delete some files to recover space. Storage administrators tend to reserve more space than is actually needed to avoid ever running out of space. As a result, much of the reserved space is frequently wasted.
To improve space utilization, a flexible volume may be used. A flexible volume is analogous to a traditional volume, in that it is managed as a file system; but unlike a traditional volume, a flexible volume is treated separately from the underlying physical storage that contains the associated data. A “flexible volume” is, therefore, a set of stored data associated with one or more mass storage devices, such as disks, which obtains its storage from an aggregate, and which is managed as an independent administrative unit, such as a single file system, but which is flexibly associated with the underlying physical storage. Flexible volumes allow the boundaries between aggregates and volumes to be flexible, such that there does not have to be a one-to-one relationship between a flexible volume and an aggregate. An aggregate can contain multiple flexible volumes. Hence, flexible volumes can be flexibly associated with the underlying physical storage block characteristics. Further, to help reduce the amount of wasted storage space, any free data block in an aggregate can be used by any flexible volume in the aggregate. A flexible volume can be grown or shrunk in size. Furthermore, blocks can be committed to flexible volumes on-the-fly from available storage. A flexible volume may be a non-space-guaranteed volume, which means that not every byte of the volume is physically allocated from the underlying aggregate(s). A flexible volume may be created with a size larger than the physical size of the underlying aggregate(s). This situation is called aggregate overcommitment. Aggregate overcommitment provides a type of flexibility that is particularly useful to a storage provider. With aggregate overcommitment, a given aggregate may appear to provide more storage than is actually available from it. This arrangement may be useful if a system administrator is asked to provide a greater amount of storage than the administrator knows will be used immediately.
Alternatively, if there are several volumes that sometimes need to grow temporarily, the volumes can share the available space with each other dynamically.
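The relationship between flexible volumes, their nominal sizes, and the physical space of a shared aggregate can be illustrated with a minimal sketch. The class names, method names, and block counts below are hypothetical and chosen only for illustration; they do not describe any actual filer implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FlexibleVolume:
    name: str
    nominal_size: int      # size presented to clients, in blocks (hypothetical unit)
    used_blocks: int = 0   # blocks actually drawn from the aggregate


@dataclass
class Aggregate:
    physical_blocks: int                       # real capacity of the pool
    volumes: list = field(default_factory=list)

    def add_volume(self, name, nominal_size):
        # No space is reserved here: the volume is non-space-guaranteed.
        vol = FlexibleVolume(name, nominal_size)
        self.volumes.append(vol)
        return vol

    def free_blocks(self):
        # Any free block may be used by any volume in the aggregate.
        return self.physical_blocks - sum(v.used_blocks for v in self.volumes)

    def overcommit_ratio(self):
        # A ratio above 1.0 means the aggregate is overcommitted.
        return sum(v.nominal_size for v in self.volumes) / self.physical_blocks

    def write(self, vol, nblocks):
        if vol.used_blocks + nblocks > vol.nominal_size:
            raise ValueError("volume is full")
        if nblocks > self.free_blocks():
            # The volume still has nominal room, but the shared pool is exhausted.
            raise OSError("aggregate is out of space")
        vol.used_blocks += nblocks


aggr = Aggregate(physical_blocks=1000)
vol_a = aggr.add_volume("vol_a", 800)
vol_b = aggr.add_volume("vol_b", 800)
aggr.write(vol_a, 600)
aggr.write(vol_b, 300)
```

In this sketch the two volumes together advertise 1600 blocks against 1000 physical blocks. A further write of 200 blocks to `vol_b` would fit within its nominal size yet fail at the aggregate level, which is the kind of situation an administrator of an overcommitted aggregate has to anticipate.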
Many storage servers also have the ability to generate a read-only, persistent point-in-time image (PPI) of a data set, such as a volume, file, or logical unit number (LUN). A PPI captures the exact state of data in a data set at the point in time that the PPI was taken. This allows the state of the data set to be restored from the PPI in the event of, for example, a catastrophic failure of the storage system or corruption of data.
An example of a PPI is a Snapshot™ such as may be created using SnapManager® from NetApp. The term “Snapshot” is used herein without derogation of the trademark rights of Network Appliance, Inc. NetApp's Snapshot mechanism is implemented, at least in part, in its DATA ONTAP® operating system, which implements a write out-of-place file system. The write out-of-place file system, known as WAFL®, writes all modified data to new locations on disk, instead of overwriting the old data. Instead of duplicating disk blocks that are the same in a PPI as in the active file system, a NetApp Snapshot shares these data blocks with the active file system. When blocks in the active file system are modified or removed, new blocks are added into the active file system to replace the old ones because of the file system's write out-of-place property. The old blocks, although removed from the active file system, are still held by one or more Snapshots. Consequently, the Snapshot area grows and consumes free space from the volume. A storage administrator may periodically release obsolete Snapshots to return free space to the file system. However, the storage administrator needs to monitor the file system and Snapshots closely.
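The way a write out-of-place file system causes PPIs to hold on to old blocks can be shown with a toy model. The following is a simplified sketch, not a description of WAFL or of the Snapshot implementation; the `Volume` class, its methods, and the snapshot name are hypothetical.

```python
class Volume:
    """Toy write out-of-place volume with block-sharing snapshots."""

    def __init__(self):
        self.next_block = 0
        self.active = {}      # file offset -> physical block number
        self.snapshots = {}   # snapshot name -> frozen copy of the block map

    def write(self, offset):
        # Write out-of-place: modified data always goes to a new block,
        # and the old block is never overwritten.
        self.active[offset] = self.next_block
        self.next_block += 1

    def snapshot(self, name):
        # Only the block map is copied; the data blocks themselves are shared.
        self.snapshots[name] = dict(self.active)

    def delete_snapshot(self, name):
        del self.snapshots[name]

    def _snapshot_blocks(self):
        held = set()
        for block_map in self.snapshots.values():
            held.update(block_map.values())
        return held

    def blocks_in_use(self):
        return len(set(self.active.values()) | self._snapshot_blocks())

    def snapshot_only_blocks(self):
        # Blocks no longer in the active file system but still held by a snapshot.
        return len(self._snapshot_blocks() - set(self.active.values()))


vol = Volume()
for offset in range(4):
    vol.write(offset)            # four blocks in the active file system
vol.snapshot("hourly.0")         # shares all four blocks, consumes none
vol.write(0)                     # overwrite two offsets out-of-place
vol.write(1)
```

After the two overwrites, the active file system still references four blocks, but the volume holds six, because the snapshot keeps the two replaced blocks alive. Calling `vol.delete_snapshot("hourly.0")` returns those two blocks to the free pool in this model.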
Although modern storage servers provide more flexible ways to manage a file system, these require close and careful monitoring of the storage server by a storage administrator, which, under the current storage management scheme, demands constant attention and intensive manual operations and calculations. Traditionally, the operating system of a storage server provides commands for system administrators to monitor and manage the server. For example, Unix and DOS operating systems provide command-line commands that allow an administrator to list the contents of a directory, the size of a file, the free space available to a volume, etc. Modern operating systems usually provide GUI tools to make the monitoring and managing of a complicated storage server easier. An example of a GUI-based storage server monitoring and management system is the DataFabric® Manager (DFM) of Network Appliance, Inc. However, these GUI-based storage server managers do not provide systematic and automatic tools for advanced space monitoring and management.
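As a point of comparison, the kind of manual free-space check that an administrator performs from a command line can be approximated in a few lines. This is a minimal sketch for a general-purpose host using Python's standard `shutil.disk_usage`; the function names and the 10% threshold are assumptions made for illustration and are unrelated to any storage server's own tools.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the file system containing `path` that is free."""
    usage = shutil.disk_usage(path)   # named tuple: total, used, free (bytes)
    return usage.free / usage.total if usage.total else 0.0


def check_space(path, min_free=0.10):
    """Report whether free space is above a (hypothetical) minimum threshold."""
    frac = free_fraction(path)
    status = "OK" if frac >= min_free else "LOW"
    return f"{path}: {frac:.1%} free ({status})"


print(check_space("."))
```

A check like this reports only a single number for a single file system at a single moment. It says nothing about Snapshot growth or about other volumes sharing an overcommitted aggregate, which is why the monitoring described above remains a manual effort.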