A storage system is a processing system adapted to store and retrieve data on behalf of one or more client processing systems (“clients”) in response to external input/output (I/O) requests received from clients. A storage system can provide clients with file-level access to data stored in a set of mass storage devices, such as magnetic or optical storage disks or tapes. Alternatively, a storage system can provide clients with block-level access to stored data, either instead of file-level access or in addition to it.
Data storage space is organized as one or more storage “volumes” comprising physical storage disks, defining an overall logical arrangement of storage space. The disks within a volume are typically organized as one or more groups of Redundant Arrays of Independent (or Inexpensive) Disks (RAID). A volume may contain one or more file systems. A file system is an application layer that imposes a structure (e.g., a hierarchical structure) on files, directories and/or other data containers stored and/or managed by a storage system. Application data sent to a storage system from a client system for storage may be divided up into fixed-size physical data blocks (for example, data blocks A, B, and C) stored on disks within a volume. To facilitate access to the data blocks, the storage system implements a file system that logically organizes information as a hierarchical structure of named directories and files on the disks.
Some known file systems, including the Write Anywhere File Layout (WAFL™) file system, provided by Network Appliance, Inc., of Sunnyvale, Calif., provide the capability to create snapshots of an active file system. An “active file system” is a file system to which data can be both written and read. A snapshot is a persistent point in time (PPT) image of the active file system that enables quick recovery of data after data has been corrupted, lost or altered. The terms “PPT image” and “snapshot” are used interchangeably throughout this description. Snapshots can be created by copying the data at each predetermined point in time to form a consistent image, or virtually, by using pointers to form the image of the data.
When pointers are used for snapshot creation, the created snapshot points to the data blocks in the active file system, such as data blocks A, B, and C. If one data block, e.g., data block C, is modified, a new data block (for example, data block C′) is allocated for the new data and is written at a new location on a disk. The file system now points to the new data block C′ as well as to the data blocks A and B, and terminates its link to the old data block C. Although data block C is no longer part of the active file system, it is locked by the snapshot and cannot be de-allocated for new data until the snapshot is deleted. Thus, when blocks in the active file system are modified or removed, new blocks are added to the active file system. The old blocks, although removed from the active file system, are still held by one or more snapshots and physically maintained on disk within the volume. This consumes space on the volume and causes the snapshot area to grow.
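The pointer-based behavior described above may be illustrated by the following minimal sketch. It is not the WAFL implementation; the class `Volume` and its method names are hypothetical and are chosen only to mirror the example of data blocks A, B, C, and C′.

```python
class Volume:
    """Toy volume (hypothetical): a physical block stays on disk for as long as
    the active file system or any snapshot still points to it."""

    def __init__(self):
        self.disk = {}        # physical location -> data
        self.active = {}      # logical block name -> physical location
        self.snapshots = {}   # snapshot name -> frozen copy of the pointer map
        self._next_loc = 0

    def _allocate(self, data):
        loc = self._next_loc
        self._next_loc += 1
        self.disk[loc] = data
        return loc

    def write(self, name, data):
        """Write new data to a new location and repoint the active file system."""
        old = self.active.get(name)
        self.active[name] = self._allocate(data)
        self._release_if_unreferenced(old)

    def create_snapshot(self, snap_name):
        """Copy only the pointers, not the data blocks."""
        self.snapshots[snap_name] = dict(self.active)

    def delete_snapshot(self, snap_name):
        for loc in self.snapshots.pop(snap_name).values():
            self._release_if_unreferenced(loc)

    def _release_if_unreferenced(self, loc):
        if loc is None or loc not in self.disk:
            return
        held = loc in self.active.values() or any(
            loc in snap.values() for snap in self.snapshots.values())
        if not held:
            del self.disk[loc]   # block can now be reused for new data


vol = Volume()
for name in ("A", "B", "C"):
    vol.write(name, f"data {name}")
vol.create_snapshot("snap1")
vol.write("C", "data C'")                # C' is written at a new location
blocks_with_snapshot = len(vol.disk)     # A, B, old C (held by snap1), C'
vol.delete_snapshot("snap1")
blocks_after_delete = len(vol.disk)      # A, B, C'
```

In this sketch the old block C remains on disk only because `snap1` still points to it, which is the source of the snapshot space growth noted above.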
Each snapshot captures and saves data that has been changed in the active file system since the last snapshot. Thus, the size of a snapshot (e.g., in megabytes) depends on the rate of data changes in the active file system. The amount of data that has been changed in the active file system relative to a previous snapshot is known as the snap-delta function (“snap-delta”), because the increase (or decrease) in the size of the next snapshot depends on the current state of changes in the active file system. Alternatively, the snap-delta may be defined as the difference in size between two snapshots.
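As a rough illustration of the first definition, the snap-delta may be estimated by counting the blocks that a previous snapshot still holds but that the active file system no longer references. The function name `snap_delta`, the pointer-map representation, and the 4096-byte block size in the sketch below are assumptions made for illustration only.

```python
BLOCK_SIZE = 4096  # bytes; assumed block size, for illustration only


def snap_delta(previous_snapshot, active_fs, block_size=BLOCK_SIZE):
    """Bytes held only by the previous snapshot (hypothetical helper).

    Both arguments map a logical block name to a physical location. A block
    counts toward the delta when the active file system has modified or
    removed it, so that only the snapshot still points to the old location.
    """
    active_locations = set(active_fs.values())
    held_only_by_snapshot = [
        loc for loc in previous_snapshot.values()
        if loc not in active_locations
    ]
    return len(held_only_by_snapshot) * block_size


snapshot = {"A": 0, "B": 1, "C": 2}
active = {"A": 0, "B": 1, "C": 3}   # block C was rewritten as C' at location 3
delta = snap_delta(snapshot, active)
```

Here one block has changed since the snapshot was taken, so the estimate equals one block of the assumed size.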
Typically, a volume may include one or more logical unit numbers (LUNs) to store user data, an initial snapshot reserve for saving one or more snapshots of the LUN, an available reserve space, and a snapshot overwrite reserve. The overwrite reserve is usually set to 100% of the total LUN size for snapshot overwrites. The remaining unused volume space (available volume space) is then available for snapshot data (and/or other data and files such as system files). A portion of the volume space equal to the size of the LUN is initially reserved for snapshot data to guarantee that at least one snapshot can be taken (e.g., if every data block in the LUN is changed).
When a first snapshot is created, the initial snapshot reserve is used. As additional snapshots are created, the available reserve space is used until all of it is consumed by the snapshots, so that only the snapshot overwrite reserve is available for subsequent snapshots. Because the overwrite reserve is allocated at 100% of the LUN size, there is still space on the volume even if all data blocks are modified and a snapshot is created. A noted problem with this technique is that maintaining an overwrite reserve equal to the amount of space allocated for the application data decreases the amount of available space on the volume that can potentially be consumed by snapshots and other data.
To address this problem, only a fraction of the space allocated for the application data in the volume can be reserved for snapshot overwrites (“fractional space reservation”). Fractional space reservation leaves more space on the volume for snapshot consumption and other data. A disadvantage of fractional space reservation is that configuring a snapshot overwrite reserve of less than 100% of the application data space (LUN) creates the possibility that at some point application data cannot be modified because there is not enough space on the volume to hold the modified data. Thus, using fractional space reservation requires continuous monitoring of the available space on a volume.
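The trade-off between a 100% overwrite reserve and a fractional reserve may be illustrated with simple arithmetic. The sketch below assumes a simplified volume consisting only of a LUN, an overwrite reserve, and available space; the function names and the example sizes (a 100 GB volume holding a 30 GB LUN) are hypothetical.

```python
def plan_volume(volume_gb, lun_gb, reserve_percent):
    """Split a volume into LUN, overwrite reserve, and available space.

    Simplified model (an assumption): whatever is not used by the LUN or
    the overwrite reserve is available for snapshots and other data.
    """
    overwrite_reserve_gb = lun_gb * reserve_percent // 100
    available_gb = volume_gb - lun_gb - overwrite_reserve_gb
    if available_gb < 0:
        raise ValueError("volume too small for the LUN and its reserve")
    return {"overwrite_reserve_gb": overwrite_reserve_gb,
            "available_gb": available_gb}


def worst_case_overwrite_fits(plan, overwrite_gb):
    """Worst case: snapshots have consumed all available space, so only
    the overwrite reserve is left to absorb modified blocks."""
    return overwrite_gb <= plan["overwrite_reserve_gb"]


full = plan_volume(volume_gb=100, lun_gb=30, reserve_percent=100)
fractional = plan_volume(volume_gb=100, lun_gb=30, reserve_percent=50)

full_survives_rewrite = worst_case_overwrite_fits(full, 30)
fractional_survives_rewrite = worst_case_overwrite_fits(fractional, 30)
```

Under these assumptions the fractional plan leaves 55 GB rather than 40 GB for snapshots, but it can no longer absorb a rewrite of the entire LUN once the available space has been consumed.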
According to one known technique, a component of a storage system monitors available space on a fractionally-reserved volume. A write operation from a client system executing a host application is rejected if there is not enough space on the volume to complete the operation. This technique is described in commonly-assigned U.S. patent application Ser. No. 10/991,225, entitled “System And Method For Flexible Space Reservations In A File System Supporting Persistent Consistency Point Images,” by Himanshu Aggarwal and Eric Hamilton. According to this technique, the client system continues to issue I/O requests to the storage system even when there is no available space on the volume, and these requests are rejected by the storage system. Rejecting client I/O requests may lead to undesirable consequences, such as loss of data. In addition, the client system halts execution of the application, which may require an application administrator to perform additional steps, such as recovering application data and restoring the application.
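The behavior of this known technique, as seen from the client, may be sketched as follows. The sketch does not reproduce the referenced application; the names `MonitoredVolume`, `InsufficientSpaceError`, and `issue_requests` are hypothetical, and space is counted in abstract blocks.

```python
class InsufficientSpaceError(Exception):
    """Raised when a write is rejected for lack of space (hypothetical name)."""


class MonitoredVolume:
    """Storage-side check: reject any write that does not fit in free space."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks

    def write(self, blocks_needed):
        if blocks_needed > self.free_blocks:
            raise InsufficientSpaceError(
                f"need {blocks_needed} blocks, only {self.free_blocks} free")
        self.free_blocks -= blocks_needed


def issue_requests(volume, requests):
    """Client side: keeps issuing writes and learns of the shortage only
    when a request is rejected."""
    accepted, rejected = 0, 0
    for blocks in requests:
        try:
            volume.write(blocks)
            accepted += 1
        except InsufficientSpaceError:
            rejected += 1
    return accepted, rejected


result = issue_requests(MonitoredVolume(free_blocks=10), [4, 4, 4, 4])
```

In this sketch the client has no advance warning: the third and fourth requests are simply rejected once the free space is exhausted.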
What is needed is a mechanism that allows a client executing a host application to monitor the use of snapshot reserve space on a volume that stores application data and to detect conditions in the snapshot reserve space that indicate a risk of snapshot write failures before there is too little space on the volume to execute I/O requests.