A storage system typically comprises one or more storage devices into which information may be entered, and from which information may be obtained, as desired. The storage system includes a storage operating system that functionally organizes the system by, inter alia, invoking storage operations in support of a storage service implemented by the system. The storage system may be implemented in accordance with a variety of storage architectures including, but not limited to, a network-attached storage (NAS) environment, a storage area network (SAN) and a disk assembly directly attached to a client or host computer, i.e., direct attached storage (DAS). The storage devices are typically disk drives organized as a disk array, wherein the term “disk” commonly describes a self-contained rotating magnetic media storage device. The term disk in this context is synonymous with hard disk drive (HDD) or direct access storage device (DASD).
Storage of information on the disk array is preferably implemented as one or more storage “volumes” of physical disks, defining an overall logical arrangement of disk space. The disks within a volume are typically organized as one or more groups, wherein each group may be operated as a Redundant Array of Independent (or Inexpensive) Disks (RAID). Most RAID implementations enhance the reliability/integrity of data storage through the redundant writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate storing of redundant information (parity) with respect to the striped data. The physical disks of each RAID group may include disks configured to store striped data (i.e., data disks) and disks configured to store parity for the data (i.e., parity disks). The parity may thereafter be retrieved to enable recovery of data lost when a disk fails. The term “RAID” and its various implementations are well-known and disclosed in A Case for Redundant Arrays of Inexpensive Disks (RAID), by D. A. Patterson, G. A. Gibson and R. H. Katz, Proceedings of the International Conference on Management of Data (SIGMOD), June 1988.
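The recovery role of parity described above can be illustrated with a minimal sketch. It assumes single-parity protection computed as a bytewise XOR across the data blocks of one stripe, which is one common RAID parity scheme; the text above does not specify a particular parity computation. The function names `compute_parity` and `reconstruct` are hypothetical.

```python
from functools import reduce


def compute_parity(blocks):
    """Return the bytewise XOR of equal-length blocks (assumed parity scheme)."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks in a stripe must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    # XOR-ing the parity with every surviving block cancels their
    # contributions and leaves the contents of the lost block.
    return compute_parity(list(surviving_blocks) + [parity])


# One stripe spread over three hypothetical data disks.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = compute_parity(stripe)

# Simulate losing the second data disk and recovering its block.
recovered = reconstruct([stripe[0], stripe[2]], parity)
```

Under this assumption, any one lost block in a stripe can be rebuilt from the remaining blocks and the parity block; schemes that tolerate multiple failures require additional redundant information.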
The storage operating system of the storage system may implement a high-level module, such as a file system, to logically organize the information stored on the disks as a hierarchical structure of directories, files and blocks. For example, each “on-disk” file may be implemented as a set of data structures, i.e., disk blocks, configured to store information, such as the actual data for the file. These data blocks are organized within a volume block number (vbn) space that is maintained by the file system. The file system organizes the data blocks within the vbn space as a “logical volume”; each logical volume may be, although is not necessarily, associated with its own file system. The file system typically consists of a contiguous range of vbns from zero to n, for a file system of size n+1 blocks.
A known type of file system is a write-anywhere file system that does not over-write data on disks. If a data block is retrieved (read) from disk into a memory of the storage system and “dirtied” (i.e., updated or modified) with new data, the data block is thereafter stored (written) to a new location on disk to optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. An example of a write-anywhere file system that is configured to operate on a storage system is the Write Anywhere File Layout (WAFL®) file system available from Network Appliance, Inc., of Sunnyvale, Calif.
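The write-anywhere behavior described above can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not a description of WAFL or any other actual file system: the class `WriteAnywhereVolume`, its lowest-free-vbn allocation policy, and the `(file_id, fbn)` mapping are all hypothetical.

```python
class WriteAnywhereVolume:
    """Toy model: modified blocks are written to new vbns, never in place."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks      # vbn space 0..num_blocks-1
        self.free = list(range(num_blocks))    # free vbns, lowest first (assumed policy)
        self.file_map = {}                     # (file_id, file block number) -> vbn

    def write(self, file_id, fbn, data):
        """Store data for a file block at a newly allocated vbn."""
        if not self.free:
            raise RuntimeError("no free blocks in volume")
        new_vbn = self.free.pop(0)
        self.blocks[new_vbn] = data
        # Remember where the previous version lived; it is not overwritten.
        old_vbn = self.file_map.get((file_id, fbn))
        self.file_map[(file_id, fbn)] = new_vbn
        return old_vbn, new_vbn

    def read(self, file_id, fbn):
        """Return the current contents of a file block."""
        return self.blocks[self.file_map[(file_id, fbn)]]


vol = WriteAnywhereVolume(num_blocks=4)
first = vol.write("file-a", 0, b"v1")    # initial write
second = vol.write("file-a", 0, b"v2")   # "dirtied" block goes to a new vbn
```

In this sketch the superseded block is simply left in place; how and when such blocks are reclaimed is outside the scope of the illustration.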
The storage system may be configured to operate according to a client/server model of information delivery to thereby allow many clients to access the directories, files and blocks stored on the system. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network, wide area network or virtual private network implemented over a public network, such as the Internet. Each client may request the services of the file system by issuing file system protocol messages (in the form of packets) to the storage system over the network. By supporting a plurality of file system protocols, such as the conventional Common Internet File System (CIFS) and the Network File System (NFS) protocols, the utility of the storage system is enhanced.
Typically, the amount of data managed by a storage system continually grows at prodigious rates. However, the number of people (e.g., storage administrators) managing storage generally does not grow at the same rate due to increased human resource cost. This results in additional workload for the storage administrators, especially in enterprise level storage installations. One noted disadvantage of many storage system environments is that conventional techniques for storage provisioning are inefficient both in human capital and in unused but allocated storage space. A typical provisioning process begins with a user estimating his storage needs and making a personal request to a storage administrator to create a logical unit number (LUN) of a certain size. While this description is written in terms of LUNs, the same procedure applies to requests for storage in NAS space, e.g., an NFS volume. Once the request has been approved by, e.g., management, the storage administrator must find an appropriate array with sufficient space and within the zoning constraints of the overall storage system environment. After any particular zoning issues have been decided, the storage administrator then must choose a storage system within the constraints and create the appropriate LUN. This may require the storage administrator to first create a volume and then create a virtual disk on the volume to be exported as the LUN.
Once these decisions have been made, the LUN may be exported to a host, which may then mount the LUN for access. There is typically no follow up to ensure that the requested space is actually being utilized. A noted disadvantage of current storage provisioning techniques is that most storage is less than 35% utilized, which results in a substantial industry loss, estimated at, e.g., $20 billion per year. This wasted storage space is the result of users overestimating their actual storage needs and requesting extraneous space from the storage administrator.
Additionally, there exists no efficient technique for determining the actual rate of data growth within a storage system. Thus, storage administrators are forced to guess at storage requirements and when additional storage should be procured. When available storage space becomes low, the storage administrators must procure additional storage to meet user demands, even though there may be significant amounts of wasted storage available within the storage system environment as a whole.