Various types of storage servers are used in modern computing systems. One type of storage server is a file server. A file server is a storage server which operates on behalf of one or more clients to store and manage files in a set of mass storage devices, such as magnetic or optical storage based disks. The mass storage devices are typically organized as one or more groups of Redundant Array of Independent (or Inexpensive) Disks (RAID). One configuration in which file servers can be used is a network attached storage (NAS) configuration. In a NAS configuration, a storage server is implemented in the form of an appliance that attaches to a network. NAS systems generally utilize file-based access protocols; therefore, each client may request the services of the storage system by issuing file system protocol messages to the file system over the network.
Another configuration in which storage servers can be used is a Storage Area Network (SAN). A SAN is a high-speed network that enables establishment of direct connections between a storage system and its storage devices. In a SAN configuration, clients' requests are specified in terms of blocks, rather than files. Conceptually, a SAN may be viewed as an extension to a storage bus that enables access to stored information using block-based access protocols over the “extended bus.” In this context, the extended bus is typically embodied as Fibre Channel or Ethernet media adapted to operate with block access protocols. Thus, a SAN arrangement allows storage to be decoupled from the storage system, such as an application server, and placed on a network.
Storage servers may also be utilized as secondary storage systems, such as the NearStore® line of products available from NetApp®, Inc. of Sunnyvale, Calif. Such secondary storage systems typically utilize magnetic disk drives in place of magnetic tape and/or optical media. A noted disadvantage of secondary storage systems is that the cost of magnetic disk drives is higher than that of tape and/or optical media. One technique to reduce the cost is to reduce the amount of data that is stored. This may be achieved, for example, by compressing the data prior to storing the data on disk, thereby reducing the total disk space required.
To date, storage systems have relied on compression techniques being applied at the application level (i.e., in the client) to reduce the amount of data that is stored. However, this approach requires special software to be built into the client applications. Other storage systems, such as tape drives and disk controllers, have used built-in hardware compression to achieve similar goals. However, incorporating a hardware-based disk controller requires another layer of software to maintain a separate disk block mapping and is therefore undesirable. It also binds a storage server to a single vendor for compression.
Other techniques involve data compression at the file system level by assembling a group of logical data blocks and then compressing the group into a smaller number of blocks just before those blocks are stored to disk. However, these techniques assume that a predetermined data compression ratio is achieved, which does not necessarily hold true. For example, if the process employed assumes that every logical block overwrite in a group of physical blocks will take the entire compression group of physical blocks, then the process will overestimate the amount of space required, and thereby lead to false failures based on an apparent lack of sufficient storage space. As another example, if the process employed assumes that every logical block overwrite in a group of physical blocks takes only one physical block, then when logical data blocks are flushed from memory to disk, there may not be enough space available, which may result in data loss.
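The two failure modes described above can be illustrated with a minimal sketch. It is not taken from any particular file system: the 4 KiB block size, the eight-block compression group, the use of zlib, and all function names are assumptions chosen only for illustration. The sketch compares the space a pessimistic policy and an optimistic policy would reserve against the space the compressed data actually needs.

```python
import math
import random
import zlib

BLOCK_SIZE = 4096    # assumed block size in bytes (hypothetical)
GROUP_BLOCKS = 8     # assumed logical blocks per compression group (hypothetical)


def physical_blocks_needed(group_data: bytes) -> int:
    """Physical blocks actually required to store one compression group."""
    compressed = zlib.compress(group_data)
    needed = math.ceil(len(compressed) / BLOCK_SIZE)
    # If compression does not help, the group is stored uncompressed,
    # so it never needs more than GROUP_BLOCKS blocks.
    return min(max(needed, 1), GROUP_BLOCKS)


def pessimistic_reservation(num_groups: int) -> int:
    """Assume every overwrite consumes a whole compression group."""
    return num_groups * GROUP_BLOCKS


def optimistic_reservation(num_groups: int) -> int:
    """Assume every overwrite consumes a single physical block."""
    return num_groups


# One highly compressible group (all zeros) and one incompressible group
# (pseudo-random bytes from a fixed seed, so the run is repeatable).
rng = random.Random(0)
group_size = GROUP_BLOCKS * BLOCK_SIZE
compressible = bytes(group_size)
incompressible = bytes(rng.getrandbits(8) for _ in range(group_size))

groups = [compressible, incompressible]
actual = sum(physical_blocks_needed(g) for g in groups)
pessimistic = pessimistic_reservation(len(groups))
optimistic = optimistic_reservation(len(groups))

print("actual blocks needed:   ", actual)
print("pessimistic reservation:", pessimistic)
print("optimistic reservation: ", optimistic)
```

Under these assumptions, the pessimistic policy reserves more blocks than the data needs, so a write could be rejected on a volume that in fact has room for it. The optimistic policy reserves fewer blocks than the data needs, so the shortfall would only surface when the blocks are flushed to disk.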