Various forms of network-based storage systems exist today. These forms include network attached storage (NAS), storage area networks (SANs), and others. Network-based storage systems are commonly used for a variety of purposes, such as providing multiple users with access to shared data, backing up critical data (e.g., by data mirroring), etc.
A network-based storage system typically includes at least one storage server, which is a processing system configured to store and retrieve data on behalf of one or more client processing systems (clients). The data is stored and retrieved as storage objects, such as blocks and/or files. A block is a sequence of bytes or bits of data having a predetermined length. A file is a collection of related bytes or bits having an arbitrary length. In the context of NAS, a storage server operates on behalf of one or more clients to store and manage file-level access to data, and may be a file server, which is sometimes called a “filer”. A filer operates on behalf of one or more clients to store and manage shared files. The files may be stored in a storage system that includes one or more arrays of mass storage devices, such as magnetic or optical disks or tapes, by using a data storage scheme such as Redundant Array of Inexpensive Disks (RAID). Additionally, the mass storage devices in each array may be organized into one or more separate RAID groups. In a SAN context, a storage server provides clients with block-level access to stored data, rather than file-level access. Some storage servers are capable of providing clients with both file-level access and block-level access, such as certain storage servers made by NetApp, Inc. (NetApp®) of Sunnyvale, Calif.
FIG. 1 is a prior art illustrative embodiment of a Write Anywhere File Layout (WAFL) file system. Referring to FIG. 1, WAFL aggregate 100 is an instance of the WAFL file system. WAFL aggregate 100 includes one or more flexible volumes 110, one or more volume containers 120, and physical storage 130.
WAFL aggregate 100 is a physical storage container that can store data in the WAFL file system. Flexible volume 110 is a logical volume that allows the virtualization of the allocation of volumes on physical storage 130. Thereby, multiple, independently managed flexible volumes 110 can share the same physical storage (e.g., physical storage 130). The virtualization requires mapping between virtual volume block numbers (VVBNs) used by flexible volume 110 and physical volume block numbers (PVBNs) used by WAFL aggregate 100 to access data stored in physical storage 130. A PVBN, as used herein, refers to disk blocks that have been abstracted into a single linear sequence in the aggregate. Each volume container 120 corresponds to a flexible volume 110. Volume container 120 contains all the data blocks for a corresponding flexible volume 110.
As used herein, a block offset or an offset refers to a distance in blocks from the beginning of a storage object such as a volume, file, extent, etc. Block addresses used within flexible volume 110 refer to block offsets within volume container 120. Since volume container 120 contains every block within flexible volume 110, there are two ways to refer to the location of a particular block. The PVBN specifies the location of a block within WAFL aggregate 100. The VVBN specifies the offset of the block within the container file. When a block in a file is requested, flexible volume 110 translates the file offset into a VVBN. The VVBN is passed from flexible volume 110 to volume container 120. Volume container 120 translates the VVBN to a PVBN. The PVBN is then used to access the requested block in physical storage 130. Once a VVBN has been translated into a PVBN, the block pointer for the PVBN in flexible volume 110 is updated to include (e.g., in a cache) the PVBN for the VVBN. Thereby, the next time the requested block is required, the flexible volume 110 can use the stored PVBN to access physical storage 130.
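The translation path described above (file offset to VVBN, VVBN to PVBN, then caching the PVBN) can be illustrated with a minimal sketch. The class names, the dictionary-based maps, and the `lookups` counter are hypothetical and chosen only for illustration; they do not represent the actual WAFL data structures.

```python
BLOCK_SIZE = 4096  # 4 KB blocks, matching the block size discussed in this background


class VolumeContainer:
    """Hypothetical stand-in for volume container 120: maps VVBN -> PVBN."""

    def __init__(self, vvbn_to_pvbn):
        self._map = dict(vvbn_to_pvbn)
        self.lookups = 0  # counts how often the container is consulted

    def translate(self, vvbn):
        self.lookups += 1
        return self._map[vvbn]


class FlexibleVolume:
    """Hypothetical stand-in for flexible volume 110."""

    def __init__(self, container, fbn_to_vvbn):
        self.container = container
        self._fbn_to_vvbn = dict(fbn_to_vvbn)
        self._pvbn_cache = {}  # VVBN -> PVBN, filled after the first translation

    def resolve(self, file_offset_bytes):
        """Return the PVBN holding the byte at file_offset_bytes."""
        fbn = file_offset_bytes // BLOCK_SIZE      # file offset -> file block number
        vvbn = self._fbn_to_vvbn[fbn]              # file block number -> VVBN
        pvbn = self._pvbn_cache.get(vvbn)
        if pvbn is None:
            pvbn = self.container.translate(vvbn)  # VVBN -> PVBN via the container
            self._pvbn_cache[vvbn] = pvbn          # remember it for the next request
        return pvbn


container = VolumeContainer({10: 7001, 11: 7002})
volume = FlexibleVolume(container, {0: 10, 1: 11})
first = volume.resolve(4096)   # first request goes through the container
second = volume.resolve(5000)  # same block; served from the cached PVBN
```

In this sketch the second request does not consult the container again, which mirrors the stated purpose of storing the PVBN alongside the VVBN.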
Current implementations of WAFL define a file as a tree of indirect blocks. Each indirect block in the tree has a fixed span: a fixed number of entries, each pointing to another block in the tree. Extents are represented using an entry for each block within the extent. An extent, as used herein, refers to a contiguous group of one or more blocks. As a result, the amount of indirect block metadata is linear with respect to the size of the file. Additionally, disk gardening techniques, such as segment cleaning, file reallocation, etc., are complicated by caching PVBN pointers in VVBN blocks.
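The linear growth of indirect block metadata can be seen with a short calculation. The sketch below assumes a hypothetical span of 512 entries per indirect block; the actual span is not stated here, and the function name is illustrative only.

```python
def indirect_block_count(data_blocks, entries_per_indirect):
    """Count the indirect blocks needed to reach data_blocks leaf blocks
    when every indirect block holds a fixed number of entries."""
    total = 0
    level = data_blocks
    while level > 1:
        level = -(-level // entries_per_indirect)  # ceiling division
        total += level
    return total


SPAN = 512  # assumed entries per indirect block (hypothetical value)

small = indirect_block_count(262144, SPAN)  # 1 GB file of 4 KB blocks -> 513
large = indirect_block_count(524288, SPAN)  # 2 GB file of 4 KB blocks -> 1027
```

Doubling the file size roughly doubles the number of indirect blocks, because every data block, including each block inside a contiguous extent, still needs its own entry.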
Storage systems often use a predetermined block size for all internal operations. For example, WAFL uses 4 KB (i.e., 4096 bytes) blocks for both VVBN and PVBN, as do client-side file systems for file block numbers (FBN). Block boundaries are expected to occur every 4 KB from an initial offset (e.g., FBN 0). Since file systems usually offset individual files based on these block boundaries, application writers take advantage of a file system's block size and alignment to increase the performance of their input/output (“I/O”) operations; for example, always performing I/O operations that are a multiple of 4 KB, and always aligning these operations to the beginning of a file. Other file systems or applications, such as a virtual machine, may use a block boundary of a different size (e.g., a virtual machine environment in which an initial master boot record block of 512 bytes is followed by the expected 4 KB blocks), resulting in misalignment between FBNs and PVBNs. Additionally, multiple virtual machines may share a single volume container 120 and each virtual machine may be misaligned by a different amount.
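The effect of the 512-byte offset can be shown with simple arithmetic. In the sketch below, the helper name `blocks_touched` is hypothetical; it lists which 4 KB storage blocks a byte range overlaps.

```python
BLOCK_SIZE = 4096  # 4 KB storage blocks


def blocks_touched(start_byte, length, block_size=BLOCK_SIZE):
    """Return the storage block numbers overlapped by [start_byte, start_byte + length)."""
    first = start_byte // block_size
    last = (start_byte + length - 1) // block_size
    return list(range(first, last + 1))


# A 4 KB I/O that starts on a block boundary stays within one storage block.
aligned = blocks_touched(0, 4096)       # [0]

# The same 4 KB I/O shifted by a 512-byte boot record straddles two blocks.
misaligned = blocks_touched(512, 4096)  # [0, 1]
```

Every guest I/O that the guest considers aligned therefore touches two storage blocks instead of one when the guest's blocks begin 512 bytes into the container.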
Compression groups data blocks together to make a compression group. The data blocks in the compression group are compressed into a smaller number of physical data blocks than the number of logical data blocks. A typical compression group requires 8 (eight) logical data blocks to be grouped together such that compressed data can be stored in fewer than 8 physical data blocks. This mapping between physical data blocks and logical data blocks requires the compression groups to be written as a single data block. Therefore, the compression group is written to disk in full.
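The grouping described above can be sketched as follows. The use of `zlib` is an assumption made only for illustration (no particular compression algorithm is named here), and `compress_group` is a hypothetical helper.

```python
import zlib

BLOCK_SIZE = 4096   # 4 KB logical and physical blocks
GROUP_BLOCKS = 8    # logical blocks per compression group


def compress_group(logical_blocks):
    """Compress 8 logical blocks as one unit and report how many
    physical blocks the compressed result would occupy."""
    if len(logical_blocks) != GROUP_BLOCKS:
        raise ValueError("a compression group holds exactly 8 logical blocks")
    if any(len(block) != BLOCK_SIZE for block in logical_blocks):
        raise ValueError("every logical block must be 4096 bytes")
    raw = b"".join(logical_blocks)
    packed = zlib.compress(raw)
    physical_blocks = -(-len(packed) // BLOCK_SIZE)  # ceiling division
    return packed, physical_blocks


# Highly repetitive sample data, chosen so that it compresses well.
blocks = [bytes([i]) * BLOCK_SIZE for i in range(GROUP_BLOCKS)]
packed, physical_blocks = compress_group(blocks)
```

Because the whole group is compressed as one stream, no individual logical block can be located or rewritten inside `packed` without handling the entire group.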
When a compression group is partially written by a user (e.g., one logical data block is modified in a compression group of 8 logical data blocks), all physical data blocks in the compression group are read, the physical data blocks in the compression group are uncompressed, and the modified data block is merged with the uncompressed data. If the system is using inline compression, then compression of modified compression groups is performed immediately prior to writing out data to a disk, and the compressed groups are all written out to disk. If a system is using background compression, then the compression of a modified compression group is performed in the background once the compression group has been modified, and the compressed data is written to disk. Random partial writes (partial writes to different compression groups) can therefore greatly affect performance of the storage system. In fact, write performance can be up to 15 times slower for compressed volumes than for uncompressed volumes. Therefore, although compression provides storage savings, the degradation of performance may be severe enough that compression is not used in a storage system.
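The read, uncompress, merge, and recompress sequence for a partial write can be sketched as below. As an assumption for illustration, `zlib` stands in for the unspecified compression algorithm, and `partial_write` is a hypothetical helper that also reports how many physical blocks are read and written.

```python
import zlib

BLOCK_SIZE = 4096   # 4 KB logical and physical blocks
GROUP_BLOCKS = 8    # logical blocks per compression group


def partial_write(packed_group, block_index, new_block):
    """Modify one logical block inside a compressed group.

    Returns the recompressed group plus the number of physical blocks
    read and written to service the single-block update.
    """
    if len(new_block) != BLOCK_SIZE:
        raise ValueError("new_block must be exactly 4096 bytes")
    if not 0 <= block_index < GROUP_BLOCKS:
        raise ValueError("block_index must be in the range 0..7")

    # 1. Read every physical block of the group.
    blocks_read = -(-len(packed_group) // BLOCK_SIZE)
    # 2. Uncompress the whole group.
    raw = bytearray(zlib.decompress(packed_group))
    # 3. Merge the modified logical block into the uncompressed data.
    start = block_index * BLOCK_SIZE
    raw[start:start + BLOCK_SIZE] = new_block
    # 4. Recompress and write the whole group back out.
    repacked = zlib.compress(bytes(raw))
    blocks_written = -(-len(repacked) // BLOCK_SIZE)
    return repacked, blocks_read, blocks_written


original = zlib.compress(bytes(BLOCK_SIZE * GROUP_BLOCKS))  # group of zero-filled blocks
repacked, blocks_read, blocks_written = partial_write(original, 3, b"\x01" * BLOCK_SIZE)
```

Even though only one logical block changes, all four steps run over the entire group, which is the source of the overhead for random partial writes.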