Various forms of network-based storage systems exist today. These forms include network attached storage (NAS), storage area networks (SANs), and others. Network-based storage systems are commonly used for a variety of purposes, such as providing multiple users with access to shared data, backing up critical data (e.g., by data mirroring), etc.
A network-based storage system typically includes at least one storage server, which is a processing system configured to store and retrieve data on behalf of one or more client processing systems (clients). In the context of NAS, a storage server may be a file server, which is sometimes called a “filer”. A filer operates on behalf of one or more clients to store and manage shared files. The files may be stored in a storage system that includes one or more arrays of mass storage devices, such as magnetic or optical disks or tapes, by using a data storage scheme such as Redundant Array of Inexpensive Disks (RAID). Additionally, the mass storage devices in each array may be organized into one or more separate RAID groups. In a SAN context, a storage server provides clients with block-level access to stored data, rather than file-level access. Some storage servers are capable of providing clients with both file-level access and block-level access, such as certain storage servers made by NetApp, Inc. (NetApp®) of Sunnyvale, Calif.
FIG. 1 is a prior art illustrative embodiment of a Write Anywhere File Layout (WAFL) file system. Referring to FIG. 1, WAFL aggregate 100 is an instance of the WAFL file system. WAFL aggregate 100 includes one or more flexible volumes 110, one or more volume containers 120, and physical storage 130.
WAFL aggregate 100 is a physical storage container that can store data in the WAFL file system. Flexible volume 110 is a logical volume that allows the virtualization of the allocation of volumes on physical storage 130. Multiple, independently managed flexible volumes 110 can thereby share the same physical storage (e.g., physical storage 130). The virtualization requires mapping between virtual volume block numbers (VVBNs) used by flexible volume 110 and physical volume block numbers (PVBNs) used by WAFL aggregate 100 to access data stored in physical storage 130. A PVBN, as used herein, refers to disk blocks that have been abstracted into a single linear sequence in the aggregate. Each volume container 120 corresponds to a flexible volume 110. Volume container 120 contains all the data blocks for a corresponding flexible volume 110.
As used herein, a block offset or an offset refers to a distance in blocks from the beginning of a storage object such as a volume, file, extent, etc. Block addresses used within flexible volume 110 refer to block offsets within volume container 120. Since volume container 120 contains every block within flexible volume 110, there are two ways to refer to the location of a particular block. The PVBN specifies the location of a block within WAFL aggregate 100. The VVBN specifies the offset of the block within the container file. When a block in a file is requested, flexible volume 110 translates the file offset into a VVBN. The VVBN is passed from flexible volume 110 to volume container 120. Volume container 120 translates the VVBN to a PVBN. The PVBN is then used to access the requested block in physical storage 130. Additionally, when a block is initially written, its block pointer in flexible volume 110 is written to include (e.g., in a cache) the PVBN for the VVBN. Thereby, when the block is later requested, flexible volume 110 can use the stored PVBN to access physical storage 130.
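The two lookup paths described above can be illustrated with a minimal Python sketch. Every name here (`FlexVolSketch`, `container_map`, `block_pointers`, the starting PVBN of 1000) is hypothetical and chosen only for illustration; it is not WAFL's actual data layout.

```python
BLOCK_SIZE = 4096  # bytes per block, the size the text attributes to WAFL


class FlexVolSketch:
    """Toy model of a flexible volume, its container, and physical storage."""

    def __init__(self):
        self.physical_storage = {}  # PVBN -> data
        self.container_map = {}     # VVBN -> PVBN (role of the volume container)
        self.block_pointers = {}    # (file name, FBN) -> (VVBN, cached PVBN)
        self._next_vvbn = 0
        self._next_pvbn = 1000      # arbitrary start so VVBNs and PVBNs differ

    def write(self, name, byte_offset, data):
        fbn = byte_offset // BLOCK_SIZE
        vvbn, pvbn = self._next_vvbn, self._next_pvbn
        self._next_vvbn += 1
        self._next_pvbn += 1
        self.physical_storage[pvbn] = data
        self.container_map[vvbn] = pvbn
        # The block pointer records the PVBN next to the VVBN at write time.
        self.block_pointers[(name, fbn)] = (vvbn, pvbn)

    def read_via_container(self, name, byte_offset):
        # File offset -> VVBN -> (container) -> PVBN -> physical block.
        vvbn, _ = self.block_pointers[(name, byte_offset // BLOCK_SIZE)]
        pvbn = self.container_map[vvbn]
        return self.physical_storage[pvbn]

    def read_via_cached_pvbn(self, name, byte_offset):
        # File offset -> cached PVBN -> physical block (container lookup skipped).
        _, pvbn = self.block_pointers[(name, byte_offset // BLOCK_SIZE)]
        return self.physical_storage[pvbn]


vol = FlexVolSketch()
vol.write("a", 0, b"x")
vol.write("a", 4096, b"y")
print(vol.read_via_container("a", 5000), vol.read_via_cached_pvbn("a", 5000))
```

Both reads return the same block; the second simply avoids the VVBN-to-PVBN translation step by using the pointer stored at write time.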
Current implementations of WAFL define a file as a tree of indirect blocks. Each indirect block in the tree has a fixed span: a fixed number of entries, each pointing to another block in the tree. Extents are represented using an entry for each block within the extent. An extent, as used herein, refers to a contiguous group of one or more blocks. As a result, the amount of indirect block metadata is linear with respect to the size of the file. Additionally, disk gardening techniques, such as segment cleaning, file reallocation, etc., are complicated by caching PVBN pointers in VVBN blocks.
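To make the linear growth concrete, the following Python sketch counts the metadata in a tree with one entry per data block. The function name `indirect_metadata` and the span of 4 used in the example are hypothetical; the source does not state the real span.

```python
def indirect_metadata(file_blocks, span):
    """Return (leaf_entries, indirect_blocks) for a per-block pointer tree.

    file_blocks: number of data blocks in the file
    span: fixed number of entries per indirect block (assumed value)
    """
    if file_blocks <= 0:
        return (0, 0)
    leaf_entries = file_blocks  # one entry per data block, even inside an extent
    indirect_blocks = 0
    level = file_blocks
    while True:
        level = (level + span - 1) // span  # blocks needed one level up
        indirect_blocks += level
        if level == 1:  # reached the root of the tree
            break
    return leaf_entries, indirect_blocks


print(indirect_metadata(16, 4))
print(indirect_metadata(32, 4))
```

Doubling the file from 16 to 32 blocks doubles the leaf entries and roughly doubles the indirect blocks, regardless of whether the data is one contiguous extent.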
Storage systems often use a predetermined block size for all internal operations. For example, WAFL uses 4 KB (i.e., 4096 bytes) blocks for both VVBN and PVBN, as do client-side file systems for file block numbers (FBN). Block boundaries are expected to occur every 4 KB from an initial offset (e.g., FBN 0). Since file systems usually offset individual files based on these block boundaries, application writers take advantage of a file system's block size and alignment to increase the performance of their input/output (“I/O”) operations, for example, by always performing I/O operations that are a multiple of 4 KB and always aligning these operations to the beginning of a file. Other file systems or applications, such as a virtual machine, may use a block boundary of a different size (e.g., a virtual machine environment in which an initial master boot record block of 512 bytes is followed by the expected 4 KB blocks), resulting in misalignment between FBNs and PVBNs. Additionally, multiple virtual machines may share a single volume container 120, and each virtual machine may be misaligned by a different amount.
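The effect of such a misalignment can be worked through with simple arithmetic. The Python sketch below uses a hypothetical helper, `storage_blocks_touched`, to list which 4 KB storage blocks a single 4 KB client block overlaps for a given starting offset.

```python
BLOCK_SIZE = 4096  # bytes, the block size given in the text


def storage_blocks_touched(client_fbn, start_offset_bytes=0, block_size=BLOCK_SIZE):
    """Return the storage block numbers overlapped by one client block.

    start_offset_bytes models a leading region that shifts every client
    block, such as the 512-byte master boot record in the example above.
    """
    first_byte = start_offset_bytes + client_fbn * block_size
    last_byte = first_byte + block_size - 1
    return list(range(first_byte // block_size, last_byte // block_size + 1))


print(storage_blocks_touched(3))       # aligned client block
print(storage_blocks_touched(3, 512))  # shifted by a 512-byte boot record
```

With no shift, client block 3 maps onto exactly one storage block. With the 512-byte shift it straddles two, so each client read or write of one block touches two storage blocks.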
Storage servers may implement a deduplication algorithm. Deduplication eliminates redundant copies of data stored within the data storage. Deduplication is accomplished in several ways, including hierarchical deduplication, in-line deduplication, and background deduplication.
Hierarchical deduplication includes deriving one file from another, usually by one file starting off as a copy of another, but with zero or nearly zero bytes of data actually copied or moved. Instead, the two files share common blocks of data storage. An example is a snapshot of a file system: the snapshot and the active file system are equal at the time the snapshot is taken and share the same data storage, and thus are effectively copies that involve zero or near zero movement of data. As the source file system changes, the number of shared blocks of data storage decreases. A variation of this is a writable snapshot (also referred to as a clone), which is taken of a file system. In this variation, as the source and cloned file systems each change, there are fewer shared blocks.
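The block sharing between a snapshot and the active file system can be sketched in Python as follows. `SnapshotSketch` and `shared_blocks` are hypothetical names, and the model assumes that every write to the active file system goes to a newly allocated block, which is a simplification for illustration.

```python
class SnapshotSketch:
    """Toy file system whose snapshots copy block pointers, not data."""

    def __init__(self):
        self.storage = {}  # block id -> data
        self.active = {}   # FBN -> block id for the active file system
        self._next = 0

    def write(self, fbn, data):
        # Assumed behavior: new data always lands in a fresh block, so blocks
        # referenced by earlier snapshots are never overwritten.
        block_id = self._next
        self._next += 1
        self.storage[block_id] = data
        self.active[fbn] = block_id

    def snapshot(self):
        # Only the pointer map is copied; no data blocks are moved.
        return dict(self.active)


def shared_blocks(a, b):
    """Count data blocks referenced by both pointer maps."""
    return len(set(a.values()) & set(b.values()))


fs = SnapshotSketch()
for fbn in range(4):
    fs.write(fbn, b"v1")
snap = fs.snapshot()
print(shared_blocks(fs.active, snap))  # all blocks shared right after the snapshot
fs.write(0, b"v2")
print(shared_blocks(fs.active, snap))  # one fewer after the active copy changes
```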
In-line deduplication includes a storage access protocol initiator (e.g., an NFS client) creating content via write operations, while the target of the storage access protocol checks whether the content being written is duplicated somewhere else on the target's storage. If so, the data is not written. Instead, the logical content (e.g., metadata, pointer, etc.) refers to the duplicate.
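A minimal Python sketch of the write path follows. The source does not say how the target detects duplicates; the SHA-256 fingerprint index used here is an assumption, and `InlineDedupStore` and its fields are hypothetical names. Overwrites of an existing logical block are not handled.

```python
import hashlib


class InlineDedupStore:
    """Toy storage target that checks for duplicates on every write."""

    def __init__(self):
        self.blocks = {}          # block id -> data actually stored
        self.by_fingerprint = {}  # SHA-256 digest -> block id (assumed index)
        self.refcount = {}        # block id -> number of logical references
        self.logical = {}         # (file name, FBN) -> block id

    def write(self, name, fbn, data):
        digest = hashlib.sha256(data).digest()
        block_id = self.by_fingerprint.get(digest)
        if block_id is not None and self.blocks[block_id] == data:
            # Duplicate content: store nothing, just reference the existing block.
            self.refcount[block_id] += 1
        else:
            block_id = len(self.blocks)
            self.blocks[block_id] = data
            self.by_fingerprint[digest] = block_id
            self.refcount[block_id] = 1
        self.logical[(name, fbn)] = block_id
        return block_id


store = InlineDedupStore()
first = store.write("f", 0, b"A" * 4096)
second = store.write("g", 0, b"A" * 4096)  # same content, different file
print(first == second, len(store.blocks))
```

The fingerprint computation and lookup happen on the write path itself, which is where the per-write processing cost of this approach comes from.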
Background deduplication includes a background task (e.g., on a storage access protocol target) scanning for duplicate blocks, freeing all but one of the duplicates, and mapping corresponding pointers (or other logical content) from the now free blocks to the remaining duplicate.
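The scan, free, and remap steps can be sketched in Python as below. `background_dedup` is a hypothetical function; it compares full block contents and keeps the lowest-numbered copy, both of which are illustrative choices rather than anything the source specifies.

```python
def background_dedup(blocks, pointers):
    """Deduplicate in place and return the sorted list of freed block ids.

    blocks: dict of block id -> data
    pointers: dict of logical key -> block id
    """
    keeper_for_content = {}  # data -> block id that is kept
    remap = {}               # freed block id -> surviving block id
    for block_id in sorted(blocks):
        keeper = keeper_for_content.setdefault(blocks[block_id], block_id)
        if keeper != block_id:
            remap[block_id] = keeper
    # Point logical content at the remaining duplicate.
    for key, block_id in list(pointers.items()):
        if block_id in remap:
            pointers[key] = remap[block_id]
    # Free all but one copy of each duplicate.
    for block_id in remap:
        del blocks[block_id]
    return sorted(remap)


blocks = {0: b"x", 1: b"y", 2: b"x", 3: b"x"}
pointers = {"a": 0, "b": 1, "c": 2, "d": 3}
print(background_dedup(blocks, pointers), pointers)
```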
Although these existing deduplication algorithms allow for sharing of data storage, they have an impact on the performance of the system, since the data must be processed as it is received. Furthermore, metadata used by the active file system and snapshots is not deduplicated, so the space efficiency of the active file system and snapshots is not maximized.