A storage system is a computer that provides storage service relating to the organization of information on writable persistent storage devices, such as memories, tapes or disks. The storage system is commonly deployed within a storage area network (SAN) or a network attached storage (NAS) environment. When used within a NAS environment, the storage system may be embodied as a file server including an operating system that implements a file system to logically organize the information as a hierarchical structure of data containers, such as directories and files on, e.g., the disks. When used within a SAN environment, the storage system may organize the information in the form of databases or files. Where the information is organized as files, the client requesting the information typically maintains file mappings and manages file semantics, while its requests (and system responses) address the information in terms of block addressing on disk using, e.g., a logical unit number.
Each “on-disk” file may be implemented as a set of data structures, i.e., disk blocks, configured to store information, such as the actual data for the file. These data blocks are typically organized within a volume block number (vbn) space that is maintained by the file system. The file system may also assign each data block in the file a corresponding “file offset” or file block number (fbn) position in the file. The file system typically assigns sequences of fbns on a per-file basis, whereas vbns are assigned over a larger volume address space. That is, the file system organizes the data blocks within the vbn space as a volume; each volume may be, although is not necessarily, associated with its own file system. The file system typically consists of a contiguous range of vbns from zero to n, for a file system of size n+1 blocks.
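The relationship between per-file fbns and volume-wide vbns can be illustrated with a minimal sketch. The `Volume` class, its first-free allocation policy, and the file names below are hypothetical and chosen only for illustration; they are not taken from any particular file system.

```python
class Volume:
    """Toy volume: one vbn space shared by every file (illustrative only)."""

    def __init__(self, n_blocks):
        # vbns form a contiguous range 0 .. n_blocks - 1.
        self.free = list(range(n_blocks))
        # file name -> list whose index is the fbn and whose value is the vbn
        self.files = {}

    def append_block(self, name):
        """Allocate the lowest free vbn and attach it as the file's next fbn."""
        if not self.free:
            raise RuntimeError("volume is full")
        vbn = self.free.pop(0)
        block_map = self.files.setdefault(name, [])
        block_map.append(vbn)
        return len(block_map) - 1, vbn  # (fbn, vbn)


vol = Volume(8)
vol.append_block("a")  # file a: fbn 0 -> vbn 0
vol.append_block("b")  # file b: fbn 0 -> vbn 1
vol.append_block("a")  # file a: fbn 1 -> vbn 2
vol.append_block("b")  # file b: fbn 1 -> vbn 3
```

Each file numbers its own blocks from fbn 0, while the vbns handed out come from the single address space of the volume, so consecutive fbns of one file need not occupy consecutive vbns.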
The storage system may be further configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on the system. In this model, the client may comprise an application, such as a database application, executing on a computer that “connects” to the storage system over a computer network, such as a point-to-point link, shared local area network, wide area network or virtual private network implemented over a public network, such as the Internet. The client typically communicates with the storage system by exchanging discrete frames or packets of data according to pre-defined protocols, such as the Transmission Control Protocol/Internet Protocol (TCP/IP). In addition, the client may request the services of the system by issuing file system protocol messages over the network to the storage system.
A file system may have the capability to generate a snapshot of its active file system. An “active file system” is a file system to which data can be both written and read, or, more generally, an active store that responds to both read and write operations. The snapshot is another form of a data container that refers to a copy of file system data that diverges from the active file system over time as the active file system is modified. Snapshots are well-known and described in U.S. patent application Ser. No. 09/932,578 entitled Instant Snapshot by Blake Lewis et al., now issued as U.S. Pat. No. 7,454,445 on Nov. 18, 2008, in TR3002 File System Design for an NFS File Server Appliance by David Hitz et al., published by Network Appliance, Inc., and in U.S. Pat. No. 5,819,292 entitled Method for Maintaining Consistent States of a File System and For Creating User-Accessible Read-Only Copies of a File System, by David Hitz et al., each of which is hereby incorporated by reference as though fully set forth herein.
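One way to picture how a snapshot diverges from the active file system is the following minimal sketch. It assumes, purely for illustration, that a snapshot is a frozen copy of the fbn-to-vbn map and that the active file system never overwrites a block in place; the class and method names are hypothetical and do not describe the mechanism of the incorporated references.

```python
class ActiveFileSystem:
    """Toy active store that accepts reads and writes (illustrative only)."""

    def __init__(self):
        self.blocks = {}     # vbn -> data; blocks are never overwritten here
        self.block_map = {}  # fbn -> vbn as seen by the active file system
        self.next_vbn = 0

    def write(self, fbn, data):
        # Assumption: every write lands in a freshly allocated block.
        vbn = self.next_vbn
        self.next_vbn += 1
        self.blocks[vbn] = data
        self.block_map[fbn] = vbn

    def snapshot(self):
        # Freeze the current map; the data blocks themselves are shared.
        return dict(self.block_map)

    def read(self, fbn, block_map=None):
        # Read through the active map, or through a snapshot's frozen map.
        mapping = self.block_map if block_map is None else block_map
        return self.blocks[mapping[fbn]]


fs = ActiveFileSystem()
fs.write(0, "v1")
snap = fs.snapshot()
fs.write(0, "v2")           # the active file system moves on
active_view = fs.read(0)    # "v2"
snapshot_view = fs.read(0, snap)  # "v1"
```

Immediately after the snapshot is taken both views are identical; they differ only for blocks that the active file system modifies afterward.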
A common type of file system is a “write in-place” file system, where the locations of the data structures on disk are typically fixed. That is, the disk is “viewed” as a large sequential array of blocks and changes (updates) to the data of a file stored in the blocks are made in-place, i.e., data is overwritten at the same disk locations. The write in-place file system may assume a layout such that the data is substantially contiguously arranged on disks. This disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. Updating of data in-place thus maintains efficient read access to the data, but often at the expense of write performance.
Another type of file system is a journal or log-structured file system that generally does not overwrite data on disks. If a data block on disk is retrieved (read) from disk into memory of the storage system and “dirtied” or changed (updated) with new data provided by, e.g., an application, the data block is stored (written) to a new location on disk to optimize write performance. Updates to the data of a file may thus result in random relocation of the blocks on disks. Over time and after many updates, the blocks of the file may become randomly scattered over the disks such that the file can become fragmented. This, in turn, causes sequential access operations, such as sequential read operations, of the file to randomly access the disks. Random access operations to a fragmented file are generally much slower than sequential access operations, thereby adversely impacting the overall performance of those operations.
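The effect of relocating updated blocks can be sketched with a toy model. The model assumes a file that starts out contiguous and a single append-only write cursor; `LogStructuredFile` and `count_extents` are hypothetical names introduced only for this illustration.

```python
def count_extents(block_map):
    """Count the physically contiguous runs met by a sequential read."""
    if not block_map:
        return 0
    extents = 1
    for prev, cur in zip(block_map, block_map[1:]):
        if cur != prev + 1:  # the next fbn is not the next block on disk
            extents += 1
    return extents


class LogStructuredFile:
    """Toy file whose updated blocks are always written to a new location."""

    def __init__(self, n_blocks):
        self.block_map = list(range(n_blocks))  # index = fbn, value = vbn
        self.next_free = n_blocks               # append-only write cursor

    def update(self, fbn):
        # The dirtied block is relocated rather than overwritten in place.
        self.block_map[fbn] = self.next_free
        self.next_free += 1


f = LogStructuredFile(8)
extents_before = count_extents(f.block_map)  # 1: fully contiguous
for fbn in (5, 2, 6):
    f.update(fbn)
extents_after = count_extents(f.block_map)   # 6: [0, 1, 9, 3, 4, 8, 10, 7]
```

In this model three single-block updates turn one contiguous run into six, so a sequential read of the file now requires several non-contiguous accesses. A write in-place file system would leave the map, and therefore the extent count, unchanged.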
The on-disk format representation of the log-structured file system may be block-based using fixed-sized file system blocks. As a result, fragmentation may be fine-grained relative to disk drive and application performance. For example, the blocks used by an application executing on a client may be larger than the fixed-sized blocks of the file system and, thus, must be fragmented internally on the file system. These fragmented blocks may thereafter be stored over various portions of disks such that, over time, sequential read performance is adversely impacted. The adverse performance caused by fragmentation typically manifests in block-type workloads, such as Small Computer Systems Interface (SCSI), Fibre Channel (FC), iSCSI, database over Network File System (NFS), and other enterprise applications involving SAN offerings. In such offerings, the application expects to access large, contiguous regions of disk space such that, when the client writes data into those regions, the data (i.e., the active data) remains contiguous.
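The internal fragmentation of an application block can be made concrete with a short calculation. The 4 KB file system block size and 64 KB application block size used below are assumptions chosen for illustration, and `fbns_for_write` is a hypothetical helper.

```python
def fbns_for_write(offset, length, fs_block_size=4096):
    """Return the fbns touched by a write of `length` bytes at byte `offset`.

    The 4096-byte default is an assumed file system block size.
    """
    if length <= 0:
        return []
    first = offset // fs_block_size
    last = (offset + length - 1) // fs_block_size
    return list(range(first, last + 1))


# One assumed 64 KB application block written at byte offset 64 KB.
pieces = fbns_for_write(offset=65536, length=65536)
piece_count = len(pieces)  # 16 file system blocks, fbns 16 through 31
```

Under these assumptions a single application block becomes sixteen file system blocks, each of which the file system may place independently on disk.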
Often, the fragmentation resulting from operation of the log-structured file system is exacerbated by the large amount of metadata that is represented in the file system. As used herein, the term metadata denotes information used to describe, e.g., data containers, such as files, and, thus, may include indirect blocks. For example, a typical database workload involves many write operations to a large file that appear random from a file system perspective. Depending upon the frequency at which updates are “flushed” to disk, only a few data blocks may be written to the file during an update interval and those blocks are typically scattered. However, the file system may need to write (modify) indirect blocks associated with those data blocks. At least one indirect block is typically written for every modified data block, thereby causing potentially substantial metadata processing, i.e., change.
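The metadata cost of scattered writes can be estimated with a simplified sketch. It assumes a single level of indirect blocks, each holding pointers to 1024 consecutive fbns; both the fan-out and the helper name `dirty_indirect_blocks` are assumptions made for illustration. A file with several levels of indirection would dirty additional blocks along each path.

```python
def dirty_indirect_blocks(modified_fbns, pointers_per_indirect=1024):
    """Return the indices of indirect blocks that must be rewritten.

    Assumes one level of indirection in which indirect block i covers
    fbns i * pointers_per_indirect through (i + 1) * pointers_per_indirect - 1.
    """
    return sorted({fbn // pointers_per_indirect for fbn in modified_fbns})


# Four scattered writes, as in a random database workload.
scattered = dirty_indirect_blocks([5, 3000, 70000, 1_000_000])

# Four adjacent writes of the same total size.
adjacent = dirty_indirect_blocks([0, 1, 2, 3])
```

Here the scattered writes dirty four indirect blocks (indices 0, 2, 68 and 976), one per modified data block, whereas the adjacent writes share a single indirect block.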