A file server is a computer that provides file service relating to the organization of information on storage devices, such as disks. The file server or filer includes a storage operating system that implements a file system to logically organize the information as a hierarchical structure of directories and files on the disks. Each “on-disk” file may be implemented as a set of disk blocks configured to store information, such as text, whereas the directory may be implemented as a specially-formatted file in which information about other files and directories is stored. A filer may be configured to operate according to a client/server model of information delivery to thereby allow many clients to access files stored on a server, e.g., the filer. In this model, the client may comprise an application, such as a file system protocol, executing on a computer that “connects” to the filer over a computer network, such as a point-to-point link, shared local area network (LAN), wide area network (WAN), or virtual private network (VPN) implemented over a public network such as the Internet. Each client may request the services of the filer by issuing file system protocol messages (in the form of packets) to the filer over the network.
A common type of file system is a “write in-place” file system, an example of which is the conventional Berkeley fast file system. In a write in-place file system, the locations of the data structures, such as inodes and data blocks, on disk are typically fixed. An inode is a data structure used to store information, such as metadata, about a file, whereas the data blocks are structures used to store the actual data for the file. The information contained in an inode may include, e.g., ownership of the file, access permission for the file, size of the file, file type and references to locations on disk of the data blocks for the file. The references to the locations of the file data are provided by pointers, which may further reference indirect blocks that, in turn, reference the data blocks, depending upon the quantity of data in the file. Changes to the inodes and data blocks are made “in-place” in accordance with the write in-place file system. If an update to a file extends the quantity of data for the file, an additional data block is allocated and the appropriate inode is updated to reference that data block.
Another type of file system is a write-anywhere file system that does not overwrite data on disks. If a data block on disk is retrieved (read) from disk into memory and “dirtied” with new data, the data block is stored (written) to a new location on disk to thereby optimize write performance. A write-anywhere file system may initially assume an optimal layout such that the data is substantially contiguously arranged on disks. The optimal disk layout results in efficient access operations, particularly for sequential read operations, directed to the disks. A particular example of a write-anywhere file system that is configured to operate on a filer is the Write Anywhere File Layout (WAFL™) file system available from Network Appliance, Inc. of Sunnyvale, Calif. The WAFL file system is implemented within a microkernel as part of the overall protocol stack of the filer and associated disk storage. This microkernel is supplied as part of Network Appliance's Data ONTAP™ storage operating system, residing on the filer, that processes file-service requests from network-attached clients.
As used herein, the term “storage operating system” generally refers to the computer-executable code operable on a storage system that manages data access and may, in the case of a filer, implement file system semantics, such as the Data ONTAP™ storage operating system, implemented as a microkernel, and available from Network Appliance, Inc., of Sunnyvale, Calif., which implements a Write Anywhere File Layout (WAFL™) file system. The storage operating system can also be implemented as an application program operating over a general-purpose operating system, such as UNIX® or Windows NT®, or as a general-purpose operating system with configurable functionality, which is configured for storage applications as described herein.
Disk storage is typically implemented as one or more storage “volumes” that comprise physical storage disks, defining an overall logical arrangement of storage space. Currently available filer implementations can serve a large number of discrete volumes (150 or more, for example). Each volume is associated with its own file system and, for purposes hereof, volume and file system shall generally be used synonymously. The disks within a volume are typically organized as one or more groups of Redundant Array of Independent (or Inexpensive) Disks (RAID). RAID implementations enhance the reliability/integrity of data storage through the writing of data “stripes” across a given number of physical disks in the RAID group, and the appropriate caching of parity information with respect to the striped data. In the example of a WAFL-based file system, a RAID 4 implementation is advantageously employed. This implementation specifically entails the striping of data across a group of disks, and separate parity caching within a selected disk of the RAID group. As described herein, a volume typically comprises at least one data disk and one associated parity disk (or possibly data/parity partitions in a single disk) arranged according to a RAID 4, or equivalent high-reliability, implementation.
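By way of illustration only, the striping and dedicated-parity arrangement described above can be sketched as follows. The sketch assumes byte-wise XOR parity, as is conventional for RAID 4; the function names `xor_blocks`, `write_stripe` and `reconstruct` are hypothetical and do not correspond to any actual RAID layer interface.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks):
    """Lay out one stripe: one block per data disk, then the parity block.

    The last element stands in for the dedicated parity disk (assumption:
    XOR parity over all data blocks in the stripe).
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe, lost_index):
    """Rebuild the block at lost_index from the surviving blocks of the stripe."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# Three data disks plus one parity disk; disk 1 is then treated as failed.
stripe = write_stripe([b"AAAA", b"BBBB", b"CCCC"])
recovered = reconstruct(stripe, 1)
```

Because the parity block is the XOR of the data blocks, any single missing block of a stripe, including the parity block itself, equals the XOR of the remaining blocks.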
File systems require a methodology to track the allocation status of the disk blocks within a file system. By “allocation status” it is meant whether a block has been allocated by a file or directory or whether the block is free to be allocated. File systems typically utilize a bitmap file wherein each bit is associated with a block in the file system. If the bit is set (i.e., equal to 1), then the block has been allocated in the file system and is thereby associated with a file or directory. Similarly, if the bit is not set (i.e., equal to 0), then the block has not been allocated in the file system and is free to be allocated.
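A minimal sketch of such an allocation bitmap is given below, with one bit per block, where 1 denotes allocated and 0 denotes free. The class `BlockBitmap`, its method names, and the first-fit search are assumptions made for illustration and are not taken from any particular file system.

```python
class BlockBitmap:
    """One bit per block: 1 = allocated, 0 = free."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # all blocks start free

    def is_allocated(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def set_allocated(self, block):
        self.bits[block // 8] |= 1 << (block % 8)

    def set_free(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def allocate(self):
        """Allocate and return the lowest-numbered free block (first fit)."""
        for block in range(self.num_blocks):
            if not self.is_allocated(block):
                self.set_allocated(block)
                return block
        raise RuntimeError("no free blocks")


bitmap = BlockBitmap(10)
first = bitmap.allocate()   # lowest free block
second = bitmap.allocate()  # next free block
bitmap.set_free(first)      # the bit is cleared, so the block may be reused
```

Note that this plain bitmap permits a block to be reused immediately after it is freed, which is precisely the behavior that a checkpointing file system must avoid.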
However, in checkpointing systems, like the above-described WAFL file system, a free block cannot be allocated until the block's allocation status as free has been reflected in a checkpoint. Note that, in a checkpointing file system, a checkpoint of the file system is created, typically at regular time intervals. This “checkpoint” is a consistent and up-to-date version of the file system that is typically written to disk. Thus, in the event of a crash, only data written after the last checkpoint would be lost or corrupted. If a journalling file system is utilized, the stored operations can be replayed to bring the file system completely up to date after a crash or other error condition. Thus, in a checkpointing system, the file system must track all of the blocks freed after the most recent checkpoint and not allocate any of those freed blocks until after the checkpoint is safely written to disk.
The newly freed blocks (post checkpoint) cannot be reused (i.e., allocated again) until after the data has been written to disk to avoid the possibility that a block could be freed and then reused before the status of the block has been written to disk. If, for example, a new checkpoint is interrupted while writing its changes (data) to disk by a server crash or other failure, the previous checkpoint could now contain data generated as part of the new checkpoint if a block that was in use in the previous checkpoint was freed after the previous checkpoint and allocated (reused) by the new checkpoint to store new data. Therefore, overwriting blocks that are known to be allocated at the time of the previous checkpoint compromises the integrity of that checkpoint and therefore the consistency and integrity of the file system itself in such situations.
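The constraint described above can be sketched as an allocator that holds newly freed blocks in a pending set until the next checkpoint has been safely written. The class `CheckpointedAllocator` and its method names are hypothetical, and the sketch simplifies by treating every freed block as pending, including one that was both allocated and freed since the last checkpoint.

```python
class CheckpointedAllocator:
    """Defers reuse of freed blocks until a checkpoint has been committed."""

    def __init__(self, num_blocks):
        self.free = set(range(num_blocks))  # blocks safe to allocate now
        self.pending_free = set()           # freed since the last checkpoint

    def allocate(self):
        if not self.free:
            raise RuntimeError("no reusable blocks until the next checkpoint")
        block = min(self.free)
        self.free.remove(block)
        return block

    def release(self, block):
        # The block may still belong to the previous on-disk checkpoint,
        # so it must not be overwritten yet.
        self.pending_free.add(block)

    def checkpoint_committed(self):
        # The new checkpoint is on disk and records these blocks as free,
        # so they may now be reused.
        self.free |= self.pending_free
        self.pending_free.clear()


allocator = CheckpointedAllocator(2)
a = allocator.allocate()
b = allocator.allocate()
allocator.release(a)  # freed, but not yet reusable
```

Calling `allocator.allocate()` at this point raises an error even though block `a` has been freed; only after `checkpoint_committed()` does the block become available again.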
In a known file server implementation, two copies of the bitmap are utilized. A “current copy” is utilized to track what has been allocated, while a “safe copy” tracks what can actually be used. Utilizing this two-copy methodology, a block can be allocated if it is marked free in the safe copy. After the checkpointing process, the current copy is moved to the safe copy and the old safe copy is freed, or otherwise disposed of. A noted disadvantage of this methodology is that the file system is not able to allocate a block while the bitmaps are being written to disk, for example, during a checkpoint operation. Allocation of blocks is, again, desirable at this time because various file system processes, such as restoring files from a snapshot or utilizing file folding techniques, described further below, can continue to operate while the bitmaps are being written to disk. This added operating time permits these various file system processes to complete more quickly. During the writing process, the safe copy is locked or otherwise owned by a disk storage layer, for example, a Redundant Array of Inexpensive (or “Independent”) Disks (RAID) layer of a storage operating system.
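The two-copy methodology and its noted disadvantage can be sketched as follows. The class `TwoCopyBitmap`, its method names, and the `writing` flag that models the lock held by the disk storage layer are all hypothetical. The sketch also assumes that an allocation is recorded in both copies so that the same block is not handed out twice; the known implementation may track this differently.

```python
class TwoCopyBitmap:
    """Current copy tracks allocations; safe copy tracks what may be used."""

    def __init__(self, num_blocks):
        self.current = [False] * num_blocks  # True = allocated
        self.safe = [False] * num_blocks     # True = not yet safe to reuse
        self.writing = False                 # safe copy owned by storage layer

    def allocate(self):
        if self.writing:
            # The noted disadvantage: no allocation while bitmaps are written.
            raise RuntimeError("safe copy is locked during the bitmap write")
        for block, in_use in enumerate(self.safe):
            if not in_use:
                self.current[block] = True
                self.safe[block] = True  # assumption: marked in both copies
                return block
        raise RuntimeError("no block is marked free in the safe copy")

    def free(self, block):
        self.current[block] = False  # safe copy still shows the block in use

    def begin_checkpoint(self):
        self.writing = True

    def end_checkpoint(self):
        self.safe = list(self.current)  # current copy becomes the safe copy
        self.writing = False


bitmaps = TwoCopyBitmap(2)
x = bitmaps.allocate()
y = bitmaps.allocate()
bitmaps.free(x)             # free in the current copy only
bitmaps.begin_checkpoint()  # allocation is refused from here on
```

Until `end_checkpoint()` is called, every call to `allocate()` fails; afterwards the freed block `x` is marked free in the safe copy and can be allocated again.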
Certain file systems include the capability to generate snapshots, described further below, of an active file system. In such file systems, a block may be incorporated into a snapshot and then deleted from the active file system. The block will then be marked as unallocated in the bitmap; however, if the snapshot and active file system share the same logical address space, the block is still physically resident on the storage device. A problem arises when a user desires to reallocate the block from the snapshot to the active file system as the bitmap describing those blocks in the active file system has the given block marked as unallocated.