§ 1.1 Field of the Invention
The present invention concerns computer storage and file systems. More specifically, the present invention concerns techniques for detecting (severe) system failures in a file system and maintaining file system consistency in the event that such failures occur.
§ 1.2 Related Art
Data generated by, and for use by, computers is stored in file systems. File systems typically maintain and manage so-called “Metadata”. Metadata includes (e.g., all) file system structure, but excludes the actual data (e.g., the contents of the files). For example, Metadata may define directories and subdirectories of files (e.g., normal files, directories, hard links, symbolic links, sockets, named pipes, character devices, and block devices), the top-most directory being referred to as “the root directory”. So-called “file control blocks” maintain information about each file in the file system. In the UNIX operating system, a so-called “Inode” block is used as a file control block. An Inode block may include a number of Inodes. Each Inode may include mode, link count, file modification time, Inode modification time, a block list (e.g., disk block numbers of the file that the Inode describes), etc. Such information typically does not include the name of the file. Rather, the directories and subdirectories include the file names, and map such names to the corresponding Inode blocks (or some other file control block). As can be appreciated from the foregoing, to “get to” a file, the file system has to go through what may be an arbitrarily long chain of directory and Inode block (or some other file control block) references. In this way, a file system “maps” a logical file system onto the physical storage device(s).
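The chain of directory and Inode references described above can be illustrated with a minimal sketch. It is not taken from any particular file system: the Inode numbers, the table layout, and the names `Inode`, `inode_table`, `directories`, and `resolve` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Inode:
    """Simplified file control block. Note that it holds no file name."""
    mode: str                      # "dir" or "file" (stand-in for real mode bits)
    link_count: int = 1
    block_list: list = field(default_factory=list)  # disk block numbers of the data

# Hypothetical on-disk state: an Inode table plus directory contents.
ROOT_INO = 2
inode_table = {
    2: Inode("dir"),
    3: Inode("dir"),
    4: Inode("file", block_list=[120, 121]),
}
# Directories, not Inodes, map file names to Inode numbers.
directories = {
    2: {"home": 3},
    3: {"notes.txt": 4},
}

def resolve(path):
    """Walk from the root directory to the Inode number that `path` names."""
    ino = ROOT_INO
    for name in [part for part in path.split("/") if part]:
        if inode_table[ino].mode != "dir":
            raise NotADirectoryError(path)
        entries = directories[ino]
        if name not in entries:
            raise FileNotFoundError(path)
        ino = entries[name]        # one more link in the reference chain
    return ino

ino = resolve("/home/notes.txt")
print(ino, inode_table[ino].block_list)
```

In this sketch, each path component costs one directory lookup and one Inode lookup, so losing any single directory or Inode on the chain makes everything beneath it unreachable.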
File systems are typically maintained on so-called “secondary storage”. While “main memory” is typically volatile and relatively small, secondary storage is larger and non-volatile (e.g., contents persist through power failures and system reboots). Typically, magnetic and/or optical disk-based storage devices are used for secondary storage, while RAM is used for main memory.
Errors in the file system can be introduced in a number of ways. For example, a “bad spot” can occur on a storage medium (e.g., disk) used for secondary storage, for a number of reasons, none of which is particularly relevant here. Such a “bad spot” can corrupt data in a file. While corrupted files of data are certainly undesirable, if a bad spot on a disk corrupts a directory structure or Inode block (or some other file control block), an entire sub-section (e.g., a sub-tree) of the file system can become inaccessible. Unfortunately, many current file systems cannot withstand serious faults, such as power loss or disk failure, without significant recovery time and/or data loss.
Most of the protection currently provided for file systems is implemented at the hardware level, using disk or server redundancy, backup power supplies and non-volatile memory. Such solutions tend to be expensive and cannot handle some failure scenarios.
Present software-based solutions to file system corruption can be divided into two categories, namely file system check and log (or journal) replay. File system check methods read all of the system information structures on the disk and check them for inconsistencies. Any inconsistencies discovered are repaired on a best-efforts basis. Examples of such file system check methods include FSCK in the Unix and Linux operating systems, and SCAN DISK in the Windows operating system. If there are too many problems, the file system might not be repairable. Further, recovery using these methods is relatively slow, and recovery times may become unacceptable as the size of file systems grows.
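As a rough illustration of what a file system check pass involves, the following sketch scans a toy model for three kinds of inconsistency. It is not the algorithm used by FSCK or SCAN DISK. The data layout and the name `check` are assumptions made for this example, and the Inode table is assumed to cover regular files only, so that each directory entry counts as exactly one link.

```python
from collections import Counter

def check(inodes, directories):
    """Return a list of inconsistencies found in a toy file system model.

    inodes:      {ino: {"links": int, "blocks": [int, ...]}}  (regular files only)
    directories: {dir_ino: {name: ino}}
    """
    problems = []

    # 1. The stored link count must match the number of directory entries.
    refs = Counter(child for entries in directories.values()
                   for child in entries.values())
    for ino, node in sorted(inodes.items()):
        if refs[ino] != node["links"]:
            problems.append(("link_count", ino, node["links"], refs[ino]))

    # 2. A disk block may belong to at most one Inode.
    owner = {}
    for ino, node in sorted(inodes.items()):
        for blk in node["blocks"]:
            if blk in owner:
                problems.append(("duplicate_block", blk, owner[blk], ino))
            else:
                owner[blk] = ino

    # 3. Every directory entry must point at an existing Inode.
    for dir_ino, entries in sorted(directories.items()):
        for name, child in sorted(entries.items()):
            if child not in inodes:
                problems.append(("dangling_entry", dir_ino, name, child))

    return problems

inodes = {2: {"links": 1, "blocks": [10, 11]},
          3: {"links": 2, "blocks": [11]}}
directories = {1: {"a": 2, "b": 3, "c": 9}}
for problem in check(inodes, directories):
    print(problem)
```

Even in this toy form, every Inode, block list, and directory entry is visited, which suggests why the running time of such checks grows with the size of the file system.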
Log (or journal) replay systems maintain a log or journal of the latest transactions performed by the file system. Periodically, these logged transactions are archived (e.g., committed to a storage device and removed from the log). In the case of a severe failure, the transactions still in the archived log are committed, or “rolled back”, after the file server is restarted. These methods speed up the recovery, but can still take a long time, particularly in heavily accessed file systems. Possible corruption of the archived log (or journal) is, in itself, an additional potential catastrophic failure point.
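The replay step can be sketched as follows, assuming a simple redo-only journal in which each transaction logs its block writes followed by a commit record. The record format and the name `replay` are hypothetical, and real journaling file systems differ in many details. Transactions without a commit record are simply skipped, which has the effect of rolling them back.

```python
def replay(journal, disk):
    """Apply the writes of committed transactions to `disk` after a crash.

    journal: list of ("write", txid, block, data) and ("commit", txid) records
    disk:    {block_number: data}, modified in place and returned
    """
    # First pass: find the transactions that reached their commit record.
    committed = {rec[1] for rec in journal if rec[0] == "commit"}

    # Second pass: redo committed writes in log order and ignore the rest.
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _txid, block, data = rec
            disk[block] = data
    return disk

journal = [
    ("write", 1, 5, "A"),
    ("commit", 1),
    ("write", 2, 5, "B"),   # transaction 2 never committed
    ("write", 2, 6, "C"),
]
disk = {5: "old", 6: "old"}
print(replay(journal, disk))
```

Recovery work here is proportional to the length of the journal rather than the size of the file system, which is consistent with the observation that replay is faster than a full check but can still be slow when many transactions are outstanding. The sketch also depends entirely on the journal being intact.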
§ 1.3 Unmet Needs
In view of the foregoing disadvantages of known ways to detect and recover from file system errors, there is a need for better techniques. Such techniques should shorten the time for, or eliminate the need for, file system recovery. Further, such techniques should always protect critical system information.