A file system in a computer specifies the way data files are stored on a storage volume of the computer (e.g., a hard disk), and how the files are retrieved from that volume. For example, Windows, OS/2, Macintosh, and UNIX-based operating systems have file systems that use a hierarchical or tree structure for maintaining files. File systems also specify conventions for naming files, such as how many characters can be used in a file name, which characters can be used in a file name, and how many characters are allowed in the suffix of a file name.
The term “file system” is often used to refer to the file system driver, program, or part of the operating system that manages the file system structure, conventions, and related tasks. In this sense, the file system performs file operations such as opening, reading, writing, renaming, and closing files in response to user and application requests. One significant aspect of managing file system transactions is maintaining the internal integrity of the file system. In general, the file system expects the data structures on a hard disk (i.e., the storage volume) to be consistent and to conform to the file system format. For example, when data is written to the disk in a particular format, the file system expects to be able to read that data back from the disk in the same format. However, various circumstances can corrupt the data on the disk. A problem with the disk itself (i.e., the physical storage media) may cause dropped data bits, for example. A connectivity problem between the computer and the hard disk may cause data to be written to the disk, or read back from it, incorrectly. A programming bug in the operating system or a driver may cause data to be written to random locations in memory. Thus, problems with data transactions and other issues can corrupt a file system's data structures.
File systems typically employ a process to check for and fix corruptions caused by incorrect or incomplete transactions. For example, the NTFS (New Technology File System) file system, used by various Windows® brand operating systems from Microsoft® Corporation of Redmond, Wash., employs an Autochk/Chkdsk utility that scans the volume to verify that the data structures are consistent and that no corruptions exist. The Autochk/Chkdsk utility is run on NTFS volumes each time they are mounted on the system, which most commonly occurs when the system is booted or rebooted. When NTFS discovers a corruption problem on a running volume, it marks the volume as ‘dirty’ and presents the user with a corruption error. Then, upon reboot of the system, if NTFS encounters a ‘dirty’ volume or other inconsistency, Autochk/Chkdsk is automatically executed and the boot request to mount the volume is delayed or declined. The Chkdsk utility can also be initiated manually, for example, by a system administrator of a larger computer system who wants to control when the utility runs. While Chkdsk runs against the volume (i.e., scans the volume for repair), users cannot access the volume. Depending on the types of corruptions found, Chkdsk may take specific corrective actions to repair them. For example, if the information on a disk that indicates where a file is allocated is inconsistent or corrupt, Chkdsk will delete the file. If a file directory is unreadable or corrupt, Chkdsk will rebuild the directory.
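As a rough illustration of the ‘dirty’ flag workflow described above, the following Python sketch models it with hypothetical names (Volume, report_corruption, offline_check, mount_at_boot). It is not how NTFS or Chkdsk is actually implemented; it only shows, under simplifying assumptions, why a dirty volume cannot be mounted until an offline check with exclusive access has finished.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical in-memory model of a volume (not the NTFS on-disk format)."""
    name: str
    mounted: bool = False
    dirty: bool = False
    corruptions: list = field(default_factory=list)


def report_corruption(volume: Volume, description: str) -> None:
    # A running volume that detects corruption records it and sets the dirty flag.
    volume.corruptions.append(description)
    volume.dirty = True


def offline_check(volume: Volume) -> int:
    # Stand-in for an Autochk/Chkdsk-style scan. It requires exclusive access,
    # so it refuses to run while the volume is mounted.
    if volume.mounted:
        raise RuntimeError("check requires exclusive access to the volume")
    repaired = len(volume.corruptions)  # pretend every recorded corruption is fixed
    volume.corruptions.clear()
    volume.dirty = False
    return repaired


def mount_at_boot(volume: Volume) -> int:
    # A dirty volume is checked first; the mount is delayed until the check ends.
    repaired = offline_check(volume) if volume.dirty else 0
    volume.mounted = True
    return repaired


if __name__ == "__main__":
    vol = Volume("C:", mounted=True)
    report_corruption(vol, "inconsistent allocation information")
    vol.mounted = False  # simulate a reboot: the volume is no longer mounted
    print(mount_at_boot(vol), vol.dirty, vol.mounted)
```

In this model the mount path simply blocks on the check, which mirrors the disruption discussed next: the larger the volume, the longer the mount is delayed.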
Although such file system recovery utilities (e.g., Chkdsk) are generally successful at repairing file system corruptions, they have disadvantages. One disadvantage, as noted above, is that they can be disruptive to users. Chkdsk can take a long time to run, and it requires exclusive access to the volume (e.g., hard disk) it is scanning and repairing. Therefore, when a computer boots, a user may not have access to the volume(s) and must instead wait until Autochk/Chkdsk finishes its repairs before the boot process can be completed. On a large server, such as one servicing an enterprise system, the time it takes Autochk/Chkdsk to run can be significant: a large server can hold millions of files, which may take many hours (e.g., 10-15 hours) to process with Autochk/Chkdsk. Thus, many users can be inconvenienced if an administrator is not careful about the time of day at which the Autochk/Chkdsk utility is executed.
Accordingly, a need exists for a way to repair file system corruptions on a volume without disrupting or preventing access to the volume.