A conventional file server has a number of disk drives for storing files of one or more file systems, and at least one data processor coupled to the disk drives for access to the file systems. The data processor executes various computer programs. Occasionally it becomes necessary to restart program execution by resetting the data processor. For example, the data processor is reset after its normal sequence of program execution has become disrupted by a power surge, program memory failure, or software bug. If the data processor itself has failed, then it is replaced with another data processor. In either case, it is possible that one or more of the file systems have become inconsistent because metadata transactions upon the file systems were interrupted.
For example, a file server storing files in a Unix-based file system (UxFS) typically writes file system metadata changes to an “intent log” before the metadata changes are made to the file system. The metadata changes are grouped into respective transactions. Each transaction consists of metadata changes from one consistent state of the file system metadata to a next consistent state of the file system metadata. Each transaction is written into a respective record of the intent log. Each record of the intent log includes a header containing a transaction identifier (ID) and the record size, and the header is followed by the metadata changes of the transaction. The transaction ID is incremented as the records are written in sequence to the intent log, so that a first record and a last record in the log can be identified by inspecting the transaction IDs of the records in the log. Upon re-boot of the file server, the metadata changes in the intent log are replayed into the file system in order to recover a consistent state of the file system. For replay, the transaction IDs and record sizes in the records of the intent log are inspected to determine the first record in the log and the last record in the log, and to invalidate any record whose size is not equal to the spacing between the transaction ID of that record and the transaction ID of the following record in the log. The intent log is replayed by sequentially reading the transactions from the intent log and writing them into the file system, starting with the first record in the log and ending with the last record in the log, or ending earlier if a record to be read from the log is invalidated by the check of its size.
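The replay procedure described above can be illustrated by a minimal sketch. The sketch is not the UxFS implementation; it rests on several assumptions that are not stated in the text: the log is a flat, non-circular byte buffer, each header is a hypothetical 12-byte structure (an 8-byte transaction ID followed by a 4-byte total record size), transaction IDs increase by exactly one per record, and the metadata changes are encoded as hypothetical `key=value` lines. The "spacing" check is interpreted here as requiring that the header found at the record's offset plus its size carry the next transaction ID.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical header layout: 8-byte transaction ID, 4-byte record size
# (the size counts the header plus the metadata changes that follow it).
HEADER = struct.Struct("<QI")


def pack_record(txid: int, changes: Dict[str, str]) -> bytes:
    """Build one log record: header followed by the transaction's changes."""
    body = "\n".join(f"{k}={v}" for k, v in changes.items()).encode()
    return HEADER.pack(txid, HEADER.size + len(body)) + body


def valid_records(log: bytes) -> List[Tuple[int, bytes]]:
    """Return (txid, body) pairs up to, but excluding, the first invalid record."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        txid, size = HEADER.unpack_from(log, offset)
        end = offset + size
        if size < HEADER.size or end > len(log):
            break  # size field cannot describe a record inside this log
        if end < len(log):
            # Not the last record: the size must lead exactly to a header
            # carrying the next transaction ID, otherwise invalidate.
            if end + HEADER.size > len(log):
                break
            next_txid, _ = HEADER.unpack_from(log, end)
            if next_txid != txid + 1:
                break
        records.append((txid, log[offset + HEADER.size:end]))
        offset = end
    return records


def replay(log: bytes, metadata: Dict[str, str]) -> int:
    """Apply each valid transaction to `metadata` in order; return the count."""
    count = 0
    for _txid, body in valid_records(log):
        for line in body.decode().splitlines():
            key, _, value = line.partition("=")
            metadata[key] = value
        count += 1
    return count
```

In this sketch, replay stops before the first record that fails the size check, so the metadata reflects only the transactions that precede it. A circular log, as well as detection of the first and last records by scanning transaction IDs, is deliberately left out for brevity.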
It is possible for a file system log to become corrupted by circumstances such as a power surge that would disrupt normal processing and require re-boot of the file server. If the file system log is corrupted, it might not be possible to restore the file system to a consistent state that existed during normal processing. For some application programs, when replay of the file system log does not restore the file system to a consistent state, the file system can be recovered by re-running the application program upon a backup copy of the file system. If a backup copy of the file system does not exist or if the application cannot be re-run, then often an attempt is made to repair the inconsistent file system by executing a utility program such as the Unix or Linux “fsck” utility.