In network environments where high-availability is a necessity, system administrators are constantly faced with the challenges of preserving data integrity and ensuring availability of critical system components. One critical system component in any computer processing system is its file system. File systems include software programs and data structures which define the use of underlying data storage devices. File systems are responsible for organizing disk sectors into files and directories and keeping track of which sectors belong to which file and which are not being used.
The accuracy and consistency of a file system is necessary to relate applications and data. However, there always exists the potential for data corruption in any computer system and therefore measures are taken to periodically ensure that the file system is consistent and accurate. Currently file server state is periodically backed-up or saved as a file system snapshot to allow system recovery in the event of faults or failures. In the event that data corruption is detected, one of the snapshots can be used for file system recovery.
Each snapshot must be verified for accuracy and consistency prior to storage and/or recovery. File system verification may be performed via a ‘file system check’ (FSCK) utility. The present FSCK utility takes a monolithic approach; when checking consistency of the file system, the entire file system is un-mounted and scanned completely to identify and repair data structures to provide a consistent snapshot of the file system. A variety of operations are performed during FSCK; for example file system directory structures and block counts are checked for consistency and data values may be checked for accuracy. Because the data structure contents and relationships of the file system are being verified during FSCK, no access is permitted to the file system during the FSCK process.
The amount of time required to execute the FSCK utility is generally time linear to the size of the file system, and therefore can be a time consuming task when the file system is large. For example, it may take as much as 24 hours to perform a file system check on a 16 Terabyte file system. As the size of file systems continues to grow the time required to check the file system becomes an obstacle, greatly hindering the uptime of the file system and adversely affecting the performance of the file server. It would be desirable to identify a system and method capable of providing consistency checks on large file systems while minimizing the disruption in use of the file system caused by the consistency checks.