File systems use data structures, also referred to as file system on-disk formats, to persist and organize data on non-volatile, i.e., persistent, storage, e.g., a volume, disk, or hard drive. Systems access and interpret these data structures to store and retrieve data for users and for applications, i.e., procedures or computer programs, e.g., when executing software instructions or computer code.
Currently, errors, i.e., corruptions, in file system data structures are discovered only when an attempt is made to access the faulty data structure during normal application or user-instigated processing. The discovery of an error causes the volume on which the error occurred to be flagged as corrupt. Once a volume has been flagged as corrupt, a volume repair utility is generally required to attempt to bring the file system's on-disk formats back into a consistent state. Typically, the volume repair utility is executed during system boot-up.
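The conventional flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Volume` class, the use of a CRC32 checksum as the error detector, and the `repair` and `boot` helpers are hypothetical and do not correspond to any particular file system or repair utility.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical in-memory stand-in for a volume and its on-disk structures."""
    name: str
    # Maps a structure id to (payload bytes, stored CRC32 checksum).
    structures: dict = field(default_factory=dict)
    flagged_corrupt: bool = False

    def write(self, key: str, payload: bytes) -> None:
        self.structures[key] = (payload, zlib.crc32(payload))

    def read(self, key: str) -> bytes:
        """An error surfaces only here, when the structure is actually accessed."""
        payload, stored_crc = self.structures[key]
        if zlib.crc32(payload) != stored_crc:
            # Any apparent error flags the whole volume; nothing verifies
            # whether it is a true on-disk corruption.
            self.flagged_corrupt = True
            raise OSError(f"{self.name}: structure {key!r} failed validation")
        return payload


def repair(volume: Volume) -> None:
    """Placeholder repair: drop structures that fail validation, clear the flag.

    Assumed behavior for illustration only; it stands in for a pass that
    needs exclusive access to the volume while it runs.
    """
    volume.structures = {
        key: (payload, crc)
        for key, (payload, crc) in volume.structures.items()
        if zlib.crc32(payload) == crc
    }
    volume.flagged_corrupt = False


def boot(volumes: list) -> list:
    """Run the repair pass at boot for every volume flagged as corrupt."""
    repaired = []
    for volume in volumes:
        if volume.flagged_corrupt:
            repair(volume)
            repaired.append(volume.name)
    return repaired
```

In this sketch a damaged structure goes unnoticed until `read` touches it, and the only remedy is the repair pass run from `boot`, which mirrors the access-time discovery and boot-time repair described above.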
A volume repair utility requires exclusive access to the volumes of the file system. Consequently, for the duration of its execution, the data and information stored on the volumes being repaired cannot be accessed by other applications or users. Moreover, the volume repair utility may execute for a significant amount of time, which further degrades file system performance and negatively impacts user satisfaction.
Additionally, there are occasions when an error appears to have been encountered during an access of an on-disk format, yet the error is not a true corruption at all, but rather can be attributed to other events, e.g., transient errors in volatile system memory, transient errors in the system's persistent storage, bugs in the file system, etc. However, there is currently no mechanism to distinguish real on-disk corruptions from these other error events, i.e., false positives. As a result, file system volumes are unnecessarily taken offline and rendered unavailable to users and other tasks while the file system attempts to correct false positives, causing further unnecessary system disruption.
Thus, it is desirable to provide file system resiliency management for searching for, verifying, and correcting data structure corruptions with minimal disruption to user and application access to the data structures.