Increasingly, computer systems are required to be continually on-line and available. This is particularly true for enterprise servers, where downtime may lead to lost productivity and/or reduced customer satisfaction. At the same time, administrators of enterprise servers periodically check the storage of such servers to determine whether any errors have crept into the file system stored thereon. If a corruption remains unfixed, it may worsen as the file system is accessed or modified.
In the past, a system administrator generally had two options for verifying the integrity of a volume. As a first option, the system administrator could bring the volume off-line, execute diagnostic software, fix errors, and then bring the volume on-line again. This, however, has the undesirable effect of making the volume unavailable. For some types of volumes (e.g., a corporate e-mail volume), this solution may virtually halt work and productivity for an entire organization for the duration of the procedure.
As a second option, the system administrator could perform a read-only integrity check on a live volume. Because files and meta-data on a live volume may change while the check is running, this option also has problems. The check may falsely report that errors exist on the volume, or it may abort because of problems encountered while attempting to verify the volume. A false error report may be worse than no report at all, because it may cause a system administrator to take the system off-line only to find that nothing was wrong, needlessly halting productivity. Worse still, repeated false reports may destroy the system administrator's confidence in the diagnostic tool and cause the system administrator to ignore genuine errors that the tool reports in the future.
What is needed is a method and system for verifying a volume without taking the volume off-line. Ideally, the method and system would report only problems that actually exist.