Field of the Invention
Embodiments of the present invention relate generally to data storage and management and more particularly to a method and system for scrubbing data within a data storage subsystem.
Description of the Related Art
Recently, enterprises have become more dependent on the ability to store, organize, manage and distribute data. Accordingly, “information lifecycle management,” the process of managing business data from conception until disposal in a manner that optimizes storage, access, and cost characteristics, has become increasingly important. One important facet of information lifecycle management is the set of processes and mechanisms by which the validity of data may be determined and, where invalid data is discovered, by which such invalid data may optionally be corrected or discarded. The process of validating data (and in some cases correcting invalid data) is typically referred to as “data scrubbing.”
Data scrubbing may be implemented using any of a number of techniques or methods. Using an error-correcting code (ECC), for example, each group of data is stored along with metadata describing the bit sequence of the group. When a group of data is subsequently accessed, the metadata may be recalculated and compared to the previously-stored metadata to determine if an error is present and, in some cases (e.g., where a so-called “soft error” has occurred), which bit(s) need to be modified to correct the error(s). Data scrubbing may be provided on various levels within a data storage subsystem and may be implemented in hardware, software, or a combination thereof.
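By way of illustration only, the recompute-and-compare approach described above may be sketched using a Hamming(7,4) code, one well-known single-error-correcting ECC. The choice of this particular code, the function names `hamming74_encode` and `hamming74_scrub`, and the bit layout are assumptions made for this example and are not drawn from any particular data storage subsystem.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7).

    Parity bits occupy positions 1, 2 and 4; data bits occupy 3, 5, 6 and 7.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_scrub(codeword):
    """Recompute the parity checks and correct a single-bit error, if any.

    Returns (corrected_codeword, error_position), where error_position is
    the 1-based position of the flipped bit, or 0 if no error was detected.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bit the syndrome points at
    return c, syndrome


stored = hamming74_encode([1, 0, 1, 1])
damaged = list(stored)
damaged[4] ^= 1  # simulate a soft error at position 5
repaired, position = hamming74_scrub(damaged)
```

In this sketch a nonzero syndrome both signals that an error is present and identifies which bit to modify. A code of this kind corrects only a single flipped bit per codeword, so errors affecting more than one bit would be miscorrected rather than repaired.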
To ensure that data being scrubbed is in a consistent state during validation, data to be scrubbed typically may not be accessed directly while a scrubbing operation is being performed. Consequently, in conventional data storage subsystems, either a redundant (e.g., cached) copy of the data to be scrubbed is provided for access during data scrubbing operations, or all access to the data to be scrubbed is simply suspended. While redundant storage or caching techniques enable data access to continue while a data scrub is performed, such caches are an additional expense and may not be feasible to implement due to increased complexity and/or size constraints. By contrast, data scrubbing techniques which rely on the complete cessation of data access typically add nominal or no expense to a data storage subsystem but can significantly increase the latency with which data may be accessed.
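For purposes of illustration, the second conventional approach, in which access is suspended for the duration of a scrub, may be sketched as follows. The class name `ScrubbedRegion`, the use of a single mutual-exclusion lock, and the use of a CRC-32 checksum as the stored metadata are all assumptions made for this example rather than features of any particular subsystem.

```python
import threading
import zlib


class ScrubbedRegion:
    """Hypothetical storage region whose readers and writers block during a scrub."""

    def __init__(self, data: bytes):
        self._lock = threading.Lock()
        self._data = bytearray(data)
        self._checksum = zlib.crc32(self._data)  # metadata stored with the data

    def read(self) -> bytes:
        # Blocks for as long as a scrub (or a write) holds the lock.
        with self._lock:
            return bytes(self._data)

    def write(self, data: bytes) -> None:
        with self._lock:
            self._data = bytearray(data)
            self._checksum = zlib.crc32(self._data)

    def scrub(self) -> bool:
        # All access is suspended while the checksum is recomputed and compared.
        with self._lock:
            return zlib.crc32(self._data) == self._checksum


region = ScrubbedRegion(b"example payload")
valid_before = region.scrub()
region._data[0] ^= 0x01  # simulate silent corruption of the stored data
valid_after = region.scrub()
```

Because every reader and writer waits on the same lock, the added hardware cost is nil, but access latency grows with the time taken to validate the region, which is the trade-off noted above.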
System administrators frequently make copies of logical volumes, for example in order to perform backups or to test and validate new applications. Such copies are commonly referred to as snapshots.