Data scrubbing is a process of accessing data in a memory or storage location to ensure that the data will be accessible later when needed. For a high level of data protection, the data is stored using error detection and correction coding or redundant array of independent disks (RAID) techniques, so that most data errors are detected and corrected automatically when the data is accessed during scrubbing. For example, in a data storage system, cache memory and disk storage are scrubbed periodically or during idle time to prevent data loss while minimizing performance loss and the impact of any failure.
In the data storage system, a maintenance procedure periodically reads and rewrites all areas of memory to detect potential errors, and maintains a record of errors for each memory segment. Any small “soft” errors are detected and corrected, while the potential for larger “hard” errors is dramatically reduced. If a predetermined error threshold is reached in a certain segment, or a permanent error is confirmed, the maintenance procedure moves that memory content to another area in memory and the failed component is removed from service, or “fenced.”
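The memory maintenance procedure described above can be sketched roughly as follows. This is only an illustrative model, not the procedure of any particular storage system: the names `Segment`, `make_segment`, `scrub_pass`, and `ERROR_THRESHOLD` are hypothetical, a CRC-32 checksum stands in for the error detection code, and a mirror copy stands in for the redundancy that a real error correction code would provide. As a simplification, the sketch rewrites a segment only when a mismatch is found.

```python
import zlib
from dataclasses import dataclass

ERROR_THRESHOLD = 3  # hypothetical per-segment error limit


@dataclass
class Segment:
    data: bytearray       # live contents of the memory segment
    checksum: int         # CRC-32 of the expected contents (stand-in for ECC)
    mirror: bytes         # redundant copy used to correct errors (assumption)
    error_count: int = 0  # record of errors seen in this segment
    fenced: bool = False  # True once the segment is removed from service


def make_segment(payload: bytes) -> Segment:
    """Create a healthy segment holding the given payload."""
    return Segment(bytearray(payload), zlib.crc32(payload), bytes(payload))


def scrub_pass(segments: list) -> list:
    """Run one scrub pass and return the segments fenced during it."""
    fenced = []
    for i, seg in enumerate(segments):
        # Read the segment and check it against its stored checksum.
        if zlib.crc32(seg.data) != seg.checksum:
            seg.error_count += 1
            seg.data[:] = seg.mirror  # correct the soft error by rewriting
        # At the threshold, move the content elsewhere and fence the segment.
        if seg.error_count >= ERROR_THRESHOLD:
            seg.fenced = True
            fenced.append(seg)
            segments[i] = make_segment(seg.mirror)
    return fenced
```

In this model a segment that is corrupted once is repaired in place and stays in service, while a segment that keeps accumulating errors is eventually replaced by a fresh segment holding the same content.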
In the data storage system, the maintenance procedure also scrubs all data blocks on the disk drives. During a production lull, the maintenance procedure reads and rewrites all blocks on all tracks on all physical disks and then checks for errors or inconsistencies. Bad sectors are fenced off. If the problem is significant, usable data is copied off the failing disk and the failing disk is taken out of service.
Deduplication is a technology that normalizes duplicate data to a single shared data object to achieve storage capacity efficiency. A data deduplication procedure searches for duplicate data and, when duplicate data is located, discards it rather than retaining it; instead, a “data pointer” is modified so that the storage system references an exact copy of the data already stored on disk. In this fashion, multiple source data objects such as files may share deduplicated data objects such as data blocks or segments of data blocks. This reduces the storage requirements and also improves backup and recovery efficiency.
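A minimal sketch of this pointer-based sharing is given below. It assumes fixed-size blocks and assumes that duplicates are found by comparing SHA-256 digests; the text above does not say how duplicates are detected, and a production system might also compare the bytes to guard against hash collisions. The class `DedupStore` and its methods are hypothetical names used only for illustration.

```python
import hashlib


class DedupStore:
    """Toy block store in which identical blocks are kept only once."""

    def __init__(self, block_size: int = 4):
        self.block_size = block_size
        self.blocks = {}    # digest -> the single stored copy of a block
        self.refcount = {}  # digest -> number of pointers to that block
        self.files = {}     # file name -> list of digests ("data pointers")

    def write_file(self, name: str, data: bytes) -> None:
        pointers = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # first copy: store it
            # Duplicate or not, the file only records a pointer to the block.
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            pointers.append(digest)
        self.files[name] = pointers

    def read_file(self, name: str) -> bytes:
        """Reassemble a file by following its data pointers."""
        return b"".join(self.blocks[d] for d in self.files[name])
```

For example, writing `b"AAAABBBBAAAA"` as one file and `b"BBBBCCCC"` as another stores only three distinct blocks, even though the two files together contain five.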
Data compression is the encoding of data to reduce its storage requirements, and applying it to the deduplicated data objects can reduce the storage requirements further. Data compression is optimized for a single data object and reduces its footprint, while deduplication works across data objects.
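The difference in scope can be illustrated with a short sketch that compresses each deduplicated object on its own. It assumes zlib (DEFLATE) as the compressor, which is not specified in the text above, and the helper names `compress_objects` and `stored_bytes` are hypothetical.

```python
import zlib


def compress_objects(objects: dict) -> dict:
    """Compress each deduplicated object independently of the others."""
    return {key: zlib.compress(value) for key, value in objects.items()}


def stored_bytes(objects: dict) -> int:
    """Total bytes occupied by a mapping of key -> bytes."""
    return sum(len(value) for value in objects.values())


# One deduplicated object with a lot of internal repetition.
unique_objects = {"block-1": b"abc" * 1000}
compressed = compress_objects(unique_objects)

# Compression shrinks the single object; the original is still recoverable.
assert stored_bytes(compressed) < stored_bytes(unique_objects)
assert zlib.decompress(compressed["block-1"]) == unique_objects["block-1"]
```

Because each object is compressed separately, repetition between different objects is not exploited at this stage; removing that kind of repetition is the job of deduplication.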
In a deduplicated data store, the deduplicated data objects have all been treated alike for scrubbing: either all of them are scrubbed periodically or none of them are scrubbed at all.