This invention relates to enterprise-wide data storage systems, and in particular, to methods and systems for detecting errors in data stored on such systems.
When we store data on a disk, we often take it for granted that we will one day be able to retrieve the identical data from the disk. In reality, however, storing data on a disk produces far more errors than one might expect. Fortunately, error correction utilities, working invisibly in the background, can repair the overwhelming majority of these errors. That users repose such confidence in disk storage systems is a tribute to the unobtrusive effectiveness of these error correction utilities.
No matter how sophisticated an error correction utility is, it cannot repair an error that has not been brought to its attention. This function of detecting an error is achieved by error detection utilities that periodically scan the entire disk to identify disk errors. The time required to scan the disk depends in part on the size of the disk. As disks become increasingly large, the scanning time can become excessive. It is therefore desirable in the art to provide error detection utilities with disk scanning methods that are fast.
A naive approach to error detection is to compare a data record stored on a disk with another copy of the same data record stored elsewhere, either on the disk or on another disk in a disk array. A difficulty with this approach is its appetite for storage space. A requirement that a duplicate copy of each data record be maintained effectively halves the available capacity of any storage medium.
A more effective method for detecting a disk error is to store additional data that is derivable from and associated with a data record whose integrity is to be assessed (hereafter referred to as "the test record"). This additional data, hereafter referred to as "meta-data," can include checksums, CRC data, time stamps, data indicative of the physical location of the record within the drive, and parity bits. The use of meta-data to assess the integrity of a test record is advantageous because the meta-data is typically much smaller than the test record from which it was derived. Consequently, the storage capacity surrendered to the error detection process can be made much smaller.
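As a minimal sketch, assuming a data record is a Python `bytes` object and that its meta-data consists of a physical location (cylinder, head, record number), a length, and a CRC-32 computed with `zlib.crc32`, the derivation of meta-data might look as follows. The `MetaData` class, its fields, and `derive_meta` are hypothetical names chosen for illustration; the point is only that the meta-data stays a few integers in size however large the record grows.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaData:
    """Hypothetical meta-data record; the fields are illustrative only."""
    cylinder: int   # physical location of the record within the drive
    head: int
    record: int
    length: int     # length of the data record in bytes
    crc32: int      # CRC-32 of the record contents


def derive_meta(data: bytes, cylinder: int, head: int, record: int) -> MetaData:
    """Derive a fixed-size meta-data record from a data record of any size."""
    return MetaData(cylinder, head, record, len(data), zlib.crc32(data))


# A 64 KiB test record yields meta-data of just five integers.
test_record = bytes(64 * 1024)
meta = derive_meta(test_record, cylinder=12, head=3, record=1)
```

Because a CRC-32 detects every error burst of 32 bits or fewer, a single corrupted byte in the record always changes the derived meta-data.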
Although the use of meta-data in the foregoing manner reduces the storage overhead associated with error detection, it does little to reduce its temporal overhead. To assess the integrity of the test record, both the test record and the meta-data are read from the disk and into memory. This consumes the time required for two read accesses. A second copy of the meta-data is then derived from the test record. This second copy is compared with the copy of the meta-data stored on disk. Both of these operations consume processing time.
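The conventional scan just described can be sketched as below, assuming a simulated in-memory disk whose stored meta-data is a single CRC-32 per record. `SimulatedDisk` and `conventional_scan` are hypothetical names; the `reads` counter makes the two read accesses per record visible, and the CRC is recomputed over the full record each time.

```python
import zlib
from dataclasses import dataclass


@dataclass
class SimulatedDisk:
    """Hypothetical in-memory stand-in for a disk.

    `records` maps record ids to contents; `stored_crc` holds the meta-data
    (here just a CRC-32) written alongside each record."""
    records: dict
    stored_crc: dict
    reads: int = 0  # number of read accesses performed

    def read_record(self, rid: int) -> bytes:
        self.reads += 1
        return self.records[rid]

    def read_meta(self, rid: int) -> int:
        self.reads += 1
        return self.stored_crc[rid]


def conventional_scan(disk: SimulatedDisk) -> list:
    """Return ids of records whose recomputed CRC differs from the stored CRC."""
    suspect = []
    for rid in disk.records:
        data = disk.read_record(rid)      # first read access: the test record
        stored = disk.read_meta(rid)      # second read access: its meta-data
        if zlib.crc32(data) != stored:    # derive a second copy and compare
            suspect.append(rid)
    return suspect


disk = SimulatedDisk(
    records={0: b"alpha", 1: b"beta"},
    stored_crc={0: zlib.crc32(b"alpha"), 1: zlib.crc32(b"beta")},
)
disk.records[1] = b"bexa"   # simulate corruption of record 1
result = conventional_scan(disk)
```

Scanning two records costs four read accesses and two full-record CRC computations, which is the per-record overhead that grows with disk size.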
Although the time spent on disk access and processing for any single record is small, it is incurred for every record on the disk. As a result, scanning an entire disk can consume many hours of processing time that could otherwise be used to service the needs of the system's users. Because of this, the scanning process is typically scheduled for times during which the system's overall processing load is expected to be light, for example overnight.
As disk storage systems have evolved to include arrays of progressively larger disks, it has become progressively more difficult to scan every disk within a limited period. With such an overwhelmingly large number of records to scan, the foregoing disk scanning method rapidly becomes impractical.
Rather than accessing the data records, the improved scanning method of the invention works entirely with the meta-data derived from those data records. Since the meta-data is significantly smaller than the data records from which it is derived, the mass-storage element can be scanned more rapidly. In addition, because of the minimal memory demands of the improved scanning method, a disk-scanning utility implementing the invention can operate with minimal interference to users of the data storage system.
The invention provides a method for scanning a mass-storage element to verify the integrity of a plurality of data records stored thereon. Each data record from the plurality of data records has associated with it meta-data derived from that data record. For the case of a CKD (count-key-data) format disk, the meta-data can include the count field associated with each record on such a disk.
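To illustrate how small such meta-data is, the sketch below packs and unpacks a count field, assuming the conventional 8-byte CKD layout: a 2-byte cylinder number, a 2-byte head number, a 1-byte record number, a 1-byte key length, and a 2-byte data length, stored big-endian. `CountField` and `COUNT_FIELD` are hypothetical names, and the layout should be checked against the documentation for the actual device.

```python
import struct
from typing import NamedTuple

# Assumed CKD count-field layout (big-endian): cylinder (2 bytes),
# head (2 bytes), record (1 byte), key length (1 byte), data length (2 bytes).
COUNT_FIELD = struct.Struct(">HHBBH")


class CountField(NamedTuple):
    cylinder: int
    head: int
    record: int
    key_length: int
    data_length: int

    def pack(self) -> bytes:
        """Serialize the count field into its on-disk byte form."""
        return COUNT_FIELD.pack(*self)

    @classmethod
    def unpack(cls, raw: bytes) -> "CountField":
        """Parse the on-disk byte form back into a CountField."""
        return cls(*COUNT_FIELD.unpack(raw))


cf = CountField(cylinder=100, head=4, record=1, key_length=0, data_length=4096)
raw = cf.pack()
```

The packed field occupies 8 bytes whether the associated data area holds 4096 bytes or far more, which is why a scan restricted to count fields touches only a small fraction of the disk.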
The method includes defining a selected data record and generating a comparison result indicative of a difference between a first copy of meta-data associated with the selected data record and a second copy of the meta-data associated with the selected data record. On the basis of the comparison result, the integrity of the data record is then assessed. If the comparison result indicates the existence of one or more differences between the first and second copies of the meta-data, the data record is assumed to contain errors. In this case, the data record is optionally flagged to draw the attention of a subsequently executed error-correction utility. Otherwise, the data record is assumed to be free of error.
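The comparison-and-flagging step can be sketched as follows, assuming each copy of the meta-data is a `bytes` value and that the comparison result is the list of byte offsets at which the two copies differ. `compare_meta`, `assess_record`, and the `flagged` set standing in for the flag read by a later error-correction utility are all hypothetical.

```python
def compare_meta(first: bytes, second: bytes) -> list:
    """Comparison result: offsets at which the two meta-data copies differ.

    Offsets past the end of the shorter copy count as differences, so an
    empty list means the copies are identical."""
    longest = max(len(first), len(second))
    return [
        i for i in range(longest)
        if i >= len(first) or i >= len(second) or first[i] != second[i]
    ]


def assess_record(record_id: int, first: bytes, second: bytes, flagged: set) -> bool:
    """Return True if the record is assumed error-free; otherwise flag it."""
    if compare_meta(first, second):
        flagged.add(record_id)   # leave a marker for a later error-correction pass
        return False
    return True


flagged = set()
ok = assess_record(7, b"count-7", b"count-7", flagged)
bad = assess_record(8, b"count-8", b"count-X", flagged)
```

Returning the differing offsets rather than a bare boolean is one design choice; it lets a later utility see which meta-data field disagreed.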
In one aspect of the invention, the first copy of meta-data associated with the selected data record is stored in a cache-memory element and the second copy of meta-data associated with the selected data record is stored in the mass-storage element. Under these circumstances, a third copy of the meta-data is created from the second copy. This third copy is placed in the cache-memory element, where it can quickly be compared with the first copy. A first comparison result indicative of a difference between the third copy and the first copy is then generated. Since the third copy and the first copy are both in cache memory, and since both are small, the first comparison result can be generated quickly.
On the basis of this first comparison result, a second comparison result is generated. Because the third copy was created from the second copy, a difference between the third copy and the first copy indicates a difference between the first copy stored in the cache-memory element and the second copy stored on the mass-storage element; the second comparison result is indicative of that difference.
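A minimal sketch of this aspect follows, assuming the cache-resident first copies are held in a plain dict and the mass-storage element is simulated by a hypothetical `MetaDisk` class that returns a fresh copy of the stored meta-data on each read. `scan_cached_meta` is likewise a hypothetical name. Note that no data record is ever read; each record costs a single read of its small meta-data.

```python
class MetaDisk:
    """Hypothetical mass-storage element holding the second copy of each
    record's meta-data; counts read accesses."""

    def __init__(self, meta: dict):
        self._meta = meta
        self.reads = 0

    def read_meta(self, rid: int) -> bytes:
        self.reads += 1
        return bytes(self._meta[rid])   # a fresh copy: the "third copy"


def scan_cached_meta(cache_meta: dict, disk: MetaDisk) -> dict:
    """Map each record id to True if its cached and on-disk meta-data differ.

    `cache_meta` holds the first copy of each record's meta-data, assumed to
    be resident in cache memory already."""
    results = {}
    for rid, first in cache_meta.items():
        third = disk.read_meta(rid)      # stage a copy of the on-disk (second) copy
        first_result = third != first    # compare two small, cache-resident copies
        # The third copy is a faithful copy of the second, so the first result
        # also yields the second result: does the first copy differ from the second?
        results[rid] = first_result
    return results


disk = MetaDisk({1: b"count-1", 2: b"count-2"})
cache = {1: b"count-1", 2: b"count-?"}
results = scan_cached_meta(cache, disk)
```

The sketch assumes the staging read itself is reliable; if the copy into cache could be corrupted, a mismatch would indicate an error in either the stored meta-data or the transfer.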
In another aspect of the invention, the cache-memory element includes a control section and a data section. In this case, the third copy is created by copying the second copy from the mass-storage element to the control section of the cache-memory element. This enables a scanning utility according to the invention to operate without competing with users for cache slots in the data section of the cache-memory element.
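The separation between the two sections can be sketched as below, assuming a hypothetical `CacheMemory` class whose data section is a small LRU-managed set of user slots (an assumed replacement policy) and whose control section is a fixed `bytearray` used only for staging meta-data. The first copies are again passed in as a plain dict. The usage at the end checks that scanning leaves the user slots untouched.

```python
from collections import OrderedDict


class CacheMemory:
    """Hypothetical cache-memory element split into two sections.

    The data section holds user data in LRU-managed slots; the control
    section is a fixed buffer the scanner uses for staging meta-data."""

    def __init__(self, data_slots: int, control_bytes: int):
        self.data_section = OrderedDict()
        self.data_slots = data_slots
        self.control_section = bytearray(control_bytes)

    def stage_user_data(self, rid: int, data: bytes) -> None:
        """Place user data in a slot, evicting the least recently used slot."""
        self.data_section[rid] = data
        self.data_section.move_to_end(rid)
        if len(self.data_section) > self.data_slots:
            self.data_section.popitem(last=False)

    def stage_in_control(self, meta: bytes) -> bytes:
        """Copy on-disk meta-data into the control section; return what was staged."""
        if len(meta) > len(self.control_section):
            raise ValueError("meta-data does not fit in the control section")
        self.control_section[: len(meta)] = meta
        return bytes(self.control_section[: len(meta)])


def scan_via_control_section(cache: CacheMemory, first_copies: dict, disk_meta: dict) -> list:
    """Return ids whose cached (first) and on-disk (second) meta-data differ."""
    mismatches = []
    for rid, first in first_copies.items():
        third = cache.stage_in_control(disk_meta[rid])   # never touches data slots
        if third != first:
            mismatches.append(rid)
    return mismatches


cache = CacheMemory(data_slots=2, control_bytes=16)
cache.stage_user_data(1, b"user-1")
cache.stage_user_data(2, b"user-2")
mismatches = scan_via_control_section(
    cache, {10: b"AAAA", 11: b"BBBB"}, {10: b"AAAA", 11: b"BBBX"}
)
```

Because every staging operation writes only to `control_section`, the user data in `data_section` keeps its slots and its LRU order throughout the scan.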