Computer storage disks sometimes fail to write the intended data to the rotating disk due to head seek errors, previously undetected disk media defects, mechanical vibrations, and insufficient write current to the heads. Very often the disk write error can be accepted, or it can be corrected with backups or correction algorithms. But, in some applications, especially those involving financial transaction processing, all disk write errors are serious and none can be tolerated.
Conventional disk write error detection methods and devices therefore depend on write-and-verify techniques. Each disk drive or storage controller writes the data to an intended track location, and then reads the data back from that location to check whether it matches what was supposed to have been written. The trouble is that the disk read needed to verify the write cannot proceed until the disk has made a full revolution and the affected track location has returned to the heads. In a 10K RPM disk drive, one revolution takes six milliseconds. Many other jobs for the heads have to queue up and wait for the write-and-verify cycle to complete.
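The six-millisecond figure follows directly from the spindle speed. The short Python sketch below works through that arithmetic. The function names are illustrative only, and the throughput bound assumes that each write-and-verify cycle costs at least one full revolution, ignoring seek and transfer time.

```python
def revolution_time_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def max_verified_writes_per_second(rpm: float) -> float:
    """Rough upper bound on write-and-verify cycles per second at one
    track location, assuming one full revolution per cycle (an
    assumption; seek and transfer time are ignored)."""
    return 1000.0 / revolution_time_ms(rpm)


if __name__ == "__main__":
    for rpm in (5_400, 7_200, 10_000, 15_000):
        print(rpm, round(revolution_time_ms(rpm), 2), "ms per revolution")
```

Under that assumption, a 10K RPM drive completes fewer than 170 verified writes per second at a given location, which is why other work for the heads queues up behind the verify.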
Conventional approaches to detect disk write errors usually come with large performance penalties, e.g., synchronous write verify, or require significant space, overhead, and disk layout changes, such as storing features/checksums of all the data written on disk.
Methods have been tried to prevent silent write errors and to detect phantom writes by saving a CRC or other value associated with a block in a separate storage or memory device and later matching it against the block. If the CRC or other value stored with the block does not match the CRC or other value stored separately, a silent write error, phantom write, or other error may have occurred, and corrective action may be taken. When the signatures from local memory and system storage match, the signature stored in the memory device may be identified as eligible to be overwritten.
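As a rough illustration of the separately stored CRC approach, the following Python sketch keeps block data and CRCs in two independent dictionaries. The class name, the `drop` flag used to simulate a phantom write, and the in-memory dictionaries are all assumptions made for the example and do not come from any particular prior-art system.

```python
import zlib


class SeparateCrcStore:
    """Toy model: block data and its CRC live in independent stores."""

    def __init__(self):
        self.disk = {}       # stands in for main disk storage
        self.crc_table = {}  # stands in for the separate storage or memory device

    def write(self, lba, data, drop=False):
        # The CRC is always recorded; drop=True simulates a phantom write
        # in which the drive reports success but the media is unchanged.
        self.crc_table[lba] = zlib.crc32(data)
        if not drop:
            self.disk[lba] = data

    def read(self, lba):
        data = self.disk.get(lba, b"")
        if zlib.crc32(data) != self.crc_table[lba]:
            raise IOError(f"CRC mismatch at block {lba}: possible phantom write")
        return data
```

A stale block left behind by a dropped write is caught on the next read, at the cost of maintaining and consulting the second store on every access.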
The prior art includes methods for ensuring data integrity while writing data to a storage medium by holding the data in a temporary storage medium after it is received and before it is written to main storage. The data is then written to at least one data storage device while it is also retained in the temporary memory. The same data, an ECC code generated from the data, or other chosen criteria are then read from the data storage device and compared to that stored in the memory storage medium or to the data's error checking and correction code. The data is written and compared before it is removed from the temporary memory storage medium.
Improved mirroring and dual copy techniques include first storing the data to be copied into a temporary storage location and then comparing that temporarily stored data to a copy written to the mirroring device. Such a check can compare original data against a copy of that data, or an error checking and correction code of each can be compared. In either case, if no error is returned, then the copy is validated and the temporary data is removed. If an error occurred, the data is recopied and the comparison is repeated.
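The retain, write, compare, and recopy cycle described in the two preceding paragraphs can be sketched in Python as follows. `FlakyDevice` is a hypothetical stand-in for a storage or mirroring device that corrupts its first few writes, and the `max_attempts` limit is an assumption added so the loop terminates.

```python
class FlakyDevice:
    """Hypothetical device that corrupts its first `bad_writes` writes."""

    def __init__(self, bad_writes):
        self.bad_writes = bad_writes
        self.cells = {}

    def write(self, address, data):
        if self.bad_writes > 0:
            self.bad_writes -= 1
            data = bytes(b ^ 0xFF for b in data)  # simulate a bad write
        self.cells[address] = data

    def read(self, address):
        return self.cells[address]


def write_and_compare(device, address, data, max_attempts=3):
    """Keep a temporary copy, write, read back, and recopy on mismatch.

    Returns the number of attempts that were needed."""
    temp = bytes(data)  # retained copy in temporary storage
    for attempt in range(1, max_attempts + 1):
        device.write(address, temp)
        if device.read(address) == temp:
            return attempt  # copy validated; temp may now be released
    raise IOError(f"copy at {address} not validated after {max_attempts} attempts")
```

Comparing full data, as here, is the simplest form of the check; comparing an error checking and correction code of each copy instead would follow the same control flow.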
Others have developed methods for detecting a phantom write error when executing a read command pertaining to a data block stored on a storage medium. Upon receiving a read command pertaining to the data block, two version identifiers associated with the data block are compared. The first version identifier is stored within the data block and the second version identifier is stored outside of the data block. If the version identifiers do not match, the possible occurrence of a phantom write error is detected.
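A minimal Python sketch of the two-identifier comparison is given below. The class name, the integer counter used as the version identifier, and the `drop` flag that simulates a phantom write are assumptions for illustration; the source does not specify how the identifiers are generated or where exactly the second one is kept.

```python
class VersionedStore:
    """Toy model: one version identifier inside the block, one outside."""

    def __init__(self):
        self.blocks = {}    # lba -> (version, payload), stored within the block
        self.versions = {}  # lba -> version, stored outside the block

    def write(self, lba, payload, drop=False):
        version = self.versions.get(lba, 0) + 1
        self.versions[lba] = version  # the outside identifier is always updated
        if not drop:                  # drop=True simulates a phantom write
            self.blocks[lba] = (version, payload)

    def read(self, lba):
        version, payload = self.blocks[lba]
        if version != self.versions[lba]:
            raise IOError(f"version mismatch at block {lba}: possible phantom write")
        return payload
```

Note that the check happens only when a read command arrives, so an error can remain latent until the block is next read.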
Silent errors have been detected by storing a checksum in a location that is independent of the location where the data verified by that checksum is stored.
Conventional data validation methods used in data storage systems verify version-identifier integrity metadata (IMD) and checksum IMD. The checksum is stored separately from the data in order to detect phantom write errors.
A reloadable memory provided with a portion to be written with check data has also been tried. The check data is written each time the memory is loaded with operative data. The written check data is read at a specified time, and judged to see if the read data agrees with the written check data, e.g., to detect an occurrence of abnormality in the memory.
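The check-data scheme can be illustrated with the small Python sketch below. The fixed pattern `CHECK_PATTERN`, the class name, and the layout that places the check region after the operative data are all hypothetical choices made for the example.

```python
CHECK_PATTERN = b"\xA5\x5A\xA5\x5A"  # hypothetical check data


class ReloadableMemory:
    """Toy memory with a trailing region reserved for check data."""

    def __init__(self, size):
        self.size = size
        self.cells = bytearray(size + len(CHECK_PATTERN))

    def load(self, data):
        # Check data is rewritten every time operative data is loaded.
        if len(data) > self.size:
            raise ValueError("operative data does not fit")
        self.cells[:len(data)] = data
        self.cells[self.size:] = CHECK_PATTERN

    def check(self):
        # Called at a specified time; False signals an abnormality.
        return bytes(self.cells[self.size:]) == CHECK_PATTERN
```

This detects corruption of the check region itself, which serves as an indicator of the health of the memory; it does not validate the operative data directly.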