The present invention relates to data storage, and more particularly, to reading data from tape using a reconstructive error recovery procedure (ERP) to reduce backhitches while reading data.
Tape and optical storage devices use very powerful error correction codes, such as product codes or concatenated codes, in conjunction with interleaving to provide a very high degree of data integrity. These error correction schemes typically use two error correction codes (ECCs) as component codes. Two important burst-error performance measures for tape storage systems protected by these schemes are: 1) lateral width of an erroneous stripe which is still capable of being corrected (this is also known as “broken track correction” capability), and 2) longitudinal width of an erroneous stripe that is still capable of being corrected. A “broken” track generally refers to a track that cannot be read correctly due to a problem on the media itself and/or a problem with the readback channel, e.g., a channel that does not detect data correctly because of misalignment or some other systematic problem with the head.
When a tape drive reads data from a tape, or when a tape drive writes data to a tape, a unit of data that is read or written is referred to as a “data set.” The data set is encoded using interleaved sets of codewords that are organized into an ECC-encoded matrix of size M bytes×N bytes (M×N) and then written to tape as shown in FIG. 1, according to the prior art. There are two levels of encoding within this matrix 150. The first level of encoding utilizes the matrix rows 102. Each row 102 of the matrix contains C1-ECC row parity 106, which adds p bytes of C1-ECC to the n bytes of user data in that row (i.e., N=n+p bytes). The second level of encoding, C2-ECC column parity 108, adds q bytes of C2-ECC to each matrix column 104. For example, if q=12, then adding 12 bytes of C2-ECC to each column would add 12 rows to the matrix 150 (i.e., M=m+q bytes, where m is the number of rows before C2 parity is added).
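The dimension arithmetic above can be sketched as follows. This is a minimal illustration only: the class name `ProductCodeLayout`, its methods, and the numeric values suggested in the usage note are hypothetical and are not taken from any tape format. It computes N=n+p and M=m+q and the resulting fraction of the matrix that holds user data; it does not perform any actual C1 or C2 encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductCodeLayout:
    """Dimensions of an ECC-encoded matrix (hypothetical helper, not a real format)."""
    n: int  # user-data bytes per row
    p: int  # C1-ECC parity bytes appended to each row
    m: int  # rows before C2 parity is added
    q: int  # C2-ECC parity bytes appended to each column

    @property
    def row_bytes(self) -> int:
        # N = n + p: width of the encoded matrix
        return self.n + self.p

    @property
    def column_bytes(self) -> int:
        # M = m + q: height of the encoded matrix
        return self.m + self.q

    def user_bytes(self) -> int:
        # Only the m x n corner of the matrix carries user data
        return self.m * self.n

    def total_bytes(self) -> int:
        # Full M x N matrix, including both parity regions
        return self.column_bytes * self.row_bytes

    def code_rate(self) -> float:
        # Fraction of written bytes that are user data
        return self.user_bytes() / self.total_bytes()
```

For instance, with the assumed values n=230, p=10, m=84, and q=12, `ProductCodeLayout(230, 10, 84, 12)` describes a 96×240 matrix, of which the 84×230 region holds user data.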
When the data set is read from the tape in a high error rate condition, C1/C2 ECC may not be capable of correcting the read data. For example, in some approaches, C1-encoding is capable of correcting 10 bytes of error, and C2-encoding is capable of correcting 20 bytes of error. If the number of error bytes exceeds this correction power, then the data cannot be read from the tape. In this scenario, the tape drive will perform ERP, attempting to read the data set from the tape again with a different hardware setting (e.g., changing the tape speed). ERP repeats until C1/C2-encoding is able to correct the data or until the ERP retry count exceeds a threshold. If the retry count exceeds the threshold, then the tape drive will report a permanent error for the read operation.
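The conventional retry behavior described above can be sketched as a simple loop. This is an illustrative model under stated assumptions, not drive firmware: `read_data_set`, `try_read`, `erp_settings`, and `PermanentReadError` are hypothetical names, and `try_read` is assumed to return the corrected data when C1/C2 decoding succeeds and `None` when the errors exceed the correction power.

```python
from typing import Callable, Optional, Sequence, Tuple


class PermanentReadError(Exception):
    """Raised when the ERP retry count would exceed the threshold (hypothetical)."""


def read_data_set(
    try_read: Callable[[Optional[str]], Optional[bytes]],
    erp_settings: Sequence[str],
    retry_threshold: int,
) -> Tuple[bytes, int]:
    """Read one data set, retrying with different settings until it decodes.

    try_read(setting) is an assumed callback: it returns the corrected bytes
    if C1/C2 decoding succeeds, or None if the data is uncorrectable.
    Returns the data and the number of ERP retries that were needed.
    """
    if not erp_settings:
        raise ValueError("at least one ERP setting is required")

    data = try_read(None)  # initial read with nominal settings
    retries = 0
    while data is None:
        if retries >= retry_threshold:
            # One more retry would exceed the threshold: report a permanent error
            raise PermanentReadError(f"unreadable after {retries} ERP retries")
        # Each retry rereads the same data set (implying a backhitch on tape)
        # using the next hardware setting, cycling through the list
        setting = erp_settings[retries % len(erp_settings)]
        retries += 1
        data = try_read(setting)
    return data, retries
```

In this model every iteration of the loop corresponds to one reread of the same region of tape, so the returned retry count is also a rough count of the backhitches incurred for that data set.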
There are several problems with this conventional approach. First, if the error rate is consistently high, C1 and C2 cannot correct the data and the tape drive fails to read the data set, which is extremely undesirable. Second, in areas of tape where the error rate is high due to media damage, marginal writing, data written in older formats, etc., the drive may fail to read. Third, in these cases, many data sets in proximity often require ERP. If a data set error is recoverable only with many retries (many iterations of ERP), the next data set will presumably require a similar number of retries before its data is read successfully. All of this recovery causes the tape drive to take a long time to read data due to the multiple backhitches necessary to reread the data from the tape, which degrades host performance and can also further damage the media.
Accordingly, it would be beneficial to have a data recovery procedure that increases the efficiency of reading stored data from the tape.