The present invention relates to the field of data storage.
In known data storage systems for storage of digital data, for example linear tape data storage devices according to the linear tape open format (LTO), data is stored as a number of data sets, each of which consists of 16 sub data sets, each sub data set having 25,272 bytes.
Referring to FIG. 1 herein, there is illustrated schematically in perspective view, a tape data storage device suitable for installation into a host computer for performing tape data storage and back-up operations, the device comprising a casing 100 having a front panel 101 containing a port 102 configured to accept a linear tape data storage medium stored on a cassette cartridge; and, internally, a read/write head, a tape drive mechanism for transporting the tape past the read/write head, memory, firmware for encoding/decoding data to be read from and written to the tape data storage medium, and an interface for interfacing recovered data to a host computer device.
Referring to FIG. 2 herein, there is illustrated schematically a read channel of a digital data tape storage device operating in linear tape open (LTO) format. The read channel comprises a read head 200 for reading digital data from a linear tape cassette; a C1 error corrector 201 for applying C1 error correction to a data stream output from read head 200; a C2 corrector 202 for applying C2 correction to the C1 corrected data stream; a buffer memory 203 for collecting complete sub data sets upon which C2 error correction may be performed; a decompressor 204 for applying decompression to C2 error corrected sub data sets; and a data history buffer memory 205 for accumulating a rolling data window of sub data sets accessible to said decompressor 204 for applying decompression. An output of the read channel comprises an error corrected decompressed data stream comprising a series of data sets, each comprising 16 sub data sets.
Referring to FIG. 3 herein, in a known LTO (linear tape open) structure, 404,352 bytes of data form a data set, of which 403,884 bytes are user data and 468 bytes form a data set information table (DSIT), the data set being subdivided into 16 sub data sets. Each sub data set contains 25,272 bytes of data.
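The sizes quoted for the known data set structure can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch; the constant names are hypothetical and are not taken from the LTO format itself.

```python
# Sizes quoted in the text for one data set (constant names are illustrative).
SUB_DATA_SETS_PER_DATA_SET = 16
SUB_DATA_SET_BYTES = 25_272
USER_DATA_BYTES = 403_884
DSIT_BYTES = 468  # data set information table

# Total size derived from the sub data set count and size.
DATA_SET_BYTES = SUB_DATA_SETS_PER_DATA_SET * SUB_DATA_SET_BYTES

# Both decompositions should give the same 404,352 byte total.
assert DATA_SET_BYTES == 404_352
assert USER_DATA_BYTES + DSIT_BYTES == DATA_SET_BYTES
```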
Referring to FIG. 4 herein, each sub data set comprises 25,272 bytes of user data; C1 correction bytes; and C2 correction bytes.
The prior art digital tape storage device performs a read operation as follows:
when data is read back through a read channel of a tape drive device, the read channel typically checks the correctness of the 25,272 bytes of user data by first verifying the C1 error correction code as the data is read from tape into the tape drive unit and then, once the whole sub data set is stored in a buffer of the tape drive device, reading the C2 error correction code.
In the prior art device, if any of the 16 sub data sets is found to be in error, then the hardware or firmware of the tape data storage unit will indicate that the whole data set is in error. However, a 16 bit value is generated indicating which of the 16 sub data sets is found to be in error within the data set.
In the prior art system, if any one of the 16 sub data sets is found to be erroneous, then the whole data set from which that sub data set originated is deemed to be erroneous and is discarded. If the host system attempts to read anything within that data set, then a read error is generated.
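The 16 bit error indication and the all-or-nothing prior art rule can be sketched as follows. This is a minimal illustration, assuming that bit i of the value corresponds to sub data set i; the text does not specify the bit ordering, and the function names are hypothetical.

```python
from typing import List

NUM_SUB_DATA_SETS = 16


def failed_sub_data_sets(error_mask: int) -> List[int]:
    """Return indices of sub data sets flagged in a 16 bit error mask.

    Assumption (not stated in the text): bit i set means sub data set i
    could not be corrected.
    """
    if not 0 <= error_mask < (1 << NUM_SUB_DATA_SETS):
        raise ValueError("error mask must fit in 16 bits")
    return [i for i in range(NUM_SUB_DATA_SETS) if error_mask & (1 << i)]


def prior_art_data_set_ok(error_mask: int) -> bool:
    """Prior art rule: a single bad sub data set condemns the whole data set."""
    return error_mask == 0
```

Under this rule a mask with only one bit set is rejected just as a mask with all 16 bits set would be, even though the mask identifies exactly which sub data sets failed.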
Therefore in the prior art system, although each sub data set is effectively independent, the prior art format assumes dependency between all 16 sub data sets in the data set, and stipulates that if any sub data set is found to be erroneous, then the whole data set is deemed to be erroneous.
This feature is based on the assumption that once a bad block of data in a sub data set has been found, any data within the same data set after that corrupted sub data set cannot be trusted to be correct, because compression is applied to the data, and the compression algorithm used relies on a knowledge of preceding data to apply the compression. Therefore, to decompress the data, all the preceding data is required, and if some of that preceding data is erroneous then decompression cannot be effected. The assumption is that for any portion of data within the data set, in order to decompress that data, the compression algorithm generally requires all data in the same data set which has preceded that section of data.
Further, at the beginning of a data set, data from a preceding data set is required in order to decode data at the beginning of the current data set. Therefore, successive data sets are to some extent interdependent. For a series of successively read data sets, a sequence of sub data sets needs to be available which includes the access point immediately prior to the required user data, and all intervening sub data sets, for the prior art compression algorithm to decompress the data in the known prior art linear tape open data format.
However, the decompression algorithm itself has a 1 Kbyte rolling data window. Therefore it does not need to have all 16 sub data sets, and will build up a data history as it reads along the data sets.
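The idea of a bounded rolling history can be illustrated with a short sketch. This is not the LTO decompression algorithm; it only shows, under the assumption of a simple byte-oriented history, how a 1 Kbyte window retains the most recent output and forgets older data. The class name and methods are hypothetical.

```python
from collections import deque

HISTORY_BYTES = 1024  # the 1 Kbyte rolling window mentioned in the text


class HistoryWindow:
    """Keeps only the most recent HISTORY_BYTES bytes of decoded output."""

    def __init__(self, size: int = HISTORY_BYTES) -> None:
        # A bounded deque silently drops the oldest bytes once it is full.
        self._buf = deque(maxlen=size)

    def extend(self, decoded: bytes) -> None:
        """Append newly decoded bytes to the history."""
        self._buf.extend(decoded)

    def snapshot(self) -> bytes:
        """Return the current history contents, oldest byte first."""
        return bytes(self._buf)
```

Because the window is bounded, anything older than the last 1024 bytes plays no part in later decoding, which is why the whole of a data set need not be held in order to continue decompression.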
The prior art system has a disadvantage that even where required user data has already been correctly read from a data set, and an error occurs in subsequent data within the same data set, which is not required, the required correctly read user data must still be discarded, because the error occurred later on in the subsequent data within the same data set. Useful correctly read data may be discarded, because of errors in unwanted data occurring within the same data set in the prior art system.
The prior art format currently performs a series of read retry operations in order to try to recover a whole data set. However, if all these attempts fail, then the data set is deemed to be unrecoverable and a read error is reported to a host computer resulting in a failed data restore operation.
Specific implementations according to the present invention aim to provide a method and apparatus which can recover valid user data within a data set containing erroneous sub data sets.
Specific implementations of the invention may operate to check individual sub data sets marked as erroneous within a data set, to determine whether they contain user data. If not, then errors within those sub data sets can be ignored, and a user's data can be restored from the data set to a host computer entity.
According to a first aspect of the present invention there is provided a method of recovering user data from a data storage medium, wherein said user data is stored on said data storage medium in the format:
a data set comprising a plurality of sub data sets;
each said sub data set comprising a data portion capable of storing user data, a C1 error correction code, and a C2 error correction code; and
said data set comprising a valid data length data describing a number of bytes of said data set comprising said user data within said data set;
said method comprising the steps of:
reading at least one said data set;
performing C2 error correction;
determining whether a result of said C2 error correction is successful;
if said result of said C2 error correction is unsuccessful, reading said valid data length data;
calculating an amount of good data from a start of said data set;
comparing said amount of good data with said valid data length data; and
depending upon a result of said comparison, accepting user data contained in said data set as valid or rejecting user data contained in said data set as invalid.
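The steps above can be sketched as follows. This is a minimal illustration rather than a definitive implementation, and it rests on several assumptions not fixed by the text: that user data fills the sub data sets in order from the start of the data set, that a 16 bit mask with bit i set marks sub data set i as uncorrectable, and that the data set is accepted when the valid data length is less than the amount of good data. All names are hypothetical.

```python
SUB_DATA_SET_BYTES = 25_272
NUM_SUB_DATA_SETS = 16


def good_bytes_from_start(error_mask: int) -> int:
    """Bytes in the unbroken run of good sub data sets starting at index 0.

    Assumption: bit i of error_mask set means sub data set i failed
    C2 correction.
    """
    good = 0
    for i in range(NUM_SUB_DATA_SETS):
        if error_mask & (1 << i):
            break  # first bad sub data set ends the run of good data
        good += SUB_DATA_SET_BYTES
    return good


def accept_user_data(error_mask: int, valid_data_length: int) -> bool:
    """Decide whether the user data in a data set can be treated as valid."""
    if error_mask == 0:
        return True  # C2 correction succeeded for every sub data set
    # Accept only if all user data lies before the first bad sub data set.
    # A strict "less than" is used here; how the case of equality should be
    # treated is an assumption.
    return valid_data_length < good_bytes_from_start(error_mask)
```

For example, if only sub data set 10 fails, the first ten sub data sets supply 252,720 good bytes, so a data set whose valid data length is 100,000 bytes would be accepted, while one whose valid data length is 300,000 bytes would be rejected.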
According to a second aspect of the present invention there is provided a data storage device capable of recovering user data from a data storage medium, wherein said user data is stored on said data storage medium in the format:
at least one data set comprising a plurality of sub data sets;
each said sub data set comprising a data portion capable of storing user data, and an error correction code; and
said data set comprising a valid data length data describing a number of bytes of said data set comprising said user data within said data set;
said data storage device comprising:
a read head and circuitry for reading said data sets;
an error corrector for performing error corrections;
a read controller for determining whether a result of said error correction is successful;
if said result of said error correction is unsuccessful, reading said valid data length data;
calculating an amount of good data from a start of said data set; comparing said amount of good data with said valid data length data; and
depending upon the result of said comparison, generating a signal for accepting user data contained in said data set as valid or rejecting user data contained in said data set as invalid.
According to a third aspect of the present invention there is provided a method of recovering user data from a series of a plurality of data sets read from a data storage medium, each said data set comprising a plurality of sub data sets, each sub data set comprising a data field for storage of data, and an error correction code for error correcting a whole of said sub data set; said method comprising the steps of:
reading a first said data set;
determining whether any sub data sets within said first data set have errors which are uncorrectable by said error correction code;
if a sub data set having uncorrected errors is found, then determining an amount of correctly recovered data from data fields of other sub data sets, within a same data set as said erroneous sub data set in which said error occurred;
reading a valid data length data from a data field within said data set;
comparing said amount of correctly recovered data with said valid data length; and
if said valid data length is less than an amount of correctly recovered data, then treating said data set as valid.
The invention includes a computer program comprising program instructions for:
reading a first data set;
determining whether any sub data sets within said first data set have errors which are uncorrectable by an error correction code;
if a sub data set having uncorrectable errors is found, then determining an amount of correctly recovered data from other sub data sets within said first data set;
reading a valid data length from a data field within said first data set;
comparing an amount of data correctly recovered from said first data set with said valid data length; and
if said valid data length is less than an amount of correctly recovered data, then treating said data set as valid.
The invention includes a computer program comprising program instructions which, when loaded into a re-configurable tape data storage device control said tape data storage device to:
read a first data set;
determine whether any sub data sets within said first data set have errors which are uncorrectable by an error correction code;
if a sub data set having uncorrected errors is found, then to determine an amount of correctly recovered data from data fields of other sub data sets of said first data set;
read a valid data length data from a data field within said data set;
compare said amount of correctly recovered data with said valid data length; and
if said valid data length is less than an amount of correctly recovered data, then treat said data set as valid.