It is well known for data storage apparatus, such as a tape drive, to compress incoming data prior to storing the data to a backup medium, such as tape. An example of such a tape drive is one conforming to the DDS (Digital Data Storage) format, defined in ISO/IEC Standard 10777:1991 E. As described in detail in EP 0 464 190, a DDS tape drive encodes received data records by compressing them using a codeword-based algorithm, and stores the resulting compressed data codewords in fixed-length groups.
One characteristic of a codeword based compression algorithm is that data can only be decompressed if decompression begins from exactly the same position in an encoded data stream as where the compression began. The position in the data stream where both compression and decompression must begin is sometimes called an access point. Codeword algorithms of this type include the well-known LZ (Lempel-Ziv) algorithms, which will not be described in detail herein. Although not considered herein, an encoded data stream might comprise encrypted data rather than compressed data. However, the same requirements for an access point apply.
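The access point requirement can be illustrated with a minimal sketch of a textbook LZW coder, one member of the LZ family. It is offered only as an illustration and is not the algorithm used by any particular drive; the function names and the integer codeword representation are assumptions made for this example. Decoding succeeds from the start of the stream, where the dictionary state matches that of the encoder, and fails when started part-way through.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Textbook LZW: emit one integer codeword per longest dictionary match."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    out: List[int] = []
    w = b""
    for b in data:
        wc = w + bytes([b])
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            dictionary[wc] = next_code  # dictionary grows as data is seen
            next_code += 1
            w = bytes([b])
    if w:
        out.append(dictionary[w])
    return out


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the dictionary while decoding, starting from an empty state."""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    if not codes:
        return b""
    if codes[0] not in dictionary:
        raise ValueError("first codeword depends on earlier stream contents")
    w = dictionary[codes[0]]
    out = bytearray(w)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = w + w[:1]  # codeword defined by the step being decoded
        else:
            raise ValueError("codeword refers to dictionary state not yet built")
        out += entry
        dictionary[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return bytes(out)


data = b"abababababab"
codes = lzw_compress(data)
restored = lzw_decompress(codes)  # decoding from the access point succeeds
```

In this sketch, calling lzw_decompress(codes[2:]) raises ValueError, because the first codeword it meets was defined by parts of the stream that the decoder never saw.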
Prior to compressing the data, a DDS drive strips out all host data structure information, such as file mark and set mark information, received from the host computer system. The effect of this is that the compressed data stream in a group only contains host data.
Typically, hosts can issue commands to a tape drive to `space` to, or `locate`, different positions in the encoded data stream. Such an operation is conveniently supported in a DDS drive by the provision of index information for each group, in the form of a block access table (BAT). The BAT includes a series of entries, each one corresponding to a respective record, file mark or set mark in the group. Each entry corresponding to a record includes the length in bytes of the respective compressed record data in the group. In effect, the BAT contains a logical map of the data in the group, from which the byte position of any record boundary, file mark or set mark within the group can be derived. Therefore, a DDS tape drive can move to any valid logical target position within the encoded data stream simply by using the BAT to calculate the byte position of the target and decoding from the access point to that byte position.
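The byte position calculation can be sketched as follows. This is a simplified model only: the BatEntry type, its field names and the function byte_offset_of_entry are hypothetical and do not reproduce the actual BAT layout defined by the DDS format. The sketch assumes that only record entries contribute bytes to the compressed stream, since file marks and set marks are stripped out before compression.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatEntry:
    # Hypothetical, simplified entry: one per record, file mark or set mark.
    kind: str                   # "record", "file_mark" or "set_mark"
    compressed_length: int = 0  # bytes of compressed data (records only)


def byte_offset_of_entry(bat: List[BatEntry], index: int) -> int:
    """Byte offset, within the group's compressed data, where entry `index` begins.

    Passing index == len(bat) gives the offset just past the last entry.
    """
    if not 0 <= index <= len(bat):
        raise IndexError("entry index outside the group")
    # Marks occupy no bytes in the compressed stream, so only records are summed.
    return sum(e.compressed_length for e in bat[:index] if e.kind == "record")


bat = [
    BatEntry("record", 120),
    BatEntry("record", 80),
    BatEntry("file_mark"),
    BatEntry("record", 200),
]
target = byte_offset_of_entry(bat, 3)  # start of the record after the file mark
```

With such a table, the drive knows in advance how far to decode from the access point, without inspecting the compressed data itself.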
In the co-pending, commonly assigned U.S. patent application Ser. No. 09/182,308 (filed on Oct. 30, 1998), entitled Data Encoding Method and Apparatus, the contents of which are hereby incorporated herein by reference, there is proposed a novel data encoding scheme suitable for tape drives. In the scheme, the requirement to have a BAT, or equivalent, is removed by embedding special codewords representative of host data structure information into the encoded data stream itself. In particular, special codewords are reserved to represent file marks and ends of records in an encoded data stream. The removal of the need for a BAT enables the encoding pipeline of a tape drive implementing the new scheme to operate at far higher data rates, since there is no requirement to generate and update a BAT.
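The idea of embedding structure information in the stream can be sketched as below. The reserved codeword values END_OF_RECORD and FILE_MARK, the use of plain integers for codewords and the function split_stream are all assumptions made for illustration; the referenced application defines the actual encoding. The sketch shows only that record boundaries and file marks are recovered by reading the stream itself rather than by consulting a separate table.

```python
from typing import List, Optional, Tuple

# Hypothetical reserved codeword values, chosen only for this illustration.
END_OF_RECORD = 0x1001
FILE_MARK = 0x1002

Item = Tuple[str, Optional[List[int]]]


def split_stream(codewords: List[int]) -> List[Item]:
    """Walk the stream forwards and recover records and file marks in order."""
    items: List[Item] = []
    current: List[int] = []
    for cw in codewords:
        if cw == END_OF_RECORD:
            items.append(("record", current))  # boundary found in-stream
            current = []
        elif cw == FILE_MARK:
            items.append(("file_mark", None))
        else:
            current.append(cw)  # ordinary data codeword
    # Trailing data codewords with no END_OF_RECORD are treated as an
    # incomplete record and are not reported.
    return items


stream = [1, 2, END_OF_RECORD, FILE_MARK, 3, END_OF_RECORD]
items = split_stream(stream)
```

Because the boundaries are discovered only as the stream is read, nothing in this representation gives the byte position of a given record ahead of time.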
The novel data encoding scheme, in common with DDS drives, implements a codeword-based compression algorithm, which allows data to be read only in the forward direction starting from an access point.
The lack of a BAT in the applicant's new scheme, however, presents a problem as far as moving to target positions is concerned. The lack of a BAT means it is not possible to predetermine the byte position of a target in a stream of encoded data. This is particularly problematic when a target position is specified as being upstream (or backwards) in the encoded data stream.