The present invention relates to a tape storage device and in particular, but not exclusively, to a tape storage device intended for storing host computer data. The present invention is applicable to a method of appending data to compressed data already stored on tape.
It is known to provide a tape drive having data compression capability (a DC drive) so that, as data arrives from a host, it is compressed before being written to tape thus increasing the tape storage capacity. DC drives are also able to read compressed data from tape and to decompress the data before sending it to a host. It is also possible for a host to perform software compression and/or decompression of user data.
There is more than one type of data compression. For example, removing separation marks, e.g. those designating records, files etc., from the datastream and storing information regarding the positions of these marks in an index effectively compresses the user data. Another, quite different, approach is to compress user data by removing redundancy in the data, e.g. by replacing user data words with codewords or symbols from which the original data can be recovered. It is the latter type which is being referred to in this specification when the words "data compression" or the abbreviation DC are used.
It is known to write user data records to tape in fixed size groups independently of the record structure of the data words and to put information regarding the contents of each group in an index associated with each group. For example, the index may contain information regarding the length of each record or partial record in the group. A further development, which is the subject of copending PCT Application no. WO 91/11 001, is to write the data to tape in groups independently of the record structure of the data words wherein each group has an associated index and to write to the index of each relevant group information about the contents of the group in terms of entities, where an entity can contain more than one record and no separate entry is written to the index about any individual record in that entity.
This approach has the advantage that it reduces the storage management overhead associated with the group index by reducing the number of entries which need to be made in the index, thus increasing the speed of access to data during read and write operations.
Tape storage devices of the aforesaid types include storage devices operative to write/read data to/from tape in accordance with the Digital Data Storage (DDS) format described in the document "Digital Data Storage Format Description" (Revision B, October 1988) available from Hewlett-Packard Limited, Bristol, England. The DDS format is based on the DAT digital audio recording format but includes modifications and extensions to the basic DAT format to render it suitable for storing computer data. An extension to the DDS format is the DDS-DC format described in the document entitled "DDS-DC Format Specification No. DDS-06 Revision A dated January 1990", also available from Hewlett-Packard Limited, Bristol, England. The DDS-DC format is for storage of compressed data on tape.
According to the DDS-DC format, compressed records are organised into entities and groups. A group index contains information on the compressed byte count for each entity in the group. Each entity has a header portion which contains information including the number of records in the entity, the uncompressed byte count for these records and the identity of the compression algorithm used to compress the data in the entity.
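By way of illustration only, the relationship between groups, entities and the group index described above may be modelled as follows. This is a minimal sketch: the class and field names are hypothetical, and the on-tape byte layout defined by the DDS-DC format is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntityHeader:
    # Field names are illustrative assumptions, not taken from the DDS-DC document.
    record_count: int             # number of records in the entity
    uncompressed_byte_count: int  # uncompressed byte count for those records
    algorithm_id: int             # identity of the compression algorithm used


@dataclass
class Entity:
    header: EntityHeader
    compressed_data: bytes        # compressed records belonging to this entity


@dataclass
class Group:
    entities: List[Entity] = field(default_factory=list)

    def index(self) -> List[int]:
        """Return one index entry per entity: its compressed byte count."""
        return [len(e.compressed_data) for e in self.entities]


group = Group([
    Entity(EntityHeader(10, 5120, 1), b"x" * 300),
    Entity(EntityHeader(2, 1024, 1), b"y" * 80),
])
print(group.index())
```

In this model the index holds one entry per entity rather than one per record, so the twelve records in the example occupy only two index entries.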
Currently, when appending compressed records to a point within an entity on tape, say after the nth record in an entity, the procedure is to decompress the first n records in the entity to find the end of the nth record and then to recompress these n records so as to build a dictionary (as described below) ready for compressing the new records being appended. This procedure is laborious and, in addition, there may be insufficient buffer space in the drive to store the n decompressed records.
According to one aspect of the present invention we provide a method of appending data to compressed data stored on tape in the form of records wherein compressed data is stored in groups independently of the record structure of the data and each group has an associated data structure containing information relating to the group contents in terms of entities, where an entity can contain more than one record, and means for storing information on the number of records in each entity characterised by locating the entity containing the last record to be retained and changing said stored information to indicate the number of wanted records in that entity and writing the data being appended to a subsequent new entity.
The present invention has the advantage that it obviates the need to decompress records in the entity containing the last record to be retained in order to locate the end of that record. For example, if the relevant entity contains ten records and it is desired to append new data after the sixth of these records, the invention entails changing the "number of records" entry in the entity header from ten to six and writing the new data to tape beginning with a new entity. The last four records in the existing entity will thereafter be ignored, but remain on tape. Note that whenever data is appended to data stored on tape, all of the existing data after the point on tape at which data is appended is lost. The data being appended may be compressed data or uncompressed data.
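The ten-record example above may be sketched as follows. The sketch is illustrative only: the `Entity` class and the `append_after` function are hypothetical names, and the tape is modelled as a simple in-memory list of entities.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    record_count: int  # "number of records" entry in the entity header
    payload: bytes     # compressed records; never decompressed in this sketch


def append_after(entities: List[Entity], entity_pos: int, keep: int,
                 new_entity: Entity) -> List[Entity]:
    """Retain the first `keep` records of entities[entity_pos], then append.

    Only the header count of the located entity is changed. Its payload is
    left untouched, so the unwanted records stay in place but are ignored.
    Entities after the append point are lost, as with any append to tape.
    """
    target = entities[entity_pos]
    if not 0 < keep <= target.record_count:
        raise ValueError("keep must lie between 1 and the current record count")
    target.record_count = keep
    return entities[:entity_pos + 1] + [new_entity]


tape = [Entity(3, b"a"), Entity(10, b"bbbb"), Entity(5, b"c")]
tape = append_after(tape, entity_pos=1, keep=6, new_entity=Entity(2, b"new"))
print([e.record_count for e in tape])
```

No record is decompressed at any stage; the payload of the amended entity is byte-for-byte what it was before the append.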
Several different algorithms are known for compressing data. One approach is to convert the user data to codewords using a dictionary which is created dynamically as the data is compressed. The dictionary is recreated, again dynamically, during decompression. An algorithm which adopts this approach is the Lempel-Ziv-Welch (LZW) algorithm.
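The dynamic dictionary approach may be illustrated by the following textbook form of the LZW algorithm. This is a sketch under simplifying assumptions and is not the variant implemented in any particular drive: codewords are plain integers rather than packed bit fields, the dictionary is unbounded, and no RESET or FLUSH codewords are emitted.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Convert bytes to codewords, building the dictionary as data arrives."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    out: List[int] = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate
        else:
            out.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new string
            next_code += 1
            current = bytes([byte])
    if current:
        out.append(dictionary[current])
    return out


def lzw_decompress(codes: List[int]) -> bytes:
    """Recreate the same dictionary dynamically while decoding."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Codeword refers to the string that is about to be defined.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid codeword")
        out += entry
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(out)


sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
assert lzw_decompress(codes) == sample
```

Because the decompressor rebuilds the dictionary from the codewords themselves, the dictionary never needs to be stored on tape, but decoding must start from the point at which the dictionary was started.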
During data compression, a DC drive operating according to the LZW algorithm inserts codewords into the datastream indicative of when a new dictionary is started (the RESET codeword) and when data is flushed i.e. the small amount of data held in a buffer awaiting compression is passed through before further incoming data is sent to the buffer (the FLUSH codeword).
Using the LZW algorithm, to achieve decompression of part of the compressed data on a tape, it is necessary to begin decompressing from a RESET codeword in order to be able to recreate the relevant dictionary. Normally, a FLUSH operation is performed prior to beginning a new dictionary so that the new dictionary can start at a convenient point in the data, e.g. at the beginning of a record.
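The requirement to begin decompressing from a RESET codeword may be illustrated by the following sketch, which finds the point in a codeword stream from which decoding has to start in order to reach a given position. The `RESET` sentinel value and the function name are hypothetical; the actual codeword values used on tape are not reproduced here.

```python
from typing import List

RESET = -1  # hypothetical sentinel standing in for the RESET codeword


def decode_start(stream: List[int], target_index: int) -> int:
    """Return the index of the nearest RESET at or before target_index.

    Decompression must begin there, because the dictionary needed to
    interpret the codeword at target_index was started at that RESET.
    """
    for i in range(target_index, -1, -1):
        if stream[i] == RESET:
            return i
    raise ValueError("no RESET codeword precedes the target position")


stream = [RESET, 84, 79, 256, RESET, 66, 69, 256]
print(decode_start(stream, 6))
```

In the example, the codeword at position 6 can only be interpreted by decoding from the RESET at position 4, since the codeword 256 has a different meaning in each of the two dictionaries.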
If the data being appended is compressed according to such an algorithm, it is necessary to start a new dictionary for the data being appended.
In the embodiment to be described, the method comprises storing information on the number of records in each entity in an entity header portion. In that embodiment, information on the cumulative total of the number of records on tape is also stored for each group and the method further comprises changing said stored cumulative information to reflect the number of wanted records in the relevant group.
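The adjustment of the cumulative record count may be sketched as follows. The function name and the figures used are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import List


def cumulative_record_count(previous_total: int,
                            entity_record_counts: List[int]) -> int:
    """Cumulative number of records on tape up to the end of a group.

    previous_total is the cumulative count stored for the preceding group;
    entity_record_counts holds the "number of records" entry of each
    entity retained in the group concerned.
    """
    return previous_total + sum(entity_record_counts)


# Before the append: the group holds entities of 3, 10 and 5 records.
before = cumulative_record_count(120, [3, 10, 5])
# After appending behind the sixth record of the second entity, only
# 3 + 6 wanted records remain in the group until new entities are written.
after = cumulative_record_count(120, [3, 6])
print(before, after)
```

The stored cumulative figure for the group is thus reduced by the number of records that are no longer wanted, and then increases again as records in the newly appended entity are counted.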
According to another aspect of the present invention we provide a method of appending data to data stored on tape wherein the data on tape is stored in collections of records and information on the contents of each collection is also stored on tape characterised by locating the collection containing the last record to be retained and changing said stored information to indicate the number of wanted records in that collection and writing the data being appended to a subsequent new collection.
Thus the present invention has application whenever information regarding collections of records is stored on tape and, instead of locating the end of the last record to be retained, it is easier to amend stored information so that the unwanted records in the relevant collection are subsequently ignored.
The data already on tape to which new data is being appended need not necessarily be compressed information, although it is for appending to compressed data that the present invention is envisaged to be most useful.
According to a further aspect of the present invention we provide a storage device which operates in accordance with any of the methods as described above.