When data is written by a computer to a magnetic tape, it is common practice to separate portions of that data with erased sections of tape of an industry-standard length, such as three tenths of an inch, regardless of the length of tape in between that is required for the data. The empty sections are called gaps; more specifically, inter-record or interblock gaps, depending on the type of data separated by those gaps. Typically, a user's application program reads and writes logical records that are of a size determined by the application program. Because it is wasteful of both time and tape to read and write individual application records separated by gaps (i.e., by inter-record gaps), it is customary for the operating system to collect consecutive application records into a buffer, from which they are written by the host computer as entire blocks. These blocks of records are then separated by gaps, which are now called interblock gaps. (Depending on the nature of the storage medium, what corresponds to an interblock gap for half-inch nine-track magnetic tape might be some other "record/block separator" such as a special pattern actually encoded on the medium.) The accumulation and writing process is called blocking; the corresponding reverse process of reading blocks from the tape, putting them into a buffer, and then retrieving the individual records therefrom is called deblocking.
The efficiency of tape utilization is determined by the ratio between the amount of tape (between the gaps) used for data and the amount used for the gaps themselves. Since the amount of tape required for each gap is fixed by standardization, high efficiency of utilization requires that the amount of tape with data thereon between the gaps be large in comparison to the size of the gaps. However, it is common for the operating system of the host computer to put limits on the size of the buffer that may be used to perform blocking and deblocking. Furthermore, to change the size of the buffers may require both (1) changes in the application software and (2) changes to the operational configuration of, or in the actual code for, the operating system itself. Those who have attempted such alterations know that they can be an extraordinary aggravation.
Two recent trends in magnetic tape usage further aggravate the situation. The first of these is an increase in the bit density of the tape transport mechanism. A density of 6250 characters per inch (CPI) with a gap size of 3/10 of an inch for nine-track group-coded recording (GCR) tape drives is now an industry standard. Compared to an earlier standard of 1600 CPI and a gap size of 6/10 of an inch for phase encoding (PE) drives, or to an even earlier one of 800 CPI and 6/10 of an inch for nonreturn-to-zero (NRZI) drives, GCR drives require considerably less tape to write a given record or block of records. This means that, despite the increased density and an actual increase in the effective capacity of a reel of tape, the increased capacity comes at a price of decreased efficiency unless the degree of blocking can also be increased by a corresponding amount. As noted above, such changes to the environment in which the application program runs can be more trouble than living with the decreased efficiency of tape utilization.
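The loss of efficiency described above can be illustrated with a short calculation. The following is a minimal sketch, not part of the original disclosure: the function name tape_efficiency and the 4096-character block size are assumptions chosen for illustration, while the densities and gap sizes are the ones quoted in the text.

```python
def tape_efficiency(block_chars, density_cpi, gap_inches):
    """Fraction of tape length occupied by data rather than by gaps."""
    data_inches = block_chars / density_cpi
    return data_inches / (data_inches + gap_inches)

# (format, density in characters per inch, gap in inches), as given in the text
FORMATS = [("NRZI", 800, 0.6), ("PE", 1600, 0.6), ("GCR", 6250, 0.3)]

BLOCK_CHARS = 4096  # hypothetical block size, held constant across formats

for name, cpi, gap in FORMATS:
    print(f"{name}: {tape_efficiency(BLOCK_CHARS, cpi, gap):.1%}")
```

For the assumed block size, the figures come out at about 89.5% for NRZI, 81.0% for PE, and 68.6% for GCR: the reel holds more data at the higher density, but a larger share of it is gap unless the block size grows as well.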
The second trend is data compression. This has a similar effect by making the data itself require less tape. Since the degree of compression can often be quite dramatic, the unintended result of diminishing returns may arise as an increasing percentage of the tape becomes interblock gaps between smaller blocks of compressed data.
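The effect of compression on the gap overhead can be sketched in the same way. This is an illustrative calculation only: the 4096-character block size and the compression ratios are assumptions, and gap_fraction is a hypothetical helper; only the 6250 CPI density and the 3/10 inch gap come from the text.

```python
GCR_CPI = 6250      # characters per inch, from the text
GAP_INCHES = 0.3    # interblock gap for GCR, from the text

def gap_fraction(block_chars, compression_ratio):
    """Fraction of tape length taken up by gaps for a compressed block."""
    data_inches = (block_chars / compression_ratio) / GCR_CPI
    return GAP_INCHES / (data_inches + GAP_INCHES)

# Hypothetical 4096-character block at assumed compression ratios of 1:1, 2:1 and 4:1
for ratio in (1, 2, 4):
    print(f"{ratio}:1 compression -> {gap_fraction(4096, ratio):.1%} of tape is gap")
```

Under these assumptions, at 4:1 compression more than half of the tape is consumed by gaps, which is the diminishing return the paragraph describes.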
The full benefit in increased tape utilization for greater recording densities could be achieved if there were a mechanism for accumulating enough data before it is actually written to the tape. More than simple buffering of blocks into larger blocks is required, however, if such a process is to be invisible to existing operating systems and still write to tape using industry standard tape drives in a way that is in conformance with published standards.
Those objects are achieved by a tape packet assembler/disassembler (TPAD) located in the command and data path between the host computing environment and the tape drive. Host-transmitted records (whether blocked, compressed, both, or neither) are accumulated in a large buffer in the TPAD. Any characters in the data stream that serve as embedded delimiters of the structural features that have been blocked (e.g., a character whose meaning is end-of-record (EOR)) are left undisturbed, and are treated as ordinary characters. Tape commands, such as Write File Mark, are intercepted and replaced by embedded characters or by other information. Such "other information" can be of various types, and takes the form of tables of linkage information, plus information about the size of those tables. The usual interblock gap that would ordinarily occur on tape between host-transmitted records does not occur, as those records are accumulated into a new unit of tape motion: packets. Interblock gaps will now occur between packets.
Accumulation into the buffer begins at one end thereof and proceeds toward the other end, say, in the direction of increasing addresses. After a selected fractional amount of the buffer has been filled, the linkage tables and their size information are appended to their associated data portion in the buffer. The linkage tables and size information are together called a trailer. The data portion with its trailer is called a packet. Each packet is large enough to cause the writing of about a foot of tape for a 6250 CPI streaming tape drive. The buffer is large enough to hold, say, eight to twelve packets, depending upon configuration. Packets are written consecutively to the buffer until it is full.
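The relationship between a data portion and its trailer can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual trailer format: the text does not specify the layout here, so the choice of a table of record end offsets, the 4-byte big-endian fields, the placement of the size information as the final word, and the name build_packet are all hypothetical.

```python
import struct

def build_packet(records):
    """Concatenate records and append a trailer (hypothetical layout).

    Assumed layout: data portion, then a linkage table holding the end
    offset of each record, then one word giving the number of table entries.
    """
    data = b"".join(records)
    offsets = []
    end = 0
    for rec in records:
        end += len(rec)
        offsets.append(end)          # where this record ends in the data portion
    table = struct.pack(f">{len(offsets)}I", *offsets)
    size_info = struct.pack(">I", len(offsets))
    return data + table + size_info

packet = build_packet([b"first record", b"second"])
print(len(packet))  # 18 data bytes + 8 table bytes + 4 size bytes
```

Placing the size information last is a design choice made for this sketch: a reader that has the whole packet in its buffer can locate the table by working backward from the end. For scale, a foot of tape at 6250 CPI corresponds to roughly 12 x 6250 = 75,000 characters per packet.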
The TPAD can, if needed, split an incoming host-transmitted record into segments stored in consecutive packets. This can happen either because the incoming record is larger than the packets, or because the amount of space remaining in the buffer is insufficient to contain the entire record. It must be remembered that the activities of the TPAD are to be transparent to the host computing environment, and that the TPAD has no way of knowing in advance the size of the next host-transmitted record. The TPAD cannot say to the host, "Hey, I ran out of room in my buffer. Take this record back, and send it to me again when I have more room." Instead, it has to hold the host off to prevent overrunning the buffer, and simply split the incoming record between two consecutive packets.
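The splitting behavior can be sketched as follows. This is a simplified illustration, not the TPAD's actual logic: it assumes a fixed data capacity per packet, ignores trailer space and host hold-off, and uses the hypothetical name pack_records. Each stored segment carries a flag saying whether it completes its record.

```python
def pack_records(records, capacity):
    """Distribute records over packets of fixed data capacity.

    Returns a list of packets; each packet is a list of
    (segment_bytes, completes_record) tuples. A record that does not fit
    in the space remaining is split and continued in the next packet.
    """
    packets = [[]]
    room = capacity
    for rec in records:
        pos = 0
        while True:
            if room == 0:                # current packet is full: start another
                packets.append([])
                room = capacity
            piece = rec[pos:pos + room]  # as much of the record as fits
            pos += len(piece)
            room -= len(piece)
            done = pos >= len(rec)
            packets[-1].append((piece, done))
            if done:
                break
    return packets

for i, pkt in enumerate(pack_records([b"aaaa", b"bbbbbb"], capacity=5)):
    print(i, pkt)
```

In this usage example the second record does not fit in the one byte left in the first packet, so its first byte goes there as an incomplete segment and the remaining five bytes open the next packet.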
When the buffer is full, all the packets therein are written out to tape, with the freed portions of the buffer becoming available for the assembly and storage of new packets, even as the existing ones are still being written. If the host can keep up, then the TPAD may eventually have to hold the host off; otherwise, the host will fall behind, allowing the buffer to become either empty or only partially filled. After each packet is written to the tape, the tape drive will automatically write the usual interblock gap. If, after a gap is written, the buffer contains less than a packet, no further write operations occur, and tape motion ceases; otherwise, the writing of packets continues. Certain commands issued to the tape drive by the host affect the TPAD's packet assembly activities. A rewind command, for example, flushes to the tape any remaining contents of the buffer as a packet, so that the rewind can occur.
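The rule for continuing or ceasing tape motion can be sketched as a simple loop. This is an illustrative model only: it tracks byte counts rather than real buffer contents, ignores trailer overhead and data arriving from the host during the write, and the name drain and its flush parameter are hypothetical.

```python
def drain(buffered_bytes, packet_size, flush=False):
    """Model which packets are written before tape motion ceases.

    Writing continues while at least one full packet is buffered. With
    flush=True (as for a rewind command), any remainder is written out
    as a final, shorter packet.
    Returns (list of packet sizes written, bytes left in the buffer).
    """
    written = []
    while buffered_bytes >= packet_size:
        written.append(packet_size)      # drive writes the packet, then a gap
        buffered_bytes -= packet_size
    if flush and buffered_bytes > 0:
        written.append(buffered_bytes)   # remaining contents go out as a packet
        buffered_bytes = 0
    return written, buffered_bytes

print(drain(250, 100))              # ([100, 100], 50)
print(drain(250, 100, flush=True))  # ([100, 100, 50], 0)
```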
The writing of an entire packet of data between interblock gaps means that the efficiency of tape utilization is high. Writing as many consecutive packets as possible with uninterrupted tape motion assists in obtaining efficient use of streaming tape drives, although it will be appreciated that the practice of the invention is in no way limited to use with streaming drives. It may be used with equal satisfaction in conjunction with start-stop tape drives, and likewise is not limited to use with any particular recording technique, format or density. In particular, it will be appreciated that, even though the description that follows is offered in terms of a streaming 6250 CPI GCR drive with a nonmoving nine-track head, the invention may be used in conjunction with Digital Audio Tape formats (DAT) that employ a moving tape head and a physical record structure of fixed length. As will become apparent, the TPAD can perform its function with only minimal knowledge about the nature of the tape drive, does not need to be physically incorporated into the drive (although that may be desirable for various nontechnical reasons), and can, in general, treat the tape drive as a black box whose inner workings are largely mysterious.
To read the tape the process is essentially reversed. The tape drive is commanded to read a packet lying between two consecutive interblock gaps. No special command is needed to do this, as the tape drive knows nothing of a logical structure called packets. The TPAD simply commands the tape drive to read the next (physical) record on the tape; it just so happens that it is a pretty long record. (It's a packet probably containing an entire collection of blocked application records!) If it appears that there is room in the buffer for another packet then the next one is read, and so on, until the buffer is full. As the host sends commands to (attempt to) read physical records from the tape, the TPAD uses the information in the trailer for the current packet (and then the next packet, and so on) to disassemble the packets and send the original host-transmitted records (whether blocked or not) back to the host.
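Disassembly can be sketched as the inverse of packet assembly. The sketch below assumes a hypothetical trailer layout that the text does not specify here: the packet ends with a 4-byte big-endian count of table entries, preceded by a table of that many 4-byte record end offsets into the data portion. The name split_packet is likewise an assumption.

```python
import struct

def split_packet(packet):
    """Recover the original records from a packet (hypothetical layout).

    Assumed layout: data portion, then a table of record end offsets,
    then one final word giving the number of table entries.
    """
    (count,) = struct.unpack(">I", packet[-4:])
    table_start = len(packet) - 4 - 4 * count
    offsets = struct.unpack(f">{count}I", packet[table_start:-4])
    records = []
    start = 0
    for end in offsets:
        records.append(packet[start:end])  # one host-transmitted record
        start = end
    return records

# A hand-built packet holding the records b"abc" and b"de"
packet = b"abcde" + struct.pack(">2I", 3, 5) + struct.pack(">I", 2)
print(split_packet(packet))  # [b'abc', b'de']
```

Each host read command would then be satisfied by handing back the next entry of the recovered list, without further tape motion until the buffered packets are exhausted.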
Special linkage information is maintained in the trailers to facilitate backspacing, forward spacing, and moves to the start or end of the present file, or to some file on either side of the present file. Other types of information can also be kept in the trailer. The task of automatically compressing and decompressing the data, and the use of tape-format-specific tape marks to add structural features to the data recorded onto the tape (e.g., volume delimiters), are examples of things that can benefit from storing related information in the trailer.
In a preferred embodiment the TPAD incorporates the ability to compress the user's data before it is assembled into a packet. Data that is read from the tape is also decompressed upon the disassembly of packets. The preferred compression algorithm is one which uses a dictionary that is embedded in the compressed data stream. The preferred algorithm is adaptive, and can restart the process of dictionary building if the effectiveness of compression falls below a certain level. When the compression feature is in use certain additional trailer information is generated during packet assembly and later used during packet disassembly. The nature of this additional information will be separately described after the basic operations for packet assembly and disassembly without compression have been discussed.
Similarly, if the TPAD is to be used in conjunction with a DAT drive, certain additions to the type of information maintained in the trailers for the packets are desirable.