When data is written by a computer to a magnetic tape it is a common practice to separate portions of that data with erased sections of tape of an industry-standard length, such as three tenths of an inch, regardless of the length of tape in between that is required for the data. The empty sections are called gaps; more specifically, inter-record or interblock gaps, depending on the type of data separated by those gaps. It is customary for the operating system to collect consecutive application records into a buffer, from which they are written by the host computer as entire blocks. These blocks of records are then separated by gaps, which are now called interblock gaps. The accumulation and writing process is called blocking; the corresponding reverse process of reading blocks from the tape, putting them into a buffer and then retrieving the individual records therefrom is called deblocking.
It is clearly seen that the efficiency of tape utilization is determined by the ratio between the amount of tape (between the gaps) used for data and the amount used for the gaps themselves. Since the amount of tape required for each gap is fixed by standardization, high efficiency of utilization requires that the amount of tape with data thereon between the gaps be large in comparison to the size of the gaps. More than simple buffering of blocks into larger blocks is required, however, if such a process is to be invisible to existing operating systems and still write to tape using industry standard tape drives in a way that is in conformance with published standards.
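The relationship between block size and tape utilization can be illustrated with a short calculation. The sketch below is illustrative only: it expresses efficiency as the fraction of tape that carries data (data length divided by data length plus gap length), takes the three-tenths-inch gap used as an example above, and assumes a recording density of 6250 characters per inch. The function and constant names are hypothetical.

```python
GAP_INCHES = 0.3      # example gap length; actual value depends on the standard
DENSITY_CPI = 6250    # assumed recording density, characters per inch


def utilization(block_bytes, density_cpi=DENSITY_CPI, gap_inches=GAP_INCHES):
    """Fraction of tape holding data when every block is followed by one gap."""
    data_inches = block_bytes / density_cpi
    return data_inches / (data_inches + gap_inches)


if __name__ == "__main__":
    # Small blocks spend most of the tape on gaps; large blocks do not.
    for size in (512, 8192, 75000):
        print(f"{size:>6} bytes per block: {utilization(size):.1%} of tape holds data")
```

Under these assumptions a 512-byte block leaves only about a fifth of the tape holding data, while a block of 75,000 bytes (about a foot of tape) raises the figure to roughly 98 percent.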
Such transparent "super blocking" is achieved by a tape packet assembler/disassembler (TPAD) located in the command and data path between the host computing environment and the tape drive. Host-transmitted records (whether blocked, compressed, both, or neither) are accumulated in a large buffer in the TPAD. Any characters in the data stream that serve as embedded delimiters of the structural features that have been blocked (e.g., a character whose meaning is end-of-record (EOR)) are left undisturbed, and are treated as ordinary characters. Tape commands, such as Write File Mark, are intercepted and replaced by embedded characters or by other information. Such "other information" can be of various types, and takes the form of tables of linkage information, plus information about the size of those tables. The usual interblock gap that would ordinarily occur on tape between host-transmitted records does not occur, as those records are accumulated into a new unit of tape motion: packets. Interblock gaps will now occur between packets.
Accumulation into the buffer begins at one end thereof and proceeds toward the other end, say, in the direction of increasing addresses. After a selected fractional amount of the buffer has been filled the linkage tables and their size information are appended to their associated data portion in the buffer. The linkage tables and size information are together called a trailer. The data portion with its trailer is called a packet. Each packet is large enough to cause the writing of about a foot of tape for a 6250 CPI streaming tape drive. The buffer is large enough to hold, say, eight to twelve packets, depending upon configuration. Packets are written consecutively to the buffer until it is full.
The TPAD can, if needed, split an incoming host-transmitted record into segments stored in consecutive packets. This can happen either because the incoming record is larger than the packets, or because the amount of space remaining in the current packet is insufficient to contain the entire record.
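The accumulation of records into packets, including the splitting of a record across consecutive packets, can be sketched as follows. This is a minimal illustration and not the actual TPAD format: the trailer layout (one entry per record segment holding its length and a "continues in next packet" flag, followed by the entry count as the size information) and all names are assumptions made for the example.

```python
import struct

# Hypothetical trailer layout, assumed for illustration only.
ENTRY = struct.Struct(">IB")  # segment length, 1 if the record continues
COUNT = struct.Struct(">I")   # number of entries in the linkage table


def build_packet(data, table):
    """Append the trailer (linkage table plus its size) to the data portion."""
    trailer = b"".join(ENTRY.pack(length, int(cont)) for length, cont in table)
    return bytes(data) + trailer + COUNT.pack(len(table))


def assemble(records, data_capacity):
    """Pack host records into packets whose data portion is data_capacity bytes."""
    if data_capacity <= 0:
        raise ValueError("data_capacity must be positive")
    packets, data, table = [], bytearray(), []
    for record in records:
        pos = 0
        while True:
            room = data_capacity - len(data)
            chunk = record[pos:pos + room]
            pos += len(chunk)
            continues = pos < len(record)  # rest goes into the next packet
            table.append((len(chunk), continues))
            data += chunk
            if len(data) == data_capacity:  # data portion full: close packet
                packets.append(build_packet(data, table))
                data, table = bytearray(), []
            if not continues:
                break
    if data:  # flush a final, partly filled packet
        packets.append(build_packet(data, table))
    return packets
```

For example, `assemble([b"aaaa", b"b" * 10, b"cc"], 8)` yields two packets: the ten-byte record does not fit in the space left in the first packet, so four bytes are stored there and the remaining six in the second.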
When the buffer is full all the packets therein are written out to tape, with the freed portions of the buffer available for the assembly and storage of new packets, even as the existing ones are still being written. If the host can keep up, then the TPAD may eventually have to hold the host off; otherwise, the host will fall behind, allowing the buffer to either become empty or simply partially filled. After each packet is written to the tape the tape drive will automatically write the usual interblock gap. If, after a gap is written, the buffer contains less than a packet, no further write operations occur, and tape motion ceases; otherwise, the writing of packets continues. Certain commands issued to the tape drive by the host affect the TPAD's packet assembly activities. A rewind command, for example, flushes to the tape any remaining contents of the buffer as a packet, so that the rewind can occur.
The writing of an entire packet of data between interblock gaps means that the efficiency of tape utilization is high. Writing as many consecutive packets as possible with uninterrupted tape motion assists in obtaining efficient use of streaming tape drives, although it will be appreciated that TPAD's are in no way limited to use with just streaming drives. They may be used with equal satisfaction in conjunction with start-stop tape drives, and likewise are not limited to use with any particular recording technique, format or density. In particular, TPAD's may also be used in conjunction with Digital Audio Tape formats (DAT) that employ a moving tape head and a physical record structure of fixed length. As will become apparent, a TPAD can perform its function with only minimal knowledge about the nature of the tape drive, does not need to be physically incorporated into the drive and can, in general, treat the tape drive as a black box whose inner workings are largely mysterious.
To read the tape the process is essentially reversed. The tape drive is commanded to read a packet lying between two consecutive interblock gaps. No special command is needed to do this, as the tape drive knows nothing of a logical structure called packets. The TPAD simply commands the tape drive to read the next (physical) record on the tape; it just so happens that it is a pretty long record. (It's a packet probably containing an entire collection of blocked application records!) If it appears that there is room in the buffer for another packet then the next one is read, and so on, until the buffer is full. As the host sends commands to (attempt to) read physical records from the tape, the TPAD uses the information in the trailer for the current packet (and then the next packet, and so on) to disassemble the packets and send the original host-transmitted records (whether blocked or not) back to the host.
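The disassembly step can be sketched in the same spirit. The code below assumes a hypothetical packet layout, chosen only for illustration: a data portion, then a linkage table with one entry per record segment (length and a "continues in next packet" flag), then the entry count as the final field. Placing the count last lets a reader locate the table by working backward from the end of the physical record. None of these names or layouts comes from the TPAD itself.

```python
import struct

# Hypothetical trailer layout, assumed for illustration only.
ENTRY = struct.Struct(">IB")  # segment length, 1 if the record continues
COUNT = struct.Struct(">I")   # number of entries in the linkage table


def parse_packet(packet):
    """Split one physical record (a packet) into its data portion and table."""
    (count,) = COUNT.unpack_from(packet, len(packet) - COUNT.size)
    table_start = len(packet) - COUNT.size - count * ENTRY.size
    table = [ENTRY.unpack_from(packet, table_start + i * ENTRY.size)
             for i in range(count)]
    return packet[:table_start], table


def disassemble(packets):
    """Yield the original host records, one per simulated host read command."""
    pending = bytearray()  # segments of a record split across packets
    for packet in packets:
        data, table = parse_packet(packet)
        pos = 0
        for length, continues in table:
            pending += data[pos:pos + length]
            pos += length
            if not continues:  # record complete: hand it to the host
                yield bytes(pending)
                pending = bytearray()
```

Each value produced by `disassemble` corresponds to one host read command being satisfied from the buffer, with tape motion occurring only when a further packet has to be fetched.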
The recovered application records are sent back to the host individually, each in response to a command from the host (intended for the tape drive) to read the next record. Just as the tape drive is unaware of the notion of a packet, so is the activity in the host computer. On the one hand the application program and operating system issue commands at the record level that supposedly cause corresponding tape motion, while on the other the TPAD intercepts these and causes tape motion at the packet level. The fundamental unit of tape motion has now shifted in its level of abstraction, as it relates to the activities the user believes he is causing and controlling. As long as all equipment functions as it should, this shift in the fundamental unit of tape motion is invisible to both the user and the operating system. But when certain types of hardware failures are considered a potential "gotcha" emerges.
Suppose the user has previously written, say, one hundred records to the tape. At a later time he decides that he wants to overwrite those, beginning at record eighty-six. In accordance with accepted and recommended practices, he is prepared to abandon any claim to the last fifteen records on the tape, and makes no attempt to simply replace an intervening record while expecting those after it to remain intact. Accordingly, he rewinds the tape and then commands the tape drive to forward space eighty-five records. (It really doesn't matter how he gets to the start of record eighty-six; any valid combination of commands could be employed.) In itself, this causes no problem; the TPAD knows how to do this. Let's say that the eighty-sixth record is about midway through the nth packet, and that the nth packet contains records seventy-five through ninety-five.
In the situation described the TPAD would move the tape to the start of the nth packet, disassemble that packet, and for the benefit of the host, emulate any appropriate tape drive activity corresponding to the forward spacing over records seventy-five through eighty-five. The user now commands the tape drive to begin to write. Whether the user writes only a new end-of-file mark and then rewinds, or writes only one new record followed by a rewind, or writes one hundred new records causing the refilling of the nth packet and the construction of the next several packets following, the nth packet, in its (new) entirety, must be rewritten to the tape. But suppose the tape drive has failed and does not properly write to the tape? Then records seventy-five through eighty-five are lost also (destroyed in the defective rewriting of the nth packet), and not just those the user thought he was overwriting. And while it is not possible to protect the user completely from all hardware failures, it can with fairness be argued that: (1) the user gave no permission for anybody to write on the tape at the location of records seventy-five through eighty-five; and (2) the TPAD, if it is to be truly transparent, should conduct its affairs in a way that never compromises the integrity of any data that would be safe if the TPAD and its invisible packets were not in use. After all, it is one thing for record eighty-six and those thereafter to be destroyed; the user did give permission for them to be written upon, and if that fails, the read-after-write feature of the drive can tell him immediately if things have gone haywire. He can at least expect with reasonable certainty that the data up through and including record eighty-five is still good. He knows still what it was that he was going to write, so he still has all his data, albeit somewhat fragmented.
It is quite another thing to discover that data he gave no permission to change has been corrupted, and that as the mere user, he has no way of finding out that it has happened or where the corruption might begin.
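The exposure in this scenario can be stated as a small calculation. The sketch below is illustrative only, and its function name and arguments are hypothetical: given the first and last record numbers held in a packet and the record at which the user begins overwriting, it returns the records that must be rewritten along with the packet even though the user never asked to touch them.

```python
def innocent_records_at_risk(packet_first, packet_last, overwrite_from):
    """Records in the packet that precede the user's overwrite point.

    These share a packet with the overwritten records, so a defective
    rewrite of the packet would destroy them too.
    """
    if not packet_first <= overwrite_from <= packet_last:
        raise ValueError("overwrite point does not fall inside this packet")
    return range(packet_first, overwrite_from)


if __name__ == "__main__":
    at_risk = innocent_records_at_risk(75, 95, 86)
    print(f"records {at_risk.start} through {at_risk.stop - 1} are exposed")
```

With the numbers used in the example, the packet holds records seventy-five through ninety-five and the overwrite starts at record eighty-six, so records seventy-five through eighty-five are exposed.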
So, the problem is this: how can the TPAD avoid the corruption of innocent data during the reassembly of an existing packet?