1. Field of the Invention
This patent application relates to computer data storage. More particularly, a protocol is disclosed for archiving data in a format that is compatible with addressable (direct access) or sequential storage media possessing either rewriteable or write once capability.
2. Brief Description of the Relevant Art
Archiving computer data has become an important modern safeguard. Computer use produces files that are valuable records. Unfortunately, the capacity of resident computer storage is easily exceeded. Furthermore, storage can be damaged or lost, and with it data that may be impossible to recreate. It is against this data loss that the regular archiving of computer data forms an indispensable part of even the most ordinary computer operating discipline. With a suitable archive device used regularly and recurrently, any data loss should be minimized.
Prior art archive devices come in distinct formats. These formats include sequentially accessible devices as well as addressable (direct access) devices. An example of a sequentially accessible device is a magnetic tape having file marks. An example of an addressable device is the formatted floppy disk used with many personal computers.
Block addressable tapes blur the distinction between so-called "direct access" devices and serial access devices. Technically speaking, because such tapes are preformatted into addressable blocks, a block addressable tape is a direct access device. The media, however, is undeniably serial: even though the addressing scheme tells the computer exactly where on the tape the data resides, the tape must still move serially to that location.
Such prior art devices also possess either write once or rewriteable capability. An optical memory, for example, can be a write once device. Even though many serially read tapes can be rewritten from the beginning, their serial access often makes them write once in character as a practical matter. Rewriteable devices are well known and include all sorts of random access disks and memories.
In the prior art, archiving protocols have been designed for the specific media to which the information is directed. In the case of a tape backup, for example, the archiving process has usually been made compatible with a serial access, write once type of device. In such serial backup tape drives, the directory for the archived data has usually been recorded in one portion of the tape and the data itself in the remaining (and usually trailing) portion. Several known disadvantages follow from this arrangement.
First, the leading portion of the tape usually contains not only identifier and directory information but also so-called header information. Such header information can be said to be predictive: it tells devices accessing the serial tape the location and length of the serial data following the header, which data make up the body, or content, of the file.
Such a header is time consuming to access. Further, if the portion of the media containing the header becomes unreadable, the data it describes usually becomes hopelessly lost. Such machines are easily recognized when asked to locate data. They start, traverse at high speed to the header, slow down to read or write the header, traverse again at high speed to the data, and finally slow down to read or write the data. All of this starting and stopping, of course, consumes considerable time.
There are additional problems with these kinds of formats. Because sequential devices cannot predict their eventual capacity exactly, the directory for all files is often written to each unit of media. This can be very inefficient if, for example, a file that is very large relative to the media follows the directory. Because not all of the listed files can then be written to that medium, either the whole directory must be written again on the next tape, or the subsequent media carry no catalog at all, favoring the first medium over the rest at a corresponding loss of safety. Writing a complete directory to each unit of media can thus become prohibitively inefficient.
Archiving formats of the prior art have presented another serious problem for the computer user. Archiving a computer's storage typically takes a long time, during which the computer must be dedicated to the archiving function and is useless for other tasks. Further, if the archiving process must be interrupted, some, and usually all, of the archived data is rendered nonrecoverable. When archiving is resumed, it must therefore start again from the beginning, as if it had never been started in the first instance.
Moreover, when archiving takes a long time, interruptions are often involuntary. The system can crash. The power can be interrupted. The hard disk media can contain locally invalid regions. In all of these cases, most prior art protocols require archiving to begin again from the start. No provision is made to preserve the work already done.
The reason restarting is required can be understood as follows. Such archiving formats usually include a header that is written first. This header is either at the very beginning of the tape or at least at the very beginning of the discrete data constituting a file to be archived. The header is predictive: it says (or predicts) that a certain number of blocks, or a certain length of media, following the header contain file content.
If the archiving process is arbitrarily terminated, data is simply not present where the "predictive" header said it would be. When archiving is restarted after the interruption, a new header is written, again followed by file data. This new header, however, typically lies in the area where the previously recorded header of the partially written file said file data would reside. Because a reader trusts that earlier header and skips over the predicted length, the header of the new file is not accessible. A hopeless mess results, and the only remedies are to repeat the archiving process from the beginning or to abandon the directory resident on the tape and recover the data manually.
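The failure mode just described can be illustrated with a small simulation. The sketch below is hypothetical: it models a tape as a Python list of records, where a `Header` record states how many data blocks are predicted to follow, and a reader that trusts each header and skips ahead by that count. The names `Header`, `write_file`, and `list_files` are illustrative assumptions, not part of any actual tape format.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Header:
    """Hypothetical predictive header: names a file and predicts its length."""
    name: str
    length: int  # number of data blocks predicted to follow this header


Record = Union[Header, bytes]


def write_file(tape: List[Record], name: str, data_blocks: List[bytes],
               fail_after: Optional[int] = None) -> None:
    """Append a predictive header, then the data blocks.

    If fail_after is set, writing stops after that many data blocks,
    simulating a crash or power loss partway through the file.
    """
    tape.append(Header(name, len(data_blocks)))
    for i, block in enumerate(data_blocks):
        if fail_after is not None and i >= fail_after:
            return  # interruption: the header's prediction is now wrong
        tape.append(block)


def list_files(tape: List[Record]) -> List[str]:
    """Walk the tape as a header-trusting reader would."""
    found = []
    pos = 0
    while pos < len(tape):
        record = tape[pos]
        if not isinstance(record, Header):
            raise ValueError(f"expected a header at block {pos}")
        found.append(record.name)
        pos += 1 + record.length  # trust the prediction and skip the body
    return found


# File A predicts 4 blocks but is interrupted after 2; archiving then resumes with B.
tape: List[Record] = []
write_file(tape, "A", [b"a0", b"a1", b"a2", b"a3"], fail_after=2)
write_file(tape, "B", [b"b0"])
print(list_files(tape))  # ['A']
```

Because the header of file A predicts four blocks but only two were written before the interruption, the reader's jump from A's header lands past B's header, so B never appears in the listing and A's apparent content silently includes B's header and data.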
Another technique is to place the directory at the end of the file. For tapes, unfortunately, this technique can be most inconvenient. The end of the tape can be reached before the directory is recorded, in which case the directory must be placed on a second, following tape. It may then be necessary to read the entire first tape, discover that it contains no directory, and read the second tape only to learn that the desired file remains on the first tape. Access to the archived file is therefore slow.
To avoid this predictive header problem, special so-called "flags" or "file marks" have been recorded on serial media such as tape. These file marks, however, have disadvantages of their own.
In serial devices that use flags or file marks, the device typically scans for a flag at high speed. When the reading device encounters a flag, it stops its high speed traverse and enters a low speed "read" or "write" of the tape. The resulting stops and starts, as the drive encounters, reads, and responds to each flag, make operation of the archive media intermittent and consequently slow. This problem is particularly acute in so-called "streaming" tape drives, where many mechanical and data operations must be reset for each start and stop. Each time a stop is called for, the tape usually must back up, reset certain data collection parameters, and reenter the streaming mode. In short, the need to stop intermittently to recognize flags quickly cancels whatever speed the streaming tape drive was meant to provide in the first place.
Flags are also generally unsuitable for addressable devices, a commonly used archiving format in which blocks are numerically addressed. When any block can be reached directly by its number, an embedded flag serves no purpose.
Where so-called file marks have been used in sequential devices, a further disadvantage has arisen. To make the flags readable and to keep the flag information separate from the data, each flag typically occupies an entire block. Devoting a whole block to each flag is very inefficient on serial media: large amounts of the media can be consumed by flag information alone, and archives requiring many flags become very inefficient.
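The cost of whole-block file marks can be seen by simply counting blocks. The following sketch assumes, for illustration only, that each file is padded to whole blocks and is followed by one file mark occupying an entire block; the function name `filemark_overhead` and the example sizes are hypothetical.

```python
from typing import Iterable


def filemark_overhead(file_sizes: Iterable[int], block_size: int,
                      marks_per_file: int = 1) -> float:
    """Fraction of recorded blocks spent on file marks.

    Hypothetical model: every file is padded to whole blocks, and every
    file mark consumes one entire block, as described in the text.
    """
    sizes = list(file_sizes)
    # Ceiling division without floating point: blocks needed for each file.
    data_blocks = sum(-(-size // block_size) for size in sizes)
    mark_blocks = marks_per_file * len(sizes)
    return mark_blocks / (data_blocks + mark_blocks)


# 1000 small files, each exactly one 4096-byte block, one mark per file.
print(filemark_overhead([4096] * 1000, 4096))  # 0.5
```

With many small files, the marks consume half of the recorded blocks; with a few large files the overhead becomes negligible, which matches the observation that the waste grows with the number of flags required.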
Because of the difficulties described above, and other vagaries of addressable and serial devices on write once or rewriteable media, archiving formats have heretofore been anything but uniform. Each archiving format is typically tailored to the type of device doing the archiving. For a sequentially accessed tape drive, for example, the format is typically written to use file marks for structure and error recovery, but such a format is inapplicable to addressable devices, or at least lacks error recovery on them. No format is known that is equally efficient on all types of media.
So-called quick file access (QFA) tape backup drives have been used. Most of these devices are hardware specific: unless the tape drive is manufactured for the QFA format, the format will not work.
In such devices a track on the tape is reserved for the directory. Because the track is reserved, it must be sized on a worst case basis, that is, for the case where the files are small and the directory is large. Such tape resident directories are also slow to access.
Where block addressable devices have discrete blocks dedicated to the directory function, a further difficulty arises when the directory block becomes full. A second directory block is then typically "linked" to the first. Unfortunately, this second block always lies some distance along the tape from the first, so directory access again becomes slow.
Further difficulties arise from modern "user friendly" programs. Such programs typically include pictographic information (icons, motion pictures, or even sounds) that is voluminous and repeated throughout the files of a directory tree. Many protocols record this information anew with each file, so extensive information is needlessly repeated.