The present invention relates to a method of indexing data on a sequential data storage medium for storing and locating data files, and to a storage medium, such as magnetic tape, optical tape and the like, incorporating the indexing data.
The conventional method of storing encoded data files, such as those generated by a computer, on a tape medium is to write a filemark at the point on the tape at which the data file is to be stored, write a label block containing the name of the data file or other identifying data immediately following the filemark, and then write the data immediately following the label block. In some instances, the data file archiving process involves bundling a number of files and storing the bundled files as a single file on the tape. While bundling simplifies storing the files on the tape, it complicates retrieving individual files from the tape when required. Frequently, the entire bundle must be retrieved to the operating memory of a computer in order to extract a desired file. It would be desirable to be able to retrieve a single file without these complications.
When files are stored individually, the conventional method of retrieving an individual file is to sequentially read the label block of each of the files on the tape until the desired file is located. Once located, the file is retrieved. This process is satisfactory when the number of files is relatively small, such as a few hundred or a few thousand files. However, the process is extremely slow and tedious when the number of files is on the order of millions. Optical tape capable of storing almost 1,000 gigabytes of data, representing potentially millions of files, is now commercially available. A single tape can replace an entire tape library consisting of thousands of tapes and eliminate costly and time-consuming manual loading, unloading and tape maintenance. Clearly, retrieval of these files can be extremely time consuming with conventional file retrieval methods.
Indexing methods for sequential recording media such as magnetic or optical tape frequently involve dedicating a portion of the tape to index storage, separate from the data storage. Each time a file is added to the tape, the tape must be reversed to the index portion to add the new index data and then advanced back to the data storage portion to add the next file. This movement back and forth along the tape slows the data storage system significantly. Even the above-noted methods of scanning a tape for a particular file are highly dependent on tape length and the speed of the reader. When a tape requires 90 seconds to travel end to end, retrieving a randomly selected file will take 45 seconds on average, excluding the time for processing each file block.
Other indexing methods store index data in memory, such as the computer hard drive. Only when the tape is substantially filled is a complete index written on the tape. It is desirable to have index information resident in an operating memory, such as the hard drive, RAM or other memory apart from the tape, for search functions when files are to be retrieved from a tape. However, for archiving purposes, the index information is not secure while stored in operating memory. A catastrophic failure such as a hard disk crash could result in the loss of all indexing data. Recreating the index is time consuming, requiring each of the file identifiers on the tape to be read and the indexing data to be reassembled. For numerous and large tapes, the risk of losing indexing data is undesirable. In addition, archiving processes involve the compilation of numerous tapes simultaneously. Completed tapes may include indices stored on tape, but partially completed tapes generally do not. As a result, numerous indices may be stored in memory at once, greatly increasing the consequences of a data loss. Changing partially filled tapes presents similar difficulties, since an index may not be available in the operating memory and must be compiled from the partial tape.
These prior art practices of “guess work” retrieval, or of file indexing based on magnetic disk techniques, are incompatible with archival needs and greatly complicate the handling and distribution of numerous tapes for extensive tape libraries. Conventionally, data stored on tape media is not modified. Modified files are merely written to the end of the tape, further complicating the indexing and retrieval process. That said, some tape media are rewritable. It would be preferable for any indexing method to accommodate both rewriting of data files on rewritable media and replacing files on write-once media.
Accordingly, there is a need for a simpler, faster and more efficient method of storing and retrieving files stored on large tapes. In addition, it is desired to store cumulative index information on the tape as data is added, so that an index lost from memory can be quickly and easily restored from index data on the tape without reading every file.
The present invention provides a novel tape file formatting structure and a method for storing and retrieving individual files from potentially millions stored on a single tape. In accordance with the present invention, index data is stored on the tape alongside the files as they are written, so that each tape remains fully indexed at all times. All tapes are self-contained and do not require additional magnetic disk storage. Even with several million files on a tape, a single file can be quickly and easily retrieved. The present invention may be supplied as an easy-to-use application program which allows the user to write files to tape, read them back, and perform many other functions, or as a subroutine package which allows the user to incorporate the present invention into the user's own software.
One aspect of the present invention provides a method of retrievably storing indexed data on a sequential data storage medium, the data indexed according to a predetermined order, comprising the steps of:
storing a data file on the sequential storage medium;
identifying the presence of a last written index portion including reference to locations of other previous index portions;
determining an index position for index data relating to the data file within the identified last written index portion according to the predetermined order;
for any of the identified previous index portions, determining if the addition of the index data relating to the data file results in a change to any of the index portions or if the index portions are unchanged;
storing an index on the sequential storage medium including:
references to the location of any index portions that are determined to remain unchanged; and
index portions that are changed to include the index data relating to the data file according to the predetermined order.
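The steps above can be illustrated with a minimal sketch. It is not the patented implementation: the tape is modelled as an append-only Python list, the predetermined order is plain sorted key order, and the index is split into three portions at the hypothetical split points in `BOUNDARIES`. The names `store`, `find` and `load_portion` are likewise assumptions made for illustration. Each stored index rewrites only the portion that receives the new key and records a reference to the tape position of every unchanged portion.

```python
from bisect import bisect_right, insort

BOUNDARIES = ["h", "p"]  # hypothetical split points between index portions


def load_portion(tape, index_pos, n):
    """Return the entries of portion n as seen from the index at index_pos."""
    kind, payload = tape[index_pos][1][n]
    if kind == "ref":  # the portion was last written in an earlier index
        kind, payload = tape[payload][1][n]
    return payload


def store(tape, key, data):
    """Append a data file, then an index covering every file on the tape."""
    last = len(tape) - 1 if tape else None  # last written index, if any
    tape.append(("data", key, data))
    data_pos = len(tape) - 1
    changed = bisect_right(BOUNDARIES, key)  # portion that receives the key
    slots = []
    for n in range(len(BOUNDARIES) + 1):
        if n == changed:
            # Changed portion: copy it, add the new entry in sorted order.
            entries = list(load_portion(tape, last, n)) if last is not None else []
            insort(entries, (key, data_pos))
            slots.append(("inline", entries))
        elif last is None:
            slots.append(("inline", []))  # first index: nothing to refer to
        else:
            # Unchanged portion: store only a reference to where it lives.
            kind, payload = tape[last][1][n]
            slots.append(("ref", last if kind == "inline" else payload))
    tape.append(("index", slots))


def find(tape, key):
    """Look a key up using only the last index written on the tape."""
    n = bisect_right(BOUNDARIES, key)
    for k, pos in load_portion(tape, len(tape) - 1, n):
        if k == key:
            return tape[pos][2]
    return None


tape = []
for name, payload in [("apple", b"A"), ("zebra", b"Z"), ("mango", b"M"), ("berry", b"B")]:
    store(tape, name, payload)
print(find(tape, "zebra"))
print(tape[-1][1])
```

In this run the last index holds one portion inline (the one that received "berry") and two references to earlier indexes, so the cost of writing an index depends on the size of the changed portion rather than on the total number of files. The sketch does not handle replaced files or portions that grow large enough to need splitting.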
A further method of retrievably storing encoded data files on a sequential storage medium, in accordance with the present invention, comprises the steps of:
a) locating a last file stored on the medium;
b) duplicating first indexing data of the last file stored on the medium;
c) compiling second indexing data for a new data file having a primary key and a start position of the new file on the medium;
d) modifying the duplicate of the first indexing data to include the second indexing data of the new data file;
e) inserting the modified duplicate of the indexing data in the new data file; and
f) storing the new data file sequentially on the medium following the last file stored.
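Steps a) to f) can be sketched as follows, under simplifying assumptions: the medium is an append-only Python list, the start position of a file is its list position, and the indexing data is a flat dictionary from primary key to start position. The function name `append_file` and the field names are hypothetical.

```python
import copy


def append_file(tape, primary_key, data):
    """Store a new file whose index field extends the previous file's index."""
    # a) locate the last file stored on the medium
    last = tape[-1] if tape else None
    # b) duplicate the indexing data of that last file
    index = copy.deepcopy(last["index"]) if last is not None else {}
    # c) compile indexing data for the new file: primary key and start position
    start = len(tape)
    # d) modify the duplicate to include the new file's indexing data
    index[primary_key] = start
    # e) insert the modified duplicate in the new file, and
    # f) store the new file sequentially after the last file
    tape.append({"key": primary_key, "index": index, "data": data})
    return start


tape = []
append_file(tape, "report.txt", b"first")
append_file(tape, "image.png", b"second")
print(tape[-1]["index"])
```

Because the last file always carries the cumulative index, an index lost from operating memory can be recovered by reading that one file. Copying the whole index is only the simplest reading of step b); the first aspect described above instead stores references to unchanged index portions, which keeps each index field small.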
A still further preferred method, in accordance with the present invention, comprises a method for locating a data file, having a specific entry key, stored on a sequential recording medium, absent an index for the data file stored in operating memory, said method comprising the steps of:
(a) locating a last file written on the medium;
(b) reading an indexing data field of the last file, said indexing data field having at least one block ordered in a hierarchy of block levels;
(c) extracting a lowest level block and finding an indication of an immediately higher level block having information relating to the data file by comparison of the specific entry key of the data file with information contained in the lowest level block;
(d) moving to a position on the medium of a file having the immediately higher level block;
(e) extracting from the immediately higher level block an indication of a further immediately higher level block having information relating to the data file by comparison of the specific entry key of the data file with information contained in the immediately higher level block;
(f) repeating the steps (d) and (e) until a final leaf level block is extracted; and
(g) moving to a file position having the data file indicated by information contained in the final leaf level block.
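The walk in steps (a) to (g) can be sketched with a small hand-built example. The layout is an assumption made for illustration: each file carries a dictionary of blocks keyed by level, level 0 is the lowest level block read first from the last file, level 1 is the leaf level, and each block entry pairs the highest key it covers with a file position. `TAPE`, `LEAF_LEVEL` and `locate` are hypothetical names.

```python
from bisect import bisect_left

LEAF_LEVEL = 1  # hypothetical depth: level 0 is read first, level 1 is the leaf

# Each block is a sorted list of (highest_key_covered, file_position).
# Non-leaf entries point at the file holding the next block; leaf entries
# point at the file holding the data itself.
TAPE = [
    {"key": "a", "data": b"A", "blocks": {0: [("a", 0)], 1: [("a", 0)]}},
    {"key": "m", "data": b"M", "blocks": {0: [("m", 1)], 1: [("a", 0), ("m", 1)]}},
    {"key": "x", "data": b"X", "blocks": {0: [("m", 1), ("x", 2)], 1: [("x", 2)]}},
    {"key": "c", "data": b"C",
     "blocks": {0: [("m", 3), ("x", 2)], 1: [("a", 0), ("c", 3), ("m", 1)]}},
]


def locate(tape, key):
    """Find a file by entry key using only index blocks stored on the tape."""
    if not tape:
        return None
    pos, level = len(tape) - 1, 0  # (a), (b): start at the last file written
    while True:
        block = tape[pos]["blocks"][level]  # (c), (e): extract the block
        i = bisect_left([high for high, _ in block], key)  # compare entry key
        if i == len(block):
            return None  # key is beyond every range in this block
        high, target = block[i]
        if level == LEAF_LEVEL:
            # (g): the leaf entry gives the position of the data file
            return tape[target]["data"] if high == key else None
        pos, level = target, level + 1  # (d): move to the file with the next block


print(locate(TAPE, "x"))
print(locate(TAPE, "c"))
print(locate(TAPE, "b"))
```

Each iteration corresponds to one repositioning of the medium, so the number of moves is bounded by the depth of the hierarchy rather than by the number of files on the tape.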
Another aspect of the present invention provides a sequential recording medium for retrievably storing a plurality of files, each file comprising:
a start field for marking the start of a file;
an index field for storing indexing data respecting the associated file, and for storing a sequence of one or more positions of other start fields respecting other files leading in a reverse direction from a current position on the medium toward the beginning of said medium; and,
a data field for storing data which forms said data file,
wherein the index field identifies an ordered range of files stored on the medium, and locations of previous index files on the medium containing additional portions of the ordered range.
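One possible byte layout for such a file is sketched below, purely as an illustration. The 4-byte marker `START_MARK`, the header format and the class name `TapeFile` are assumptions and are not taken from the specification. The index field here holds the file's key plus a list of byte offsets of earlier start fields; the ordered range described in the wherein clause is not modelled.

```python
import struct
from dataclasses import dataclass

START_MARK = b"FMRK"  # hypothetical 4-byte start field
# Big-endian header: marker, key length, pointer count, data length.
HEADER = struct.Struct(">4sHHI")


@dataclass
class TapeFile:
    key: bytes            # indexing data for this file
    back_pointers: list   # offsets of earlier start fields, nearest first
    data: bytes           # the data field

    def pack(self) -> bytes:
        """Serialize as start field, index field, then data field."""
        head = HEADER.pack(START_MARK, len(self.key),
                           len(self.back_pointers), len(self.data))
        ptrs = struct.pack(f">{len(self.back_pointers)}Q", *self.back_pointers)
        return head + self.key + ptrs + self.data

    @classmethod
    def unpack(cls, buf: bytes, offset: int = 0) -> "TapeFile":
        """Read one file starting at the given byte offset."""
        mark, key_len, n_ptrs, data_len = HEADER.unpack_from(buf, offset)
        if mark != START_MARK:
            raise ValueError("no start field at this offset")
        p = offset + HEADER.size
        key = bytes(buf[p:p + key_len])
        p += key_len
        ptrs = list(struct.unpack_from(f">{n_ptrs}Q", buf, p))
        p += 8 * n_ptrs
        return cls(key, ptrs, bytes(buf[p:p + data_len]))


first = TapeFile(b"alpha", [], b"payload one").pack()
second = TapeFile(b"beta", [0], b"payload two").pack()
medium = first + second
newest = TapeFile.unpack(medium, len(first))
print(newest.key, newest.back_pointers)
print(TapeFile.unpack(medium, newest.back_pointers[0]).key)
```

Reading the newest file yields the offset of the earlier start field, so a reader can step backward toward the beginning of the medium without scanning the intervening data.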
A further aspect of the present invention provides an apparatus for retrieving a data file stored on a sequential recording medium, absent an index for the data file stored in operating memory, the apparatus comprising:
a data reader for reading data from a current location on the storage medium;
a mechanism for varying the current location;
a processor for controlling the mechanism and the data reader and for receiving data from the data reader, the processor comprising:
means for locating a current index field adjacent a current data file stored on the medium and for controlling the data reader to read the current index field and provide the read data to the processor,
means for traversing a tree branch determined from the provided current index field, the tree branch having at least an additional index field forming a leaf level storing a location of the file on the storage medium, the additional index field stored at another location on the storage medium adjacent another data file, and means for controlling the mechanism to move the storage medium to the location of the file and for controlling the data reader to retrieve the file therefrom.
Advantageously, the present invention permits indexing data to be stored on a sequential recording medium quickly and efficiently, while providing a method for rapidly locating files or constructing an index in the absence of an index stored in operating memory.