A namespace is a set of names in which all names are unique. Namespace management comprises a well-defined method of inserting names into and deleting names from the namespace (with no duplicate names) and an unambiguous method of tracking names in that namespace. In the computing world, namespace management schemes can be applied to managing multiple files, each with a unique filename.
For disk storage devices, the rules for managing the namespace are clear. The disk namespace is called a disk directory; its elements are filenames, all of which must be unique. The operating system provides efficient mechanisms for inserting files into and deleting files from the disk namespace and for filename searches. A second, very important characteristic of disks is that the namespace (i.e., the directory) is maintained on the same storage medium as the elements of the namespace (i.e., the filenames). Another characteristic of disk storage media is that they are not generally removable from a system, and hence each individual disk device has its own unique name in a system.
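As a minimal sketch of the definition above, the following Python class models a namespace that rejects duplicate names and supports insertion, deletion, and lookup. The class name `Namespace` and its methods are hypothetical and serve only as illustration; they are not part of the described system.

```python
class Namespace:
    """Illustrative namespace: a set of names in which duplicates are refused."""

    def __init__(self):
        self._names = set()

    def insert(self, name):
        # Refuse duplicates so that every name in the namespace stays unique.
        if name in self._names:
            raise ValueError(f"duplicate name: {name!r}")
        self._names.add(name)

    def delete(self, name):
        # set.remove raises KeyError if the name is not present.
        self._names.remove(name)

    def contains(self, name):
        # Tracking a name reduces to a membership test.
        return name in self._names
```

Inserting the same filename twice raises `ValueError`, which corresponds to the "no duplicate names" requirement.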
Tape devices, on the other hand, differ significantly from disk devices in the characteristics mentioned above. Tapes do not have a well-defined system providing a namespace management scheme. Users have to manage the namespace themselves. There are cataloging systems that do provide some form of management, but they are limited when compared to disk namespace management systems. Moreover, tapes are removable storage media, and hence duplicate copies of the same tape are a distinct possibility.
Tape devices prove to be extremely useful when dealing with audit files (also known as transaction logs) and database recovery. Audit files contain a history of the changes made to an audited database and are a necessary component of the database recovery process. If audit files are not available when a database failure occurs, all changes made to a database since the last backup dump was taken might be lost. Because audit files can contain information for millions of transactions, audit files tend to be large. It is common to store backup audit files on tape drives rather than consume hard disk space on the database system server.
Because of the important role tape devices play in database recovery, and the deficiencies of managing namespaces when dealing with tape devices, the need arises for maintaining and tracking files spanning multiple reels of tape.
The method of the present invention addresses this need by introducing the concept of a Tapeset, or a grouping of audit files on tape. The method of the present invention yields its greatest advantage when new audit files are being appended to an existing Tapeset or when audit files are being retrieved. Tapes using the method of the present invention are created under the constraint that all audit files on the tape (which may span multiple reels) must be stored sequentially by audit file number. Consequently, every command to append audit files to these tapes must be checked against this constraint. This is done by verifying whether the previous audit file (by audit file number) for the database exists on some tape. If the constraint is satisfied, the method of the present invention then positions the tape just beyond the end of that file to prepare for appending the new audit file.
In the prior art, this was done by simply calling an operating system function to open the previous audit file. In other words, if audit file #5 was being appended, then the prior art utility invoked an operating system function to open audit file #4. If audit file #4 was found, another operating system function (close with retention) was invoked to position the tape just beyond the end of the file.
In today's data center environments, with the multitude of tape drive hardware and the possibility of multiple audit tapes being mounted on that hardware, the operating system takes an inordinate amount of time to open the requested audit file. This is because the operating system sequentially scans through all the mounted audit tapes of the database for the requested file. In many cases, several tape volumes have to be searched sequentially before the right tape volume with the audit file in question is found (Problem #1). This problem is greatly aggravated by the high capacity of today's tapes. Sometimes, scanning through an entire reel of tape takes more than an hour just to determine whether a particular file is present on the tape (Problem #2). In addition, if the audit file whose presence is being verified (in this example, audit file #4) is huge or, worse, spans multiple reels, positioning the tape drive at the end of the file consumes another large amount of elapsed time (Problem #3).
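The append constraint described above can be sketched in a few lines of Python. The class `Tapeset`, its methods, and the in-memory list of audit file numbers are all hypothetical stand-ins chosen for illustration. The sketch further assumes that the previous audit file must be the last file already written, and that an empty Tapeset accepts any first file; the text states neither explicitly.

```python
class Tapeset:
    """Hypothetical in-memory model of a Tapeset: audit file numbers in tape order."""

    def __init__(self, audit_file_numbers=None):
        self.files = list(audit_file_numbers or [])

    def can_append(self, audit_file_number):
        # Assumption of this sketch: an empty Tapeset accepts any first file.
        if not self.files:
            return True
        # The previous audit file (number - 1) must be the last file on tape,
        # so that the new file lands immediately after it.
        return self.files[-1] == audit_file_number - 1

    def append(self, audit_file_number):
        if not self.can_append(audit_file_number):
            raise ValueError(
                f"audit file {audit_file_number - 1} is not the last file in the Tapeset"
            )
        # Stands in for positioning the tape past the previous file and writing.
        self.files.append(audit_file_number)
```

With a Tapeset holding audit files 3 and 4, appending file 5 succeeds, while appending file 7 is rejected because file 6 is not present.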
In the method of the present invention, the concept of the Tapeset is invoked again. If the “Append Command” is issued by the operating system (DBMS software), the current Tapeset number stored in the system control file for the database is used. If the Append Command is issued by the operator, the Tapeset number has to be supplied.
One prior art method to which the method of the present invention generally relates is described in U.S. Pat. No. 5,982,572, entitled METHOD AND APPARATUS FOR ENABLING FAST ACCESS TO A LOGICAL BLOCK ON A TAPE MEDIUM. The prior art reference discloses a method and apparatus for fast access to any logical block on a medium not containing logical block addressing. Categorizing marks such as filemarks or setmarks are provided on the tape medium at various points along the medium. The medium is divided up into a plurality of physical blocks. To permit fast access to a logical block on the tape, a connection table in the form of a block map is provided which establishes a relationship between logical blocks and the tapemarks, and defines physical positions of at least some of the tapemarks.
The method of the present invention differs from the above-mentioned prior art method by establishing unique names for sets of tape volumes using a mechanism called Tapeset. The Tapeset assigns a number that indicates the first file in each tape volume, and a MAXFILESPERTAPE value is then associated with each file so that a required file can be located more efficiently. The prior art method, on the other hand, sets up tapemarks along the tape medium and divides the tape into several partitions without identifying the files stored in these blocks. In order to have fast access using the prior art method, a connection table is provided which establishes the relationship between the partitioned blocks and tapemarks.
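One possible reading of how a first-file number and a MAXFILESPERTAPE value could narrow a search is sketched below in Python. The sketch assumes, purely for illustration, that every volume in a Tapeset holds exactly MAXFILESPERTAPE audit files and that audit file numbers are consecutive; the text does not state this, and the function name `locate` and the value 4 are hypothetical.

```python
MAXFILESPERTAPE = 4  # illustrative value only, not taken from the text


def locate(audit_file_number, first_file_in_tapeset,
           max_files_per_tape=MAXFILESPERTAPE):
    """Return (volume index, position within volume), both zero-based.

    Assumes each volume holds exactly max_files_per_tape consecutive files.
    """
    offset = audit_file_number - first_file_in_tapeset
    if offset < 0:
        raise ValueError("audit file precedes the first file in the Tapeset")
    # Integer division selects the volume; the remainder is the file's slot.
    return divmod(offset, max_files_per_tape)
```

Under these assumptions, audit file 10 in a Tapeset that starts at audit file 1 falls on the third volume (index 2) as its second file (position 1), and no tape has to be scanned to determine that.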
Another prior art method to which the method of the present invention generally relates is detailed in U.S. Pat. No. 5,287,232, entitled AUTOMATIC TAPE SEARCHING METHOD. This prior art reference is an automatic (auto) tape searching method that searches the required tape portion fast and automatically by repeating a sequence of operational procedures. The auto tape searching method comprises: a first stage which initializes the port of a microcomputer and the content of a RAM (random access memory), sets the initial values, and controls interrupts at a power-on reset of a video tape recording system; a second stage which checks key input and, for auto random search (ARS) key, performs the auto random search function; a third stage which performs mode checking and, if a present mode needs to accompany the mechanism operation, controls the present mode; and a fourth stage which switches the mode and checks the sensor and an emergency state to put the system in stable condition.
This prior art method is a hardware solution to the quick search problem. Its proposed solution is to automate a searching algorithm that will search the required tape portion; this prior art method utilizes four stages that manipulate RAM content and key inputs, perform mode checking, and evaluate system state conditions. The method of the present invention, on the other hand, is software based but achieves the same goal utilizing a nomenclature system that efficiently locates a specific data file.
Still another prior art clustering system to which the method of the present invention generally relates is detailed in U.S. Pat. No. 5,757,571, entitled FLEXIBLE-CAPACITY SCALING FOR EFFICIENT ACCESS OF ORDERED DATA STORED ON MAGNETIC TAPE MEDIA. This prior art method discloses various data storage formats that help to efficiently locate, read, and write user data stored on magnetic tape media. A tape is formatted by writing multiple segment-headers, free from any interleaved access of user data. Adjacent segment-headers are spaced by a predetermined interval to define multiple data storage segments. Segment-headers all contain a unique key, which is copied into a key index to identify valid segments. After formatting, normal tape accesses can be performed. Without erasing any old headers or data, a new formatting scheme can be established by writing new segment-headers on the tape. The new segment-headers include a new unique key, replacing the previous key in the key index. Previous segment-headers stored on the tape are ignored, since they lack the updated key. Segments may be selectively grouped to provide independently addressable partitions. Mapping between segments and partitions can use a fixed relationship (e.g. one-to-one), or each partition may be variably sized according to the amount of data to be stored therein. Variable-sized partitions may be automatically padded with a selected number of empty segments. Another feature is flexible-capacity scaling, which distributes an ordered set of device blocks on a multi-track magnetic tape medium. The device blocks are bi-directionally stored in a continuous configuration of multiple adjacent stacked serpentine patterns occupying some or all of the tape. This configuration permits sequential access of all device blocks without advancing the tape medium to skip over regions between adjacent device blocks.
The key difference between the prior art method and the method of the present invention is the organization of file-positioning information provided by the method of the present invention. The method of the present invention uses a disk directory file to access files, identify files, and resolve name conflicts when multiple tapes are created with the same name. The disk directory file contains the name of the first file in each volume; a serial number is also associated with each volume. Together these two identifiers are used to limit the search through tape drives.
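A directory of this kind can be sketched as a sorted list of per-volume entries. In the Python sketch below, each entry pairs the number of the first audit file on a volume with that volume's serial number; using a file number rather than a file name, the sample serial numbers, and the function name `volume_for` are all assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical directory entries: (first audit file number on the volume,
# volume serial number), kept sorted by first audit file number.
directory = [(1, "VOL001"), (40, "VOL002"), (95, "VOL003")]


def volume_for(audit_file_number, entries):
    """Return the serial number of the volume whose range covers the file."""
    firsts = [first for first, _serial in entries]
    # Find the last volume whose first file number is <= the requested number.
    idx = bisect_right(firsts, audit_file_number) - 1
    if idx < 0:
        raise LookupError("audit file precedes every volume in the directory")
    # Note: a number past the end of the last volume still maps to that
    # volume; a real directory would also need to record where it ends.
    return entries[idx][1]
```

For example, `volume_for(94, directory)` selects `"VOL002"`, so only that one volume needs to be mounted and searched.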
Still another prior art clustering system to which the method of the present invention generally relates is detailed in U.S. Pat. No. 5,572,378, entitled DIRECT FILE ACCESS SYSTEM FOR MAGNETIC TAPE. This prior art method discloses a direct file access system for a magnetic tape where all data files begin at a designated location on the tape. The direct file access system may be used with a reduced rewind data configuration to decrease data access time. The reduced rewind data configuration divides data files into generally equal portions so that data files begin and end at a designated location on the tape, eliminating rewind sequences. A method and system for reducing the number of tape retensioning passes is included to further decrease access time. This prior art method divides data into equal portions so that all data files start and end on designated locations on the tape. A reduced rewind scheme is then applied to zip through these data blocks to find data. This method, however, uses a linear search that must run through an entire block to locate a specific data file. The method of the present invention, by contrast, assigns numeric values to each file in a data block, reducing the search time from O(n) to O(1). O(n) is a notation specifying that the running time of an algorithm grows in proportion to the length or number of inputs “n”. O(1) signifies that the running time is bounded by a constant regardless of “n”.
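The difference between O(n) and O(1) lookup can be illustrated with a short Python sketch. It is a generic comparison of a linear scan against a precomputed index, not the mechanism of either method discussed here; the file names and both functions are hypothetical, and the dictionary lookup is constant time on average.

```python
def linear_find(files, target):
    """O(n): examine entries one by one; return (index, comparisons made)."""
    for comparisons, name in enumerate(files, start=1):
        if name == target:
            return comparisons - 1, comparisons
    return None, len(files)


def build_index(files):
    """Build a name -> position map once; later lookups are O(1) on average."""
    return {name: position for position, name in enumerate(files)}


# Hypothetical file names, used only to exercise the two approaches.
files = [f"AUDIT{n}" for n in range(1, 1001)]
index = build_index(files)

position, comparisons = linear_find(files, "AUDIT1000")
direct_position = index["AUDIT1000"]
```

The linear scan needs 1000 comparisons to reach the last entry, whereas the indexed lookup reaches the same position in a single step however long the list grows.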
Yet another prior art clustering system to which the method of the present invention generally relates is detailed in U.S. Pat. No. 6,081,875, entitled APPARATUS AND METHOD FOR BACKUP OF A DISK STORAGE SYSTEM. This prior art reference is a backup system and method providing for the creation of a reconciled snapshot backup image of a database while the database, residing on a disk array system, is in use by users. A backup computer running a commercial backup utility is connected between the array system and a tape storage system. While the backup is underway, write requests to the database are suspended until the data currently in those data blocks is copied and stored in an original data cache. Here, the disk system address of the copied block and a pointer to the location of the block in the cache are stored in a map. The backup utility incrementally reads portions of the database from the disk system and forwards those portions to the tape system. Prior to each portion being forwarded to the tape system, all data blocks in the portion which have an address that corresponds to the address of a block in the cache are discarded and replaced with the data from the cache for that address.
This prior art reference is a method to back up data on a disk storage system. It does not deal with the problem addressed by the method of the present invention, namely the efficient locating of specific data files within a storage system.