The present invention relates to tape-based data storage, and more particularly, to storing data on a magnetic tape and storing an index in a nonvolatile memory associated with the magnetic tape.
Data storage drives, such as data tape drives, record information to and read information from media, such as the data tape of a tape cartridge. Data storage drives are often used in conjunction with, for example, a data storage and retrieval system. One example of such a system is an automated data storage library with robotic picking devices, wherein removable media cartridges are selectively transported between storage cells and data storage drives in an automated environment. Herein, automated data storage library, data storage library, tape library system, data storage and retrieval system, and library may all be used interchangeably.
A digital storage tape may contain multiple files. Files and data stored on tape are written to the tape sequentially, in a linear fashion. Unlike hard drives or solid-state nonvolatile storage such as flash memory or other nonvolatile memory (NVM), tape does not allow direct-access writes of data. In general, tape data can only be written linearly, in append-only mode. For example, the Linear Tape-Open (LTO) standard uses shingling when writing tracks in order to increase track density. However, due to shingling, an in-place rewrite of a file or a data block stored in one track would destroy what has been written in the neighboring track.
File management of data on tapes has traditionally been different from that of direct-access storage media. In the latter, file system data structures are commonly used, keeping information such as a hierarchical directory structure, file names, file attributes (e.g., size, access information, access permissions), and a list of the physical storage blocks containing the file contents. However, since such file system structures must be updated whenever any changes are made to files stored on the media, they are not well-suited to tapes, which do not allow the file system information to be rewritten in place. Tape-based file system implementations do exist, but they have drawbacks: reading the file system information requires positioning the tape to the end of the recorded data, and any update requires writing a new copy of the entire set of file system structures at the end of the tape data.
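The cost of keeping file system structures on an append-only medium can be illustrated with a minimal sketch. The `AppendOnlyTape` class and `write_file` function below are hypothetical names introduced only for illustration and do not correspond to any particular tape file system; the sketch assumes a toy tape modeled as a list of records in which each update appends a complete new copy of the index.

```python
import json


class AppendOnlyTape:
    """Toy model of a tape (hypothetical): records may only be appended."""

    def __init__(self):
        self.records = []  # list of (kind, payload) tuples in write order

    def append(self, kind, payload):
        """Append a record and return its position on the tape."""
        self.records.append((kind, payload))
        return len(self.records) - 1

    def latest_index(self):
        """Scan back from the end of recorded data for the newest index copy."""
        for kind, payload in reversed(self.records):
            if kind == "index":
                return json.loads(payload)
        return {}


def write_file(tape, name, data):
    """Append the file data, then append a full new copy of the index."""
    index = tape.latest_index()
    index[name] = tape.append("data", data)
    tape.append("index", json.dumps(index))


tape = AppendOnlyTape()
write_file(tape, "a.txt", b"first file")
write_file(tape, "b.txt", b"second file")

current_index = tape.latest_index()
index_copies = sum(1 for kind, _ in tape.records if kind == "index")
```

In this sketch, two file writes leave two complete index copies on the tape, and the current index can only be found by reading from the end of the recorded data.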
One common approach to managing data on tape requires a storage system to manage the tape while storing a separate index of the tape content on an unrelated disk device or other remote direct-access storage media. As a result, the tape is no longer self-describing. Once the tape is taken out of the scope of the storage system, data stored on the tape cannot be accessed, because the tape file index is left behind in the storage system's database. The longevity of the data is limited by the longevity of the storage system, including all of its software, its databases, and the hardware it runs on. Hence, while the tape media may preserve the bits intact for years, there is no guarantee that the files will survive as long, since their data may no longer be interpretable.
Another approach to storing files on tapes is via utilities such as TAR (Tape ARchive). The TAR program combines a set of source files into a single data set which is written to tape. The TAR file consists of a header, which describes the TAR file contents and retains file metadata, and the body of the TAR file, which consists of the source files concatenated together. The TAR program makes the tapes self-describing, which avoids the dependency on an external index. However, TAR files are not appendable once written. An appended tape therefore may consist of several TAR files, and indexing such a tape requires multiple seeks and reads. There is also a risk of data loss if a TAR file header is corrupted, if its format becomes obsolete, or if its header and content storage format are found incompatible by the TAR utility attempting to open it; for example, there are multiple variations of TAR which are not fully compatible with each other. Since the source files are concatenated in the data area, the TAR file header is required to determine the source file boundaries.
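The dependence on header information for locating source file boundaries can be illustrated with a minimal sketch using the `tarfile` module of the Python standard library. The helper names `build_archive` and `list_members` and the sample file names are assumptions introduced only for illustration; the archive is built in memory rather than on tape.

```python
import io
import tarfile


def build_archive(files):
    """Pack a {name: bytes} mapping into an in-memory TAR archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)  # size is recorded in the header
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_members(archive_bytes):
    """Recover (name, size) pairs from the archive's header information."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        return [(member.name, member.size) for member in tar.getmembers()]


archive = build_archive({"a.txt": b"hello", "b.txt": b"tape data"})
members = list_members(archive)
```

Here the names and sizes of the source files are recovered from header information, not from the concatenated file contents, which is why a damaged or unreadable header may make the boundaries unrecoverable.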
Very large TAR files are often challenging to handle during transfer on a network between disk and tape systems. In some practices, the large TAR file is first divided into blocks of a certain size, such as 32 GB, and the blocks are transferred and written to tape in sequential order. To restore a file from such a tape, all of the blocks have to be read from tape, the complete TAR file has to be reassembled, and only then may the file be accessed by the TAR utility. This process involves one or more copy operations of the entire TAR file and requires a large temporary storage area for TAR assembly.
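The split-and-reassemble practice described above can be sketched as follows, again using the Python standard library `tarfile` module. The function names, the member name `big.bin`, and the 1024-byte block size are assumptions chosen only to keep the illustration small; they stand in for the much larger blocks used in practice.

```python
import io
import tarfile


def split_into_blocks(data, block_size):
    """Divide the archive bytes into sequential fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def reassemble(blocks):
    """Concatenate all blocks, in order, back into one complete archive."""
    return b"".join(blocks)


def read_member(archive_bytes, name):
    """Open the complete archive and return the contents of one member."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        return tar.extractfile(name).read()


# Build a small in-memory archive holding a single file.
payload = b"x" * 3000
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="big.bin")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
archive = buf.getvalue()

blocks = split_into_blocks(archive, 1024)
restored = read_member(reassemble(blocks), "big.bin")
```

In this sketch, the reassembled archive is held in full as a second copy before the member can be read, mirroring the temporary storage requirement noted above.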