1. Field of Invention
This invention relates to systems and methods for storing information.
2. Description of Related Art
Tape storage is often used as an inexpensive backup for on-line storage, increasing the reliability of computer-stored data by providing a redundant storage location. Additionally, hierarchical storage management (HSM) systems use tape storage to greatly expand the capacity of a fixed disk-based file system. Files are migrated from the disk-resident file system to tape storage when the disk-resident file system runs out of space, and files are migrated from tape to fixed disk when they are referenced. Most files in an HSM system are stored only on tape, and no redundant copy is stored on disk.
One method of storing backup information in an HSM system is to store two copies of the information, i.e., data mirroring. This way, stored information can be reconstructed even if a primary and one backup information source are damaged or lost.
Another method for storing backup information is the Redundant Arrays of Inexpensive Tapes (RAIT) technology. In a RAIT system, a collection of N+1 tapes is aggregated to act as a single virtual tape. In a typical implementation, data files are simultaneously written to blocks of the N tapes in stripes, and the parity information, which is the bit-wise exclusive-OR of the data written in the blocks, is stored on the additional tape simultaneously with storing the data files on the N tapes. The RAIT system has higher performance than systems that store duplicate copies of information because writing data in parallel to multiple tapes results in high-speed storage. However, because data is stored in stripes across multiple tapes in the RAIT system, all of the tapes in a RAIT stripe, i.e., a group of tapes storing a particular set of data, must be mounted and read synchronously to reconstruct a file stored on the tapes. Because data must be synchronously read from the tapes in the RAIT stripe, special hardware, or software emulation, for reading the tapes is typically required, and if one of the tape drives is not operating properly, data cannot be properly read from any of the tapes. That is, the system must wait until all of the tapes and associated tape drives are operating properly before any data can be read from the tapes.
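The parity relationship described above can be illustrated with a minimal sketch. The block contents, the choice of N = 3, and the helper name xor_blocks are assumptions made for illustration only; they are not taken from any particular RAIT implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bit-wise exclusive-OR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: N = 3 data blocks, one block per data tape.
stripe = [b"\x0f\xa0", b"\x33\x0c", b"\x55\xff"]

# Parity block that would be written to the additional (N+1)th tape.
parity = xor_blocks(stripe)

# If the second data tape is lost, XOR-ing the surviving data blocks
# with the parity block yields the missing block.
recovered = xor_blocks([stripe[0], stripe[2], parity])
```

Because exclusive-OR is its own inverse, any single missing block of the stripe can be recovered this way, provided every other block of the stripe and the parity block are readable.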
The invention provides a system and method for storing information using a plurality of storage media, such as magnetic tapes, that can be used as part of an HSM system. According to at least one aspect of the invention, storage media that store data files and related parity information are written to asynchronously. That is, data files can be stored in a group of storage media synchronously in stripes similar to that in RAIT, or asynchronously unlike RAIT, but parity data is stored asynchronously with respect to storage of the data files. Thus, data files and related parity data are stored independently of each other.
Protection groups are preferably formed for the storage media, or regions of the storage media, to organize how data is stored on the storage media and how parity information is generated. For example, a protection group can be a collection of N regions from N storage media, one region per storage medium, and parity information is generated and stored for data in each protection group. Parity information is stored so that if one storage medium in a protection group is lost or damaged, data stored on the lost or damaged storage medium can be reconstructed from the remaining storage media and the parity information in the protection group. Preferably, parity information is determined as the exclusive-OR of data in a protection group, but other methods for generating parity information are possible. When a protection group is created, each region in the group is empty. As data is written to a region of a storage medium, the region and the corresponding protection group become filled, and parity information is generated and stored in active memory for the protection group. When the regions in a protection group are completely filled and closed, the protection group is closed and parity information stored in active memory for the protection group can be migrated to more permanent backup storage. Thus, parity data for a protection group can be stored asynchronously with respect to storage of data files for which the parity data is generated.
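The protection group life cycle described above can be sketched as follows. This is an illustrative sketch only: the class name ProtectionGroup, its methods, the fixed region size, and the rule that a region closes as soon as it is completely filled are all assumptions rather than details taken from the description. Unwritten portions of a region are treated as zeros, so the parity held in memory can be updated incrementally as each write arrives.

```python
class ProtectionGroup:
    """Hypothetical protection group of equal-sized regions, one per medium."""

    def __init__(self, n_regions, region_size):
        self.region_size = region_size
        self.fill = [0] * n_regions            # bytes written so far per region
        self.closed = [False] * n_regions      # every region starts empty and open
        self.parity = bytearray(region_size)   # parity held in active memory

    def write(self, region, data):
        """Record a write to one region and fold the data into the parity."""
        start = self.fill[region]
        if self.closed[region] or start + len(data) > self.region_size:
            raise ValueError("region is closed or lacks space")
        for i, byte in enumerate(data):
            self.parity[start + i] ^= byte
        self.fill[region] = start + len(data)
        # Assumption: a region is closed once it is completely filled.
        if self.fill[region] == self.region_size:
            self.closed[region] = True

    def is_closed(self):
        """The group is closed when all of its regions are closed."""
        return all(self.closed)

    def migrate_parity(self):
        """Return the parity for migration to more permanent storage."""
        if not self.is_closed():
            raise RuntimeError("protection group is still open")
        return bytes(self.parity)


# Writes to the two regions arrive independently and in any order.
group = ProtectionGroup(n_regions=2, region_size=4)
group.write(0, b"\x01\x02")
group.write(1, b"\xff\x00\x10\x20")
group.write(0, b"\x03\x04")
final_parity = group.migrate_parity()
```

In this sketch the parity leaves active memory only after the group closes, which is one way the parity can be stored asynchronously with respect to the data files it protects.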
When a data file is received for storage, a storage medium, or region of a storage medium, can be selected to store the data file. Selection of the storage medium or region can be done in many different ways, including using a “round-robin” allocation scheme or by selecting a storage medium that has the largest number of open regions. Once a storage medium or region is selected, the data file is stored and parity data related to the data file is generated. Parity information can be generated before, during or after the data file is stored.
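The two selection schemes mentioned above can be sketched briefly. The function names, the media identifiers, and the dictionary of open-region counts are hypothetical and serve only to illustrate the idea.

```python
from itertools import cycle


def round_robin(media_ids):
    """Yield storage media identifiers in a repeating, fixed order."""
    return cycle(media_ids)


def most_open_regions(open_regions):
    """Pick the medium with the largest number of open regions.

    open_regions maps a medium identifier to its count of open regions.
    On a tie, the medium listed first in the mapping is chosen.
    """
    return max(open_regions, key=open_regions.get)


# Round-robin: each incoming data file goes to the next medium in turn.
picker = round_robin(["tape-A", "tape-B", "tape-C"])
first_four = [next(picker) for _ in range(4)]

# Largest number of open regions: hypothetical counts per medium.
choice = most_open_regions({"tape-A": 1, "tape-B": 3, "tape-C": 2})
```

Round-robin spreads files evenly without tracking any state about the media, whereas selecting by open-region count tends to balance how quickly the media fill.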
Since, in accordance with one aspect of the invention, data files can be stored within a single region or storage medium, or in a relatively small number of regions or storage media, a file can be restored by accessing a single storage medium or a relatively small number of storage media. This is in contrast to RAIT storage systems, which store single files in stripes across multiple tapes. In addition, since data files can be stored in an asynchronous fashion with respect to each other, data files can be read from appropriate storage media using commonly available equipment, unlike RAIT storage systems, which require that multiple tapes be synchronously accessed to restore a data file. That is, according to the invention, data can be stored asynchronously, and even in parallel, to multiple storage media. Although data files can be written asynchronously according to the invention, the invention is not limited to asynchronous data file storage. That is, data files can be stored in a stripe across two or more storage media similar to RAIT systems.
Various different storage media management strategies can be used to achieve different goals, such as minimizing parity information active memory storage overhead, minimizing the number of open storage media, minimizing data recovery or reconstruction time, etc. To achieve these goals, adjustments to region size, protection group forming policy and/or parity information storage policy can be made.
These and other aspects of the invention will be appreciated from, and/or will be apparent in view of, the following description of the invention.