1. Field of the Invention
The present invention relates to techniques for digitally storing data. More particularly, the invention concerns a particularly efficient storage system for making redundant copies of data on tape by waiting until a rewind/unload command is received and then copying stored data objects en masse, thus benefitting from any data compression used to store the objects along with the lower overhead for the bulk copy operation.
2. Description of the Related Art
Many data processing systems require a large amount of data storage space, for use in efficiently accessing, modifying, and re-storing data. Data storage is typically separated into several different levels, each level exhibiting a different data access time or data storage cost. A first, or highest, level of data storage involves electronic memory, usually dynamic or static random access memory (DRAM or SRAM). Electronic memories take the form of semiconductor integrated circuits where millions of bytes of data can be stored on each circuit, with access to such bytes of data measured in nanoseconds. The electronic memory provides the fastest access to data since access is entirely electronic.
A second level of data storage usually involves direct access storage devices (DASD). DASD storage, for example, includes magnetic and/or optical disks. Data bits are stored as micrometer-sized magnetically or optically altered spots on a disk surface, representing the "ones" and "zeros" that comprise the binary value of the data bits. Magnetic DASD includes one or more disks that are coated with remanent magnetic material. The disks are rotatably mounted within a protected environment. Each disk is divided into many concentric tracks, or closely spaced circles. The data is stored serially, bit by bit, along each track. An access mechanism, known as a head disk assembly (HDA), typically includes one or more read/write heads, and is provided in each DASD for moving across the tracks to transfer the data to and from the surface of the disks as the disks are rotated past the read/write heads. DASDs can store gigabytes of data, and access to such data is typically measured in milliseconds (orders of magnitude slower than electronic memory). Access to data stored on DASD is slower than electronic memory due to the need to physically position the disk and HDA to the desired data storage location.
A third or lower level of data storage includes tapes, tape libraries, and optical disk libraries. Access to library data is much slower than access to electronic or DASD storage because a robot is necessary to select and load the needed data storage medium. An advantage of these storage systems is the reduced cost for very large data storage capabilities, on the order of terabytes of data. Tape storage is often used for backup purposes. That is, data stored at the higher levels of the data storage hierarchy is reproduced for safekeeping on magnetic tape. Access to data stored on tape and/or in a library is presently on the order of seconds.
Having a backup data copy is mandatory for many businesses for which data loss would be catastrophic. The time required to recover lost data is also an important recovery consideration. In this respect, tape or library backup has been found useful for periodically backing up primary data by making a copy of the primary data. In contrast to the serial access of tape storage, DASD backup storage is random access and therefore facilitates more advanced backup techniques. One example is "dual copy," which operates by mirroring contents of a primary DASD device with a nearly identical secondary DASD device. An example of dual copy involves providing additional DASDs so that data is written to the additional DASDs substantially in real time along with the primary DASDs. Then, if the primary DASDs fail, the secondary DASDs can be used to provide otherwise lost data. A drawback to this approach is that the number of required DASDs is doubled.
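As a rough illustration of dual copy, the following Python sketch mirrors every write to a primary and a secondary device and serves reads from the secondary once the primary is marked as failed. The class and attribute names are hypothetical, and plain dictionaries stand in for DASD volumes.

```python
class MirroredPair:
    """Hypothetical dual-copy model: dicts stand in for the primary and secondary DASDs."""

    def __init__(self) -> None:
        self.primary: dict[int, bytes] = {}
        self.secondary: dict[int, bytes] = {}
        self.primary_failed = False

    def write(self, address: int, data: bytes) -> None:
        # Every write goes to both devices, so the secondary tracks the primary in real time.
        self.primary[address] = data
        self.secondary[address] = data

    def read(self, address: int) -> bytes:
        # After a primary failure, data is served from the secondary copy instead.
        device = self.secondary if self.primary_failed else self.primary
        return device[address]


pair = MirroredPair()
pair.write(7, b"payroll")
pair.primary_failed = True
print(pair.read(7))  # b'payroll'
```

The model also makes the drawback visible: every byte is held twice, so the storage required doubles.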
A different data backup alternative, which avoids doubling the number of storage devices, involves writing data to a redundant array of inexpensive disks (RAID). In this configuration, the data is apportioned among many DASDs. If a single DASD fails, then the lost data can be recovered by applying error correction procedures to the remaining data. Several different RAID configurations are available.
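The error correction procedure is not specified here; single XOR parity, as used in RAID levels 4 and 5, is one common choice. Under that assumption, the sketch below computes a parity block across the data blocks of a stripe and rebuilds any one lost block from the survivors. The function names are hypothetical.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


def make_parity(data_blocks: list[bytes]) -> bytes:
    # Parity is the XOR of every data block in the stripe.
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    # XOR-ing the survivors with the parity cancels them out, leaving the lost block.
    return xor_blocks(surviving_blocks + [parity])


stripe = [b"\x01\x02", b"\x0f\x00", b"\xf0\xff"]
parity = make_parity(stripe)
print(rebuild([stripe[0], stripe[2]], parity) == stripe[1])  # True
```

Only one parity block is stored per stripe, which is why this scheme needs far fewer extra devices than mirroring; in exchange, it tolerates the failure of only a single device.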
A number of other solutions have been developed for DASD backups, including synchronous remote copy, asynchronous remote dual copy, peer-to-peer remote copy, and other solutions, many having been developed by International Business Machines Corp. (IBM).
DASD backup solutions have been more thoroughly developed than tape backup solutions. Due to the ability to randomly access DASD media, DASD offers more reliable storage, faster commit times, and more flexibility in crafting backup strategies. Still, tape backup remains popular because it is cost effective. According to one approach, when a system receives new data for storage, it concurrently copies the data to separate tapes using tape drives operating in parallel. Under another approach, when the system receives new data for storage, it first copies the data using a first tape drive, and then begins to copy the data from the first tape drive to another tape at a second tape drive.
Although known tape backup strategies enjoy widespread use today, engineers at IBM are continually seeking to improve the performance and efficiency of tape backup systems. One area of possible focus is reducing the time required to complete tape backups.
Broadly, the present invention concerns a particularly efficient storage system for making redundant copies of data on tape media. The invention is implemented in a tape storage system that includes a tape director coupled to one or more tape drives, the entire tape storage system being coupled to a host. In response to a write command, the invention automatically makes redundant copies of the data, this process being invisible to the host.
Initially, the host sends the tape director one or more data objects, such as logical "volumes" of data. The host also sends corresponding instructions to write the data objects to tape media. In response, the tape director stores the data objects using a first tape drive, and then transmits a message to the host signaling successful completion of the storage. The tape director waits until receiving a rewind/unload command from the host to create a backup copy of the data objects. At this time, the tape director creates the backup copy by copying all data from the first tape drive to a second tape drive, regardless of the data objects' boundaries. This copy operation, called "bulk copy," is more efficient than the data objects' original storage, because there is less overhead in copying the data objects en masse, and there is actually less data to copy if the data objects were stored with any data compression. After copying the data objects, the tape director completes the host request by instructing the first tape drive to rewind and unload the tape containing the original data objects.
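The write and rewind/unload sequence can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the patented implementation: the `TapeDrive` and `TapeDirector` classes are hypothetical, a tape is modeled as a list of records, and `zlib` stands in for whatever compression the drive applies. The bulk copy transfers the stored, already-compressed records as an image, without decompressing them or interpreting object boundaries.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class TapeDrive:
    """Hypothetical drive model: the mounted tape is a list of compressed records."""
    name: str
    records: list[bytes] = field(default_factory=list)
    mounted: bool = True

    def write_object(self, data: bytes) -> None:
        # Assumption: the drive compresses each object as it is written (zlib as a stand-in).
        self.records.append(zlib.compress(data))

    def rewind_unload(self) -> None:
        self.mounted = False


class TapeDirector:
    """Hypothetical director that defers the backup copy until rewind/unload."""

    def __init__(self, primary: TapeDrive, backup: TapeDrive) -> None:
        self.primary = primary
        self.backup = backup

    def write(self, objects: list[bytes]) -> str:
        # Store each object on the primary drive only, then report success to the host.
        for obj in objects:
            self.primary.write_object(obj)
        return "complete"

    def rewind_unload(self) -> int:
        # Bulk copy: duplicate the compressed tape image record for record,
        # without decompressing or parsing object boundaries.
        self.backup.records = list(self.primary.records)
        copied = sum(len(record) for record in self.primary.records)
        # Only after the copy finishes is the original tape rewound and unloaded.
        self.primary.rewind_unload()
        return copied


director = TapeDirector(TapeDrive("drive-1"), TapeDrive("drive-2"))
volumes = [b"A" * 10_000, b"B" * 5_000]
status = director.write(volumes)   # host sees "complete"; no backup exists yet
copied = director.rewind_unload()  # backup is created here
print(status, copied < sum(map(len, volumes)))  # complete True
```

The example volumes are highly repetitive, so `zlib` shrinks them dramatically; typical data compresses far less, and the byte savings shown here overstate what a real bulk copy would gain.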
The invention also contemplates a "dual tape" modification to this technique, where the tape director stores some data objects using one tape drive while storing other data objects using another tape drive. When the host sends the tape director its rewind/unload command, the tape director responds by directing the tape drives to exchange their respective data objects, thereby creating the backup copy. Namely, the tape director operates the tape drives to copy all data objects from the backup set of tapes to the primary set of tapes, and to copy all data objects from the primary set of tapes to the backup set of tapes. After copying these data objects, the tape director commands the tape drives to rewind the primary and backup tapes and unload the tapes from their respective tape drives.
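The dual tape variant can be sketched the same way. The text says only that some objects go to one drive and other objects to the other; the alternating assignment below, the order in which exchanged records are appended, and all names are assumptions for illustration.

```python
import zlib


class Drive:
    """Hypothetical drive model: the mounted tape is a list of compressed records."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: list[bytes] = []
        self.mounted = True

    def write_object(self, data: bytes) -> None:
        self.records.append(zlib.compress(data))


def dual_tape_write(primary: Drive, backup: Drive, objects: list[bytes]) -> None:
    # Assumption: objects alternate between the two drives. The writes run
    # sequentially here; real drives would stream in parallel.
    for index, obj in enumerate(objects):
        (primary if index % 2 == 0 else backup).write_object(obj)


def rewind_unload_exchange(primary: Drive, backup: Drive) -> None:
    # Snapshot both tapes first so each drive appends only its partner's original records.
    primary_original = list(primary.records)
    backup_original = list(backup.records)
    primary.records.extend(backup_original)
    backup.records.extend(primary_original)
    # Both tapes now hold every object; rewind and unload them.
    primary.mounted = False
    backup.mounted = False


a, b = Drive("drive-1"), Drive("drive-2")
dual_tape_write(a, b, [b"vol1", b"vol2", b"vol3"])
rewind_unload_exchange(a, b)
print(sorted(zlib.decompress(r) for r in a.records))  # [b'vol1', b'vol2', b'vol3']
```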
In one embodiment, the invention may be implemented to provide a method to create redundant data copies on tape media. In another embodiment, the invention may be implemented to provide an apparatus, such as a tape storage system, programmed or otherwise configured to create redundant copies of data on tape by waiting until a rewind/unload command is received and then copying stored data objects en masse, thus taking advantage of any data compression used to store the objects and also the lower overhead for the copy operation. In still another embodiment, the invention may be implemented to provide a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital data processing apparatus to direct components of a tape storage system to create redundant data copies on tape media as mentioned above.
The invention affords its users a number of distinct advantages. First, in contrast to previous techniques, the tape backup process of this invention finishes more quickly. This is because the tape director delays creation of the backup copy until the host issues a "rewind-and-unload" command. This delayed backup completes more quickly because it takes advantage of any data compression used to store the objects, and also benefits from the bulk copy operation's low overhead. Overhead is low because the bulk copy operation involves transferring one large block of information, rather than many smaller data objects individually.
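To make the overhead and compression argument concrete, the following sketch compares a copy that transfers each object separately at its uncompressed size with a single bulk transfer of the compressed image. The linear cost model, parameter names, and example numbers are hypothetical, not measurements. With n objects totaling S megabytes, transfer rate R, fixed per-transfer overhead h, and compression ratio c, the model is T_object = S/R + n*h versus T_bulk = S/(c*R) + h.

```python
def per_object_copy_seconds(sizes_mb: list[float], rate_mb_s: float, overhead_s: float) -> float:
    # Each object is a separate, uncompressed transfer that pays the fixed overhead.
    return sum(size / rate_mb_s + overhead_s for size in sizes_mb)


def bulk_copy_seconds(sizes_mb: list[float], rate_mb_s: float, overhead_s: float,
                      compression_ratio: float) -> float:
    # One transfer of the compressed tape image; the overhead is paid once.
    return sum(sizes_mb) / compression_ratio / rate_mb_s + overhead_s


sizes = [50.0] * 100  # hypothetical: 100 volumes of 50 MB each
print(per_object_copy_seconds(sizes, 100.0, 2.0))  # 250.0
print(bulk_copy_seconds(sizes, 100.0, 2.0, 2.0))   # 27.0
```

The gap depends heavily on the assumed overhead and compression ratio; with incompressible data (c = 1) and negligible overhead, the two estimates converge.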
Furthermore, the present invention does not incur any additional delay that is detectable by the host. This is because most conventional backup systems do not commit a backup copy until the rewind-and-unload operation succeeds, even when the backup data itself was written earlier, so deferring the copy until that command does not lengthen the time the host already waits. The invention may be implemented to gain further advantage by dividing the initial storage of data objects among paired tape drives, and then cooperatively operating the tape drives to exchange data upon the rewind/unload command. Because the paired tape drives operate in parallel, they can complete the initial storage in up to half the time required by a single drive. As another advantage, the invention permits earlier recovery than prior tape backup techniques; the data is available upon completion of the first copy, even if the backup copy has not yet been made. The invention also provides a number of other advantages and benefits, which should be apparent from the following description of the invention.
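The speedup from splitting the initial writes across paired drives can be estimated with a simple load-balancing model. The largest-first greedy assignment and the assumption of equal, independent drive rates are hypothetical. Under this model, a twofold speedup is an upper bound, reached only when both drives receive equal amounts of data.

```python
def parallel_write_seconds(sizes_mb: list[float], rate_mb_s: float, drives: int) -> float:
    # Assign each object, largest first, to the drive holding the least data so far.
    loads = [0.0] * drives
    for size in sorted(sizes_mb, reverse=True):
        lightest = loads.index(min(loads))
        loads[lightest] += size
    # Drives stream concurrently, so elapsed time is set by the busiest drive.
    return max(loads) / rate_mb_s


even = [40.0] * 10
print(parallel_write_seconds(even, 100.0, 1), parallel_write_seconds(even, 100.0, 2))  # 4.0 2.0
uneven = [300.0, 100.0]
print(parallel_write_seconds(uneven, 100.0, 2))  # 3.0
```

With uneven object sizes, the two-drive time falls between the single-drive time and half of it.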