1, Field of Invention
Aspects of the present invention relate to data storage, and more particularly to apparatus and methods for emulating a tape storage system to provide the equivalent of full back-ups using an existing full back-up and subsequent incremental back-ups and enabling end-users to restore data from such back-ups.
2. Discussion of Related Art
Many computer systems include one or more host computers and one or more data storage systems that store data used by the host computers. These host computers and storage systems are typically networked together using a network such as a Fibre Channel network, an Ethernet network, or another type of communication network. Fibre Channel is a standard that combines the speed of channel-based transmission schemes and the flexibility of network-based transmission schemes and allows multiple initiators to communicate with multiple targets over a network, where the initiator and the target may be any device coupled to the network. Fibre Channel is typically implemented using a fast transmission media such as optical fiber cables, and is thus a popular choice for storage system networks where large amounts of data are transferred.
An example of a typical networked computing environment including several host computers and back-up storage systems is shown in FIG. 1. One or more application servers 102 are coupled via a local area network (LAN) 103 to a plurality of user computers 104. Both the application servers 102 and the user computers 104 may be considered “host computers.” The application servers 102 are coupled to one or more primary storage devices 106 via a storage area network (SAN) 108. The primary storage devices 106 may be, for example, disk arrays such as are available from companies like EMC Corporation, IBM Corporation and others. Alternatively, a bus (not shown) or other network link may provide an interconnect between the application servers and the primary storage system 106. The bus and/or Fibre Channel network connection may operate using a protocol, such as the Small Component System Interconnect (SCSI) protocol, which dictates a format of packets transferred between the host computers (e.g., the application servers 102) and the storage system(s) 106.
It is to be appreciated that the networked computing environment illustrated in FIG. 1 is typical of a large system as may be used by, for example, a large financial institution or large corporation. It is to be understood that many networked computing environments need not include all the elements illustrated in FIG. 1. For example, a smaller networked computing environment may simply include host computers connected directly, or via a LAN, to a storage system. In addition, although FIG. 1 illustrates separate user computers 104, application servers 102 and media servers 114, these functions may be combined into one or more computers.
In addition to primary storage devices 106, many networked computer environments include at least one secondary or back-up storage system 110. The back-up storage system 110 may typically be a tape library, although other large capacity, reliable secondary storage systems may be used. Typically, these secondary storage systems are slower than the primary storage devices, but include some type of removable media (e.g., tapes, magnetic or optical disks) that may be removed and stored off-site.
In the illustrated example, the application servers 102 may be able to communicate directly with the back-up storage system 110 via, for example, an Ethernet or other communication link 112. However, such a connection may be relatively slow and may also use up resources, such as processor time or network bandwidth. Therefore, a system such as illustrated may include one or more media servers 114 that may provide a communication link 115, using for example, Fibre Channel, between the SAN 108 and the back-up storage system 110.
The media servers 114 may run software that includes a back-up/restore application that controls the transfer of data between host computers (such as user computers 104, the media servers 114, and/or the application servers 102), the primary storage devices 106 and the back-up storage system 110. Examples of back-up/restore applications are available from companies like Veritas, Legato and others. For data protection, data from the various host computers and/or the primary storage devices in a networked computing environment may be periodically backed-up onto the back-up storage system 110 using a back-up/restore application, as is known in the art.
Of course, it is to be appreciated that, as discussed above, many networked computer environments may be smaller and may include fewer components than does the exemplary networked computer environment illustrated in FIG. 1. Therefore, it is also to be appreciated that the media servers 114 may in fact be combined with the application servers 102 in a single host computer, and that the back-up/restore application may be executed on any host computer that is coupled (either directly or indirectly, such as through a network) to the back-up storage system 110.
One example of a typical back-up storage system is a tape library that includes a number of tape cartridges and at least one tape drive, and a robotic mechanism that controls loading and unloading of the cartridges into the tape drives. The back-up/restore application provides instructions to the robotic mechanism to locate a particular tape cartridge, e.g., tape number 0001, and load the tape cartridge into the tape drive so that data may be written onto the tape. The back-up/restore application also controls the format in which data is written onto the tapes. Typically, the back-up/restore application may use SCSI commands, or other standardized commands, to instruct the robotic mechanism and to control the tape drive(s) to write data onto the tapes and to recover previously written data from the tapes.
Conventional tape library back-up systems suffer from a number of problems including speed, reliability and fixed capacity. Many large companies need to back-up Terabytes of data each week. However, even expensive, high-end tapes can usually only read/write data at speeds of 30-40 Megabytes per second (MB/s), which translates to about 50 Gigabyte per hour (GB/hr). Thus, to back-up one or two Terabytes of data to a tape back-up system may take at least 10 to 20 hours of continuous data transfer time.
In addition, most tape manufacturers will not guarantee that it will be possible to store (or restore) data to/from a tape if the tape is dropped (as may happen relatively frequently in a typical tape library because either a human operator or the robotic mechanism may drop a tape during a move or load operation) or if the tape is exposed to non-ideal environmental conditions, such as extremes in temperature or moisture. Therefore, a great deal of care needs to be taken to store tapes in a controlled environment. Furthermore, the complex machinery of a tape library (including the robotic mechanism) is expensive to maintain and individual tape cartridges are relatively expensive and have limited lifespans.