The present invention relates, in general, to the field of computer storage technology. More particularly, the present invention relates to a system, method and computer program product for saving, or backing up, data from a computer disk drive to a tape backup system as expeditiously as possible.
Modern computers perform a variety of tasks. Obviously, for example, computers perform computations. Most recently, computers have also gained popularity as communication devices, providing electronic mail (“e-mail”) and internet access capabilities. No less important, however, is a computer's capability of storing and managing large amounts of data, such as on a magnetic disk, a compact disk read only memory (“CDROM”) or a magnetic tape.
Mass data storage is a crucial aspect of modern computer usage. For example, a bank typically stores large volumes of data, including customer records, financial market data, and internal business records, in large interconnected computer systems. Current data is generally stored in primary storage media, such as memory arrays, magnetic hard disks or optical disks, for rapid access. In many organizations, however, this data is regularly archived (or “backed-up”) on archive media, such as alternate magnetic or optical disks or, for larger volumes, magnetic tape, to preserve the data for future access. Preferably, the current data is copied from the primary storage media in the computer system to the archive media. The archive media is then stored in a safe location, preferably off-site, to protect the archived data from destruction. In this manner, the existence of the current data on both the primary storage media and the archive media minimizes the risk of losing the data. For example, a fire at the bank could destroy the copy of the data in the primary storage media, but the archived data copy would still be intact. The bank could then load the archived data into the computer system to recover most of the necessary data. In the recovery process, data recorded on the tape is typically read from the tape and re-recorded on a primary storage medium.
Data recorded onto a magnetic tape is typically organized into a specific tape format. Tape formats can vary according to tape types (e.g., ½ inch, ¼ inch, and 8 mm magnetic tape). For example, on a ½ inch reel tape, data bytes are typically recorded in parallel data records onto the nine-track tape. The number of bytes in a physical data record varies between one and 65,535 bytes. The available tape formats for ½ inch reel tapes generally include 800 BPI (Bytes Per Inch), 1,600 BPI, and 6,250 BPI. Actual storage capacity is a function of the recording format and the length of the tape reel. In contrast, on a ½ inch cartridge tape, data is recorded serially. The data records are recorded on cartridge tape tracks in a serpentine manner: as one track is completed, the recording drive switches to the next track and begins writing in the opposite direction, eliminating the wasted motion of rewinding. The number of bytes per data record is determined by the physical data record size specified by the recording device. Accordingly, the tape format in which data is to be recorded onto or read from the tape can affect, among other characteristics, storage capacity, transfer rate, data organization, and the mechanical movement of the tape during recording.
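Because a ½ inch reel records each byte at one longitudinal position across its nine parallel tracks, capacity can be estimated from the density, the reel length, the physical record size, and the inter-record gap that follows each record. The sketch below is a simple estimate under those assumptions; the function name and the gap length are illustrative parameters, not values taken from any tape standard cited here.

```python
import math


def reel_capacity_bytes(length_ft: float, density_bpi: int,
                        record_bytes: int, gap_in: float) -> int:
    """Estimate usable bytes on a 9-track reel (hypothetical helper).

    Each physical record occupies record_bytes / density_bpi inches of tape,
    followed by an inter-record gap of gap_in inches (an assumed parameter).
    """
    if not 1 <= record_bytes <= 65_535:
        raise ValueError("physical record size must be 1..65,535 bytes")
    length_in = length_ft * 12
    inches_per_record = record_bytes / density_bpi + gap_in
    records = math.floor(length_in / inches_per_record)  # only whole records fit
    return records * record_bytes
```

For example, `reel_capacity_bytes(2400, 6250, 32_768, 0.3)` estimates a 2,400 foot reel at 6,250 BPI with 32 KB records and an assumed 0.3 inch gap. Larger records amortize the fixed gap over more data, which is one way the recording format affects capacity.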
As the amount of data residing in the computer system increases, however, the time and computer resources required to archive the data also increase. In many circumstances, for example, back-up procedures are performed after normal work hours to minimize the impact on the performance of the computer system during the normal business day. In a typical configuration, data stored on one or more magnetic hard disks is read into a host computer system and organized (i.e., formatted) to be compatible with a particular tape data format. The host system then records the formatted data onto the magnetic tape. This continuous involvement of the host system in the back-up process consumes substantial host system computing cycles and decreases the host system's performance in other processes. Furthermore, when the data must be processed at rates sufficient to keep up with the streaming speed of the tape, the involvement of the host system, including communication to and from the host system, becomes a bottleneck. Consequently, a need exists for a system and method to minimize host system involvement in the tape backup and recovery processes, particularly during the transfer of data between the source storage medium and the tape.
Conventional backup operations in computer systems incorporating one or more storage controllers operating under host supervision have included a backup program that performed the necessary backup operations through the storage controller. Although relatively fast, this approach required that the storage devices to be backed up be removed from host access for the entire duration of the backup operation. Moreover, the process could not identify which storage space was then in use, so storage space that was not being used was saved as well. The net result was the consumption of excess tape storage resources and, ultimately, slower performance.
Alternative host-based storage solutions required the central processing unit (“CPU”) to move large amounts of data into and out of main memory. This required the use of large amounts of CPU cycles, caused data to be moved twice over the storage system interconnects (to get the data into and out of main memory) and often could not drive the associated storage devices at their peak performance.
It would, therefore, be highly desirable to utilize the computing power in high performance storage controllers to increase the performance of disk-to-tape online backup operations.
The present invention utilizes the computing power of present day high performance storage controllers in conjunction with host computer based computer program products to increase the performance of disk-to-tape online backup operations. Through the collaborative use of a storage controller and host-based software, a high performance on-line backup solution is provided which, in one particular implementation, resulted in a fourfold increase in backup bandwidth over conventional host-based solutions while concomitantly reducing the load on the host processor from 100% during traditional backup operations to less than 10%.
By splitting the backup process into a host-based component and a storage controller component, a number of traditionally encountered problems have been ameliorated. First, the system and method of the present invention allow the host component to interact with other processes in the host environment to prevent deadlocks and data access conflicts. Second, they allow a user to monitor the progress of the backup operation and stop it if necessary. Third, they allow for the identification of data that might require some other sort of operation to be performed on it. Because offloading the data movement frees host computing power to address these issues, the task of the storage controller can be optimized to move data to tape as quickly and efficaciously as possible.
The system and method of the present invention provide host-based, computer program implemented functionality that enables part of the backup operation to be performed by the host itself while concurrently utilizing specially implemented storage controller based functions to perform the remainder of the backup operation. This is effectuated by providing the storage controller with a command that allows for the transfer of a contiguous group of disk drive blocks to a tape drive. Reading groups of contiguous blocks is the optimal way to read data from a disk, and by sending the storage controller groups of contiguous blocks, the task of the storage controller is kept simple so that it can be optimized and is easy to implement. The system and method of the present invention may be utilized either with a storage controller that controls both the disk drive and the tape drive or with a tape controller configured to read data from the disk drive over a network or storage interconnect.
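One way to picture why this command keeps the controller simple is that it needs little more than a starting block and a block count. The dataclass below is a hypothetical sketch of such a command's payload; the class name, field names, and the 512-byte block size are assumptions for illustration, not the actual command format.

```python
from dataclasses import dataclass

BLOCK_SIZE = 512  # assumed disk block size in bytes (hypothetical)


@dataclass(frozen=True)
class ContiguousCopyCommand:
    """Hypothetical payload: copy one contiguous run of disk blocks to tape."""
    disk_unit: int
    start_lbn: int
    block_count: int
    tape_unit: int

    def __post_init__(self) -> None:
        # A single run is described by its first block and its length only.
        if self.start_lbn < 0 or self.block_count <= 0:
            raise ValueError("need start_lbn >= 0 and block_count > 0")

    @property
    def end_lbn(self) -> int:
        """Exclusive end of the block range."""
        return self.start_lbn + self.block_count

    @property
    def byte_count(self) -> int:
        return self.block_count * BLOCK_SIZE
```

Because the controller never has to interpret file structure, it can read the range in one sequential pass and stream it to the tape drive.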
In operation, the process begins with the host software setting up the destination media, e.g. a tape on a tape drive. The host software then allocates the drive and initializes the media. At this point, the host-based process initiates a scan of the filesystem for files the user has specified for backup, which in many cases may be all of the files. The process then locks the specified files and creates a bitmap of “in use” blocks. In the case of a full disk backup operation, the operating system may allow the entire disk to be locked in a single command, and the filesystem may already maintain a “used” (or, alternatively, a “free”) block bitmap for this purpose. The bitmap is then scanned for groups of contiguous blocks, with the possibility that small “holes” (or gaps) in the contiguous blocks (e.g. on the order of 5-10 or more unused blocks) can be effectively ignored to create larger groups.
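The bitmap scan with hole merging can be sketched as a single pass over the blocks. In this minimal version a list of booleans stands in for the filesystem's block bitmap (an assumption), and `max_hole` is the largest run of unused blocks that may be absorbed into a group, corresponding to the 5-10 block figure above.

```python
from typing import List, Optional, Sequence, Tuple


def find_groups(in_use: Sequence[bool], max_hole: int) -> List[Tuple[int, int]]:
    """Return (start_lbn, block_count) groups of in-use blocks.

    Gaps of at most max_hole unused blocks between in-use blocks are absorbed
    into one larger group (those unused blocks are simply copied as well).
    """
    groups: List[Tuple[int, int]] = []
    start: Optional[int] = None      # first LBN of the group being built
    last_used: Optional[int] = None  # last in-use LBN seen in that group
    for lbn, used in enumerate(in_use):
        if not used:
            continue
        if start is None:
            start = lbn
        elif lbn - last_used - 1 > max_hole:
            # Hole too large to ignore: close this group and start a new one.
            groups.append((start, last_used - start + 1))
            start = lbn
        last_used = lbn
    if start is not None:
        groups.append((start, last_used - start + 1))
    return groups
```

Raising `max_hole` trades a little extra tape space (copied unused blocks) for fewer, longer transfers, which keeps both the disk and the tape moving sequentially.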
Each large group of contiguous blocks is then packaged as a single command and sent to the storage controller, which copies it from the disk drive to the tape media. The remainder of the disk blocks are read into host computer memory, assembled into tape records and written to tape. At any time during the transfer operation, the process can be stopped by the host process. If any errors or unusual cases are encountered, the host process may handle the condition (e.g. a media change on the tape drive) and continue the backup process. Alternatively, other errors or unusual conditions may cause the controller to stop processing and return progress and status information to the host process. At the completion of the data transfer, the locks and other resources are released and the tape data set is closed.
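The division of labor above can be sketched as a host-side dispatch loop. Everything here is hypothetical: the callbacks stand in for the controller command, the host read-and-write path, the host's condition handler (such as mounting the next tape), and a user stop request. The sketch assumes the controller reports how many blocks it copied before stopping, so the host can handle the condition and resubmit the rest.

```python
from typing import Callable, List, Tuple

Group = Tuple[int, int]  # (start_lbn, block_count)


class BackupAborted(Exception):
    """Raised when the host chooses not to continue after a controller stop."""


def copy_via_controller(group: Group,
                        send_to_controller: Callable[[int, int], int],
                        handle_condition: Callable[[], bool]) -> None:
    start, count = group
    while count > 0:
        # The controller returns the number of blocks it copied before stopping.
        copied = send_to_controller(start, count)
        start += copied
        count -= copied
        # A short transfer means the controller stopped (e.g. end of tape);
        # the host handles the condition, then resubmits the remainder.
        if count > 0 and not handle_condition():
            raise BackupAborted(f"stopped at LBN {start}")


def run_backup(groups: List[Group],
               min_controller_blocks: int,
               send_to_controller: Callable[[int, int], int],
               host_copy: Callable[[Group], None],
               handle_condition: Callable[[], bool],
               stop_requested: Callable[[], bool]) -> int:
    """Process groups in LBN order; return how many groups were completed."""
    done = 0
    for group in groups:
        if stop_requested():
            break  # the host may stop the backup at any time
        if group[1] >= min_controller_blocks:
            copy_via_controller(group, send_to_controller, handle_condition)
        else:
            host_copy(group)  # read into host memory, write as tape records
        done += 1
    return done
```

The `min_controller_blocks` threshold is an assumed tuning knob: below it, the overhead of issuing a controller command may outweigh the benefit of direct disk-to-tape movement.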
The system and method of the present invention effectively and efficiently utilize the capabilities of two different processes to perform high speed on-line backup operations. The host process solves the “on-line” related problems of deadlock and data access conflicts: by locking the files, it can control overall access to the data. When another process on the host system requires access to data that is then being saved, it can either wait on the lock, or the backup host process can release the lock and remove the data from the “in use” block bitmap. The locking function also prevents the backup process from accessing data that is currently in use.
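The choice between making a requester wait and releasing the lock can be sketched with a small bookkeeping class. The class, its methods, and the decision rule (release only if none of the file's blocks have been sent to tape yet) are hypothetical; the text above does not specify how the host decides.

```python
from typing import Dict, List, Tuple

Extent = Tuple[int, int]  # (start_lbn, block_count)


class BackupLocks:
    """Hypothetical host-side record of files locked by the backup process."""

    def __init__(self, nblocks: int) -> None:
        self.in_use = [False] * nblocks          # the "in use" block bitmap
        self.locked: Dict[str, List[Extent]] = {}

    def lock_file(self, name: str, extents: List[Extent]) -> None:
        self.locked[name] = extents
        for start, count in extents:
            for lbn in range(start, start + count):
                self.in_use[lbn] = True

    def yield_file(self, name: str) -> None:
        """Release the lock and drop the file's blocks from the bitmap."""
        for start, count in self.locked.pop(name):
            for lbn in range(start, start + count):
                self.in_use[lbn] = False

    def on_access_request(self, name: str, blocks_already_sent: bool) -> str:
        # Assumed policy: once transfer has begun, the requester waits;
        # otherwise the backup gives up the file so the requester can proceed.
        if blocks_already_sent:
            return "wait"
        self.yield_file(name)
        return "released"
```

A released file could still be saved later through the host path once it is no longer in use; that follow-up step is outside this sketch.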
On the other hand, the controller process performs data movement in a faster and more efficient manner than would be possible through a host-based process, which would require that the data first be moved into main memory and then back out to the tape drive for backup. Such double use of the storage interconnects would naturally slow any backup operation. In comparison, the storage controller process allows for the movement of data directly between the disk drive and the tape device. Because the controller is provided groups of contiguous blocks of data, the disk drives are accessed in disk block order, which is the fastest way to read the data from the disk. The tape drive can then be written with a continuous stream of data, thereby also allowing the tape drive to operate at peak “streaming” speed.
In essence, the system and method of the present invention differ from conventional controller-based backup operations by allowing storage to remain accessible during the backup. This is effectuated by using a host-based process to control data access conflicts with other host processes. Still further, the system and method of the present invention can utilize the filesystem knowledge of the host to prevent the controller from backing up unused disk space while still using the controller hardware to perform tasks which it can accomplish much more expeditiously and efficiently than the host.
Particularly disclosed herein is a system, method and computer program product for hardware assisted backup for a computer mass storage subsystem wherein files to be backed up, or otherwise saved, from a disk to a tape media are written to the tape in logical block number (“LBN”) order regardless of the file's on-disk layout. If it is available, the disk file structure information may also be written to the beginning of the tape to allow subsequent file-level restore operations to be performed. If the file structure information is not available in a concise form, the disk blocks containing the file structure are marked in a used block bitmap and also written to the tape medium. File-level restore operations are advantageously able to understand the disk structure so that the appropriate blocks can be read in to build the file system table.
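Because blocks land on tape in LBN order rather than file order, a file-level restore needs a mapping from each disk LBN to its position in the tape data area. The sketch below builds that mapping from the list of groups that were written; the class and function names are hypothetical, and positions are counted in blocks from the start of the data area that follows any file structure information at the beginning of the tape.

```python
import bisect
from typing import List, Tuple

Group = Tuple[int, int]  # (start_lbn, block_count) as written to tape


class TapeIndex:
    """Hypothetical map from disk LBN to block position in the tape data area."""

    def __init__(self, groups: List[Group]) -> None:
        self.starts: List[int] = []   # first LBN of each group, ascending
        self.offsets: List[int] = []  # tape block position where each group begins
        self.counts: List[int] = []
        pos = 0
        for start, count in sorted(groups):  # groups are written in LBN order
            self.starts.append(start)
            self.offsets.append(pos)
            self.counts.append(count)
            pos += count

    def tape_position(self, lbn: int) -> int:
        i = bisect.bisect_right(self.starts, lbn) - 1
        if i < 0 or lbn >= self.starts[i] + self.counts[i]:
            raise KeyError(f"LBN {lbn} was not written to tape")
        return self.offsets[i] + (lbn - self.starts[i])


def file_tape_positions(index: TapeIndex, extents: List[Group]) -> List[int]:
    """Tape block positions of a file's extents, in the file's own order."""
    return [index.tape_position(lbn)
            for start, count in extents
            for lbn in range(start, start + count)]
```

With the file structure read from the front of the tape supplying each file's extents, the restore can gather exactly the tape blocks a file needs, even though the file's blocks may be scattered on disk.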
The used-block bitmap may be modified to exclude files that are “open for write”, marked as “no backup” or not part of the selected file save operation. In operation, the blocks are written to tape using Tape Copy Data (“TCD”) commands as disclosed and claimed in the aforementioned patent application assigned to Digital Equipment Corporation. Blocks that were selected but excluded as “open for write” may then be written to the tape utilizing more conventional methodologies. Where possible, the entire disk volume is locked during the backup operation. However, the system and method of the present invention also allow for the locking of individual files if not all of the files are being saved. In those instances where a group of contiguous data blocks is too small to be backed up efficaciously on its own, unmarked blocks may also be included in the transfer to tape to speed operation, and host computer input/output (“I/O”) can be utilized to write data to the tape if doing so is faster than creating a TCD command.
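The bitmap modification described above can be sketched as clearing the blocks of excluded files from a copy of the filesystem's used-block bitmap, while remembering which selected files were excluded only because they were open for write. The `FileInfo` record and function name are hypothetical; blocks that belong to no file (such as file structure blocks) stay marked, consistent with writing them to tape.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Extent = Tuple[int, int]  # (start_lbn, block_count)


@dataclass
class FileInfo:
    """Hypothetical per-file attributes gathered during the filesystem scan."""
    name: str
    extents: List[Extent]
    open_for_write: bool = False
    no_backup: bool = False


def modify_used_bitmap(used: List[bool], files: List[FileInfo],
                       selected: Set[str]) -> Tuple[List[bool], List[str]]:
    """Return (mask for TCD transfers, selected files deferred to host I/O)."""
    mask = list(used)  # copy; the filesystem's own bitmap is left untouched
    deferred: List[str] = []
    for f in files:
        wanted = f.name in selected and not f.no_backup
        if wanted and not f.open_for_write:
            continue  # keep this file's blocks marked for TCD transfer
        if wanted:
            deferred.append(f.name)  # selected, but saved later conventionally
        for start, count in f.extents:
            for lbn in range(start, start + count):
                mask[lbn] = False
    return mask, deferred
```

The resulting mask is what the contiguous-group scan would then operate on, and the deferred list identifies files to write through the conventional host path.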