The present invention relates generally to computer data storage backup, and more particularly, to a data storage system that performs a backup of data from primary storage to tape in response to a backup command.
Due to advances in computer technology, there has been an ever-increasing need for data storage in data processing networks. In a typical data processing network, there has been an increase in the number of volumes of data storage and an increase in the number of hosts needing access to the volumes. This has been especially true for networks of workstations. Not only have a greater number of workstations been added to the typical network, but also the increase in data processing capabilities of a typical workstation has required more data storage per workstation for enhanced graphics and video applications.
Fortunately for computer users, the cost of data storage has continued to decrease at a rate approximating the increase in need for storage. For example, economical and reliable data storage in a data network can be provided by a storage subsystem including a Redundant Array of Independent Disks (RAID). Presently it is practical to provide a single data storage subsystem with up to 20 terabytes (TB) of storage, or approximately 4000 logical volumes, using magnetic disk drives each having a storage capacity of 46 gigabytes.
Unfortunately for network administrators, the development of services for storage management has lagged behind the increase in storage to be managed. Consequently, the cost of storage management has become relatively more significant. More troubling is the difficulty of maintaining the same level of management service as the amount of storage increases. For example, users are accustomed to being provided with backup and restore services for their data that is stored on the network. Users are encouraged to store their data on the network so that it can be shared by other authorized users and maintained in a cost-effective manner in accordance with corporate document retention policies. However, data stored on the network is always subject to some chance of loss due to a severe failure of the data storage system. Backup and restore services are a conventional way of reducing the impact of data loss from the network storage. To be effective, however, the data should be backed up frequently, and the data should be restored rapidly from backup after the storage system failure. As the amount of storage on the network increases, it is more difficult to maintain the frequency of the data backups, and to restore the data rapidly after a storage system failure.
In the data storage industry, an open standard network backup protocol has been defined to provide centrally managed, enterprise-wide data protection for the user in a heterogeneous environment. The standard is called the Network Data Management Protocol (NDMP). NDMP facilitates the partitioning of the backup problem between backup software vendors, server vendors, and network-attached storage vendors in such a way as to minimize the amount of host software for backup.
The current state of development of NDMP can be found at www.ndmp.org/info. The NDMP server must implement a number of interfaces, including a CONNECT interface, a CONFIG interface, a SCSI interface, a TAPE interface, and a DATA interface. The CONNECT interface is used when a client opens the communication to an NDMP server. This interface allows the NDMP server to authenticate the client and negotiate the version of the protocol used.
The CONFIG interface allows backup software to discover the configuration of the NDMP server. It can be used to discover tape drives and jukeboxes as well as file systems and databases. Backup software will use this interface to build request and media server databases automatically.
The SCSI interface simply passes SCSI commands (known as CDBs) through to the SCSI device and returns the SCSI status. The backup software will use this interface to control a locally attached jukebox. Software on the backup software host will construct SCSI CDBs and will interpret the returned status and data. This interface can also be used to exploit special features of SCSI tape drives.
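As an illustration of the kind of CDB that software on the backup host might construct for a locally attached jukebox, the following minimal sketch packs a 12-byte MOVE MEDIUM command and interprets a returned status byte. The function names and the element addresses are hypothetical and chosen only for illustration; the sketch shows the byte layout and does not send anything through an NDMP server.

```python
import struct

MOVE_MEDIUM = 0xA5  # SCSI media changer opcode for the 12-byte MOVE MEDIUM command


def build_move_medium_cdb(transport: int, source: int, destination: int,
                          invert: bool = False) -> bytes:
    """Pack a 12-byte MOVE MEDIUM CDB; element addresses are 16-bit, big-endian."""
    return struct.pack(
        ">BBHHHHBB",
        MOVE_MEDIUM,          # byte 0: operation code
        0,                    # byte 1: reserved
        transport,            # bytes 2-3: medium transport element address
        source,               # bytes 4-5: source element address
        destination,          # bytes 6-7: destination element address
        0,                    # bytes 8-9: reserved
        1 if invert else 0,   # byte 10: INVERT flag in bit 0
        0,                    # byte 11: control
    )


def interpret_status(status: int) -> str:
    """Map a returned SCSI status byte to a short description."""
    return {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY"}.get(status, "OTHER")


# Hypothetical addresses: move the cartridge in slot 0x1000 to drive 0x0100.
cdb = build_move_medium_cdb(transport=0, source=0x1000, destination=0x0100)
```

In this arrangement the NDMP server would only forward the bytes in `cdb` and hand back the status and data; all knowledge of the command set stays on the backup software host.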
The TAPE interface will support both tape positioning and tape read/write operations. The backup software will use this interface to control the labeling and format of the tape. The backup software will also use this interface for positioning of the tape during backups and restores.
The DATA interface actually deals with the format of the backup data. The backup software will initiate backups and restores using this interface. The backup software provides all of the parameters that may affect the backup or restore using this interface. The backup software does not place any constraints on the format of the backup data other than it must be a stream of data that can be written to the tape device.
The NDMP server may send a number of messages to the backup software host. All of the messages that the backup software host accepts are asynchronous. None of these messages will generate a reply message. These messages include a NOTIFY message, a FILE HISTORY message, and a LOGGING message. The NDMP server uses the NOTIFY message to notify the backup software that the NDMP server requires attention.
The FILE HISTORY message allows the NDMP server to make entries in the file history for the current backup. The backup software uses this message to select files for retrieval.
The LOGGING message allows the NDMP server to make entries in the backup log. The operator uses this message to monitor the progress and successful completion of the backup. It is also used to diagnose problems.
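The one-way nature of these three messages can be sketched as a handler on the backup software host that records each message and returns nothing. The class name, the message-kind strings, and the payloads below are hypothetical simplifications for illustration and are not the NDMP wire format.

```python
from dataclasses import dataclass, field


@dataclass
class BackupHost:
    """Hypothetical receiver for the three asynchronous message types."""
    needs_attention: bool = False
    file_history: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def handle(self, kind: str, payload: str) -> None:
        # Returning None mirrors the rule that no reply message is generated.
        if kind == "NOTIFY":
            self.needs_attention = True
        elif kind == "FILE_HISTORY":
            self.file_history.append(payload)   # later used to select files
        elif kind == "LOGGING":
            self.log.append(payload)            # later read by the operator
        else:
            raise ValueError(f"unexpected message kind: {kind}")


host = BackupHost()
host.handle("FILE_HISTORY", "/home/a/report.txt")
host.handle("LOGGING", "backup 40% complete")
host.handle("NOTIFY", "server requires attention")
```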
It will be assumed that the reader is familiar with the details of NDMP, for example, as set out in the Internet Draft Document by R. Stager and D. Hitz entitled “Network Data Management Protocol,” document version 2.1.7 (last update Oct. 12, 1999), incorporated herein by reference. Therefore, the following disclosure will deal primarily with certain backup and restore operations in a data storage system, with the understanding that such a data storage system may provide other functions as known to a person of ordinary skill in the art familiar with the details of NDMP.
Since backup software need not place any constraints on the format of backup data other than it must be a stream of data that can be written to the tape device, the inventor has discovered that the performance of the storage system can be improved by the addition of certain facilities which may cause the tracks of a storage volume to become non-sequential as they are written to the tape device. In particular, it is desirable to continue host read-write access to a storage volume that is being backed up. If a host has read-write access to a storage volume, then the storage volume will be referred to as a “production volume.” When a production volume in a primary storage subsystem is being backed up, it is desirable to receive the backup data from the primary storage subsystem as quickly as the primary storage subsystem delivers the backup data. Otherwise, when the host writes to the production volume, the maintenance of a snapshot copy of the backup data in the primary storage subsystem will increase the storage load and may also increase the processing load on the primary storage subsystem. Often, however, the tape storage device cannot write the backup data to tape as fast as the storage subsystem delivers the backup data. One solution to this problem is for the primary storage subsystem to write the backup data to intermediate disk storage in a secondary data storage subsystem and then write the backup data from the intermediate disk storage to tape storage. However, if the tracks of backup data from the snapshot copy of the production volume can be written to tape in a non-sequential fashion, then the required storage and data processing resources for the intermediate disk storage can be minimized by selectively bypassing the intermediate disk storage whenever possible.
Accordingly, in accordance with a first aspect, the invention provides a method of selective buffering of the backup data from primary data storage before the backup data is written to the tape. The method includes transferring a portion of the backup data to be written on the tape from the primary data storage to intermediate disk storage when the tape storage device is not ready to receive the backup data to be written on the tape from the primary data storage, and later transferring the portion of the backup data to be written on the tape from the intermediate disk storage to the tape storage device. In addition, when the tape storage device is ready to receive the backup data to be written on the tape and the backup data is being transmitted from the primary data storage device and the portion of the backup data is contained in the intermediate disk storage and has not yet been written to the tape storage device, the intermediate disk storage is bypassed to transfer to the tape storage device the backup data being transmitted from the primary data storage device.
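The routing decision of this first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name is hypothetical, tape readiness is modeled as one boolean per delivered track, the intermediate disk storage is an in-memory queue, and the parked tracks are drained only after the stream from primary storage has ended.

```python
from collections import deque


def selective_backup(tracks, tape_ready):
    """Return the order in which tracks reach the tape.

    tracks     -- track identifiers in the order primary storage delivers them
    tape_ready -- one boolean per track: is the tape device ready at that moment?
    """
    disk = deque()   # stand-in for the intermediate disk storage
    tape = []        # order of tracks as written to tape

    for track, ready in zip(tracks, tape_ready):
        if ready:
            # Bypass the disk even if earlier tracks are still parked there.
            tape.append(track)
        else:
            disk.append(track)

    # Later, drain the parked tracks from intermediate disk storage to tape.
    while disk:
        tape.append(disk.popleft())
    return tape


order = selective_backup([0, 1, 2, 3, 4], [True, False, False, True, True])
# order == [0, 3, 4, 1, 2]: tracks 1 and 2 were parked on disk and written last
```

The example makes the consequence noted earlier visible: because tracks 3 and 4 bypass the disk while tracks 1 and 2 are still parked there, the tracks land on tape in a non-sequential order.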
In accordance with another aspect, the invention provides a method of transferring a first portion of the backup data from the primary data storage to a memory buffer and from the memory buffer to the tape. When the backup data is delivered from the primary data storage to the memory buffer at a faster rate than the backup data is written from the memory buffer to the tape, overflow of the memory buffer is prevented by transferring a second portion of the backup data from the primary data storage to intermediate disk storage, and at a later time transferring the second portion of the backup data from the intermediate disk storage to the tape.
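To get a feel for how much data the second portion might amount to, the following sketch estimates the overflow under an idealized model. The model is an assumption made here for illustration only: the primary storage delivers at a constant rate, the tape drains the memory buffer at a constant rate, and the function name, parameters, and example figures are hypothetical.

```python
def spilled_to_disk(total, source_rate, tape_rate, buffer_capacity):
    """Estimate how much backup data overflows to intermediate disk storage.

    Idealized model (an assumption): the source delivers `total` units at a
    constant `source_rate`, the tape drains the memory buffer at a constant
    `tape_rate`, and the memory buffer holds `buffer_capacity` units.
    """
    if source_rate <= 0 or tape_rate < 0:
        raise ValueError("source_rate must be positive and tape_rate non-negative")
    if tape_rate >= source_rate:
        return 0.0                            # tape keeps up; the buffer never fills
    duration = total / source_rate            # time the source spends delivering
    backlog = total - tape_rate * duration    # data not yet on tape when delivery ends
    return max(0.0, backlog - buffer_capacity)


# Hypothetical figures: 1000 units delivered at 100 units/s, tape writes at
# 50 units/s, memory buffer holds 100 units.
spill = spilled_to_disk(1000, 100, 50, 100)   # 400.0 units go to disk
```

Under these assumptions only the backlog that exceeds the memory buffer is sent to the intermediate disk storage; if the tape keeps pace with the source, nothing is.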
In accordance with another aspect, the invention provides a method of selective buffering of backup data in a data storage system including at least one data mover computer receiving a stream of the backup data from a data storage device, a cached disk storage subsystem coupled to the data mover computer for intermediate data storage, and a tape library unit coupled to the data mover computer for storage of the backup data onto a tape. The backup data from the data storage device is selectively buffered before the backup data is written to the tape. The method includes transferring a first portion of the backup data from the data storage device to a random access memory buffer in the data mover computer and from the random access memory buffer to the tape. When the backup data is delivered from the data storage device to the random access memory buffer at a faster rate than the backup data is written from the random access memory buffer to the tape, overflow of the random access memory buffer is prevented by transferring a second portion of the backup data from the data mover computer to the cached disk storage subsystem, and at a later time transferring the second portion of the backup data from the cached disk storage subsystem to the tape.
In accordance with yet another aspect, the invention provides a data processing system including primary data storage adapted for coupling to a host processor for read/write access, a tape storage device for providing backup storage on a tape and coupled to the primary data storage for transfer of backup data from the primary data storage to the tape in response to a backup request. The data processing system further includes a memory buffer coupled to the primary data storage and the tape storage device for buffering of the backup data from the primary data storage before the backup data is written to the tape, and intermediate disk storage coupled to the primary data storage and the tape storage device for buffering of the backup data from the primary data storage before the backup data is written to the tape. Moreover, the data processing system includes flow control logic coupled to the memory buffer and to the intermediate disk storage for controlling a flow of the backup data to the memory buffer and to the intermediate disk storage so that when the backup data is delivered from the primary data storage to the memory buffer at a faster rate than the backup data is written from the memory buffer to the tape, overflow of the memory buffer is prevented by buffering a portion of the backup data in the intermediate disk storage for writing to the tape at a later time.
In accordance with still another aspect, the invention provides a data storage system including a tape library unit including at least one tape, at least one data mover computer adapted for receiving a stream of backup data from a data storage device and coupled to the tape library unit for transferring the backup data to the tape library unit for writing the backup data onto the tape; and a cached disk storage subsystem coupled to the data mover computer for receiving data from the data mover computer, and coupled to the tape library unit for writing data from the cached disk storage subsystem to the tape library unit. The data mover computer includes a random access memory buffer, and the data mover computer is programmed to control a flow of the backup data to the random access memory buffer and to the cached disk storage subsystem so that when the backup data is delivered to the random access memory buffer at a faster rate than the backup data is transferred from the random access memory buffer to the tape library unit, overflow of the random access memory buffer is prevented by buffering a portion of the backup data in the cached disk storage subsystem for transfer to the tape library unit at a later time.
In accordance with a final aspect, the invention provides a program storage device containing a program executable by a data processor for selectively buffering a stream of data from a data source to a data sink by buffering the data in a memory buffer or in disk storage. The program is executable by the data processor for buffering the data in the memory buffer unless the memory buffer becomes substantially full, and when the memory buffer becomes substantially full, buffering a portion of the data from the data source in the disk storage. The program is also executable by the data processor for supplying data to the data sink from the memory buffer unless the memory buffer becomes substantially empty, and when the memory buffer becomes substantially empty, supplying the portion of the data from the disk storage to the data sink.
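The two rules of this final aspect, one on the source side and one on the sink side, can be sketched with a pair of thresholds. This is a minimal illustration, not the claimed program: the class name, the `high_mark` and `low_mark` parameters (hypothetical readings of "substantially full" and "substantially empty"), and the in-memory stand-in for disk storage are all assumptions.

```python
from collections import deque


class SelectiveBuffer:
    """Buffer between a data source and a data sink that spills to disk."""

    def __init__(self, high_mark, low_mark):
        self.memory = deque()        # memory buffer
        self.disk = deque()          # stand-in for disk storage
        self.high_mark = high_mark   # block count treated as "substantially full"
        self.low_mark = low_mark     # block count treated as "substantially empty"

    def put(self, block):
        """Accept a block from the data source."""
        if len(self.memory) >= self.high_mark:
            self.disk.append(block)      # memory substantially full: use disk
        else:
            self.memory.append(block)

    def get(self):
        """Supply the next block to the data sink, or None if nothing is buffered."""
        if len(self.memory) <= self.low_mark and self.disk:
            return self.disk.popleft()   # memory substantially empty: drain disk
        if self.memory:
            return self.memory.popleft()
        return None


buf = SelectiveBuffer(high_mark=9, low_mark=1)
for block in range(12):
    buf.put(block)                       # blocks 9, 10 and 11 spill to disk

delivered = []
while (block := buf.get()) is not None:
    delivered.append(block)
# delivered == [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 8]
```

In the usage shown, the sink receives blocks from memory until only one remains, then the three spilled blocks from disk, and finally the last block from memory, so the delivery order differs from the arrival order.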