The invention relates generally to the field of information storage systems and more particularly to managing backup of data in storage systems.
Storage systems have grown enormously in both size and sophistication in recent years. These storage systems can be part of a "storage area network" (SAN), an open-standard, generally high-speed, scalable network of servers and storage. A SAN provides accelerated data access, supports advanced storage management, and serves as a natural platform for clustered server applications.
SANs typically include many large disk drive units controlled by a complex, multi-tasking disk drive controller such as the EMC Symmetrix© disk drive controller, a product of EMC Corporation, Hopkinton, Mass. A large-scale disk drive system can typically receive commands, such as I/O requests, from a number of host computers and can control a number of disk drive mass storage devices, each capable of storing tens of gigabits of data. The EMC Symmetrix© disk drive controller allows hosts from different vendors to connect to it. In such arrangements, the storage system is referred to as an "enterprise" data storage system.
There is every reason to expect that both the sophistication and the size of disk drive systems will continue to increase. As these systems grow in complexity, so does a user's reliance on them for fast and reliable storage and recovery of data.
Efficient and effective backup of data stored on such large storage systems typically involves a tradeoff between speed and online availability. For example, a fast backup can be achieved, but online operations must often be suspended while it runs. Suspension, particularly for very large storage systems, can be lengthy and expensive. Balancing the needs of the system against the requirements of proper database and critical-application backup has been an ongoing challenge.
The invention features a redundant communication network including a plurality of production servers and a backup server, and a storage system including a plurality of production volumes for storing data and a corresponding plurality of backup volumes, each connected to a corresponding one of the production volumes. The redundant communication network also includes a first channel and a second channel that allow communication between the production servers and the production volumes of the storage system, and a backup storage unit connected to the backup server.
In a general aspect of the invention, a method of performing a backup operation on the redundant communication network includes the following steps. One of the production servers selects one of the production volumes for backup. The selected one of the production volumes is disconnected from a corresponding one of the plurality of backup volumes. The relative load on each of the first and second channels of the redundant communication network is determined. If necessary, the load on each of the first and second channels is adjusted during the backup operation. Data stored on the corresponding one of the backup volumes is transferred to the backup storage unit. When the backup operation is complete, the selected one of the production volumes is reconnected to the corresponding one of the plurality of backup volumes.
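The sequence of steps above can be sketched in a few lines. This is a minimal, hypothetical model, not the claimed implementation: a volume is a plain dict with `production` and `backup` block lists and a `connected` flag, channel load is a pair of abstract numbers, "adjusting" is reduced to routing the backup transfer over the less loaded channel, and reconnection is assumed to resynchronize the backup volume with the production volume.

```python
def backup_volume(volume, channel_loads, backup_unit):
    """Hypothetical sketch of the backup steps.

    volume: dict with 'production' and 'backup' (lists of blocks) and
            'connected' (bool); all names are illustrative assumptions.
    channel_loads: (load_on_first_channel, load_on_second_channel).
    backup_unit: list standing in for the tape/optical backup storage unit.
    Returns the index of the channel chosen for the backup transfer.
    """
    # 1. Disconnect (split) the selected production volume from its backup volume.
    volume["connected"] = False
    # 2. Determine the relative load on the first and second channels.
    first, second = channel_loads
    # 3. Adjust: route the backup transfer over the less loaded channel.
    channel = 0 if first <= second else 1
    # 4. Transfer the backup volume's data to the backup storage unit.
    backup_unit.extend(volume["backup"])
    # 5. Reconnect; assumed here to resynchronize the backup volume with production.
    volume["backup"] = list(volume["production"])
    volume["connected"] = True
    return channel
```

For example, with channel loads of `(7, 3)` the transfer is routed over the second channel (index 1), and the backup volume is reconnected once its contents have been copied to the backup unit.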
With this arrangement, data used by production servers or other host computers can be backed up to, restored from, or recovered from a backup storage unit while the production servers continue to use the storage system in parallel. Efficiency and productivity are thus increased while continuous support for the production servers on the network is maintained.
Embodiments of this aspect of the invention may include one or more of the following features. Each backup volume is an independently addressable mirror image of the corresponding production volume. Once disconnected from that production volume, the backup volume therefore holds a point-in-time mirror image of the active production volume, which can be used to run tasks in parallel with ongoing production work.
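The point-in-time property of a split mirror can be illustrated with a toy model. The class and method names below are hypothetical and stand in for whatever mechanism the storage controller actually uses: while attached, every write is applied to both copies; after a split, production writes continue but the mirror keeps the image it held at the moment of the split.

```python
class MirroredVolume:
    """Hypothetical production volume with an attached mirror (backup) volume."""

    def __init__(self):
        self.production = {}  # block number -> data
        self.mirror = {}
        self.attached = True

    def write(self, block, data):
        self.production[block] = data
        if self.attached:
            # While attached, writes are mirrored to the backup volume.
            self.mirror[block] = data

    def split(self):
        # From here on the mirror is a frozen, point-in-time image.
        self.attached = False

    def reattach(self):
        # Assumed behavior: resynchronize the mirror, then resume mirroring.
        self.mirror = dict(self.production)
        self.attached = True


vol = MirroredVolume()
vol.write(0, "v1")
vol.split()
vol.write(0, "v2")  # production continues; the mirror still holds "v1"
```

After the split, a backup task can read `vol.mirror` without blocking writes to `vol.production`.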
The load on each of the first and second channels may be adjusted to maximize throughput both between the production servers and the storage system and between the backup server and the storage system. Alternatively, the load may be adjusted so that the backup operation between the backup storage unit and the corresponding backup volume occupies an entire one of the first and second channels. In essence, load adjustment provides a balanced approach to managing the flow of data among the production servers, the storage system, and the backup storage unit. Load balancing ensures that the two channels are used as efficiently as possible, so that one channel is not overloaded while the other is underutilized. In this way, the potential for input/output (I/O) bottlenecks is reduced.
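The two adjustment policies described above can be contrasted with a small sketch. It assumes, for illustration only, that channel load is a single additive number in arbitrary units and ignores channel capacity; the function names are hypothetical. The first policy places the backup stream wherever the busier channel ends up least loaded; the second moves all production traffic to one channel and reserves the other for the backup.

```python
def balance_for_throughput(prod_loads, backup_load):
    """Place the backup stream so that the peak channel load is minimized."""
    a, b = prod_loads
    # Compare the busier channel's load under each possible placement.
    if max(a + backup_load, b) <= max(a, b + backup_load):
        return (a + backup_load, b)
    return (a, b + backup_load)


def dedicate_channel(prod_loads, backup_load, backup_channel=1):
    """Give the backup an entire channel; production shares the other one."""
    total_prod = sum(prod_loads)
    if backup_channel == 1:
        return (total_prod, backup_load)
    return (backup_load, total_prod)
```

With production loads of `(60, 20)` and a backup stream of 30, `balance_for_throughput` yields `(60, 50)`, whereas `dedicate_channel` yields `(80, 30)`. The first keeps the peak lower; the second isolates the backup traffic at the cost of a busier production channel.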
The method can further include determining whether a failure exists on either the first channel or the second channel.
The method can further include determining whether the production server that selected the production volume is performing a write operation during the backup operation. If so, the load on each of the first and second channels is readjusted.
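Both readjustment triggers (a channel failure and a write by the production server during the backup) can be folded into one hypothetical helper. How a failure is detected is outside this sketch: the caller is assumed to pass the index of the failed channel, if any. Loads are abstract, additive numbers, and new write traffic is simply routed to the less loaded surviving channel.

```python
def readjust(loads, failed=None, extra_write_load=0):
    """Hypothetical readjustment of a two-channel load.

    loads: [load_on_channel_0, load_on_channel_1]
    failed: None, 0, or 1 (index of a channel reported as failed)
    extra_write_load: new write traffic from the production server
    """
    loads = list(loads)
    if failed is not None:
        # Fail over: all traffic, including new writes, moves to the survivor.
        survivor = 1 - failed
        loads[survivor] += loads[failed] + extra_write_load
        loads[failed] = 0
        return loads
    # No failure: send the new write traffic to the less loaded channel.
    target = 0 if loads[0] <= loads[1] else 1
    loads[target] += extra_write_load
    return loads
```

For instance, `readjust([40, 10], extra_write_load=20)` places the writes on the second channel, while `readjust([40, 10], failed=0)` shifts all 50 units onto channel 1.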
The redundant communication network uses the Fibre Channel protocol, thereby providing a storage backbone with relatively high bandwidth, greater connectivity, and longer reach. The first channel and the second channel may each be a hub or a switch. The network further includes a bridge adapter for converting data passing between the backup storage unit, such as a parallel SCSI device, and the backup server. The backup storage unit may be a tape storage unit, an optical storage unit, or a similar device.