Data backup is an essential element of the data protection process in every organization. Historically, it has involved sending a backup copy of the data to a tape storage device. Exponential data growth, a shrinking backup window, heterogeneous platforms and applications (an open systems environment), and rising downtime costs are some of the data storage challenges facing IT administrators today. As a result, data backup is now often the number one storage problem for IT administrators.
A traditional backup system architecture 10, shown in FIG. 1, has a backup application 12 residing on a backup server 14 and acting as the point of management and control for both the backup process and the associated tape hardware. Backup server 14 is typically disposed on a local area network (LAN) 16, where it is connected to a tape library 18 and to a plurality of local hosts requiring data backup (e.g., PCs and other servers, not shown). Moreover, a variety of backup applications are now available from various vendors, each compatible with different operating systems, storage systems and applications. Integrating these various backup applications into an open systems environment, with heterogeneous hosts and heterogeneous tape storage systems, is a significant challenge.
Apart from the difficulties of integrating the different systems, backup and recovery from tape is itself an inherently labor-intensive, complex and error-prone process. The success rate for tape backup ranges from 95% to 99%; for tape recovery, a less frequent but very critical operation, it is even lower. The operational costs of tape backup and recovery management keep rising as the complexity of the system and the amount of data increase.
As a result of these problems, new data protection schemes have been proposed. One approach is to integrate a disk-based cache (an expensive form of temporary storage typically used for application data) to improve backup performance and reduce recovery time. Another approach is to use disk-based library storage for data backup; this too is more expensive than tape storage. Some systems emulate a tape storage device with a disk storage device. In one such emulation system, commonly used in a mainframe (dedicated host and storage device) environment, tape requests are intercepted in the host server and converted to disk requests so that an unmodified magnetic disk storage device can emulate (act as a virtual) magnetic tape storage device.
While solving some of the problems of traditional tape-based backup and recovery methods, these new approaches have generated problems of their own. Many of them do not integrate seamlessly with the variety of existing backup applications and procedures of open systems environments. Some approaches require new system hardware as well as software. Others are too expensive, requiring additional disk space in primary (expensive, high-performance) storage disk arrays. Furthermore, many of these approaches do not consolidate backup data procedures, but rather are niche solutions suited to only a portion of the data handled by a data center.
Whereas tape storage has been central to data backup, disk storage has been central to application storage (i.e., primary storage), which requires more immediate access to data. Thus, traditional disk arrays have been optimized for application storage performance. These storage arrays include RAID architectures for data availability, redundant support systems for reliability of the full data array, wideband channels to support high throughput, and caching to reduce input/output (I/O) latency. Because of their criticality to system operation, application storage arrays are also designed with redundant components (including the disks themselves) that can be removed and replaced without interrupting system operation (referred to as "hot swap" capability). As a result of this added complexity, application storage arrays typically cost at least ten times as much as the equivalent raw disk capacity.
For most data protection applications, and specifically for backup, many of these design complexities are not required. Additionally, while application storage systems must be designed so that the full data array is available at all times, most data protection applications require only a small fraction (e.g., ten percent or less) of the data to be active at any time.
FIG. 2 illustrates an enhanced backup architecture 20 which includes both disk and tape storage. In this schematic drawing, a plurality of hosts 21 (e.g., computers) are connected by a LAN 22. A plurality of servers 24, e.g., application server 25, e-mail server 26, web server 27, and backup server 28 on which backup application 29 resides, are interconnected by a Storage Area Network (SAN) 30 and are also connected to LAN 22. Data paths 32, 34, 35 exist between backup server 28 and each of disk library 38, which serves as a target for backup data, and tape library 36, which serves as a target for archive data. Systems of this type have been implemented ad hoc to reduce backup times and/or to increase the completion (success) rate of backups within a given backup window. However, the ability to scale such an architecture is limited, particularly in open systems environments that include equipment from a variety of vendors.
Thus, there is a need to provide a backup data protection system having a more cost-effective combination of some (and preferably all) of the following characteristics: capacity; performance; availability; cost; compatibility; simplicity; and scalability.