1. Technical Field
The present invention relates generally to enterprise data protection and data management.
2. Background of the Related Art
A critical information technology (IT) problem is how to cost-effectively deliver network-wide data protection and rapid data recovery. In 2002, for example, companies spent an estimated $50B worldwide managing data backup/restore and an estimated $30B in system downtime costs. The “code red” virus alone cost an estimated $2.8B in downtime, data loss, and recovery. The reason for these staggering costs is simple: traditional schedule-based tape and in-storage data protection and recovery approaches can no longer keep pace with rapid data growth, geographically distributed operations, and the real-time requirements of 24×7×365 enterprise data centers.
Traditionally, system managers have used tape backup devices to store system data on a periodic basis. For example, the backup device may acquire a “snapshot” of the contents of an entire hard disk at a particular time and then store this for later use, e.g., reintroduction onto the disk (or onto a new disk) should the computer fail. Typically, snapshot techniques fall into one of three categories: hot, warm and cold. A hot snapshot is created while applications continue running; this creates difficulties if critical data is being modified at the time. In particular, a “hot” snapshot may capture an incomplete update such that, when the data is reintroduced, it is not fully consistent. A cold snapshot requires that applications that may be modifying data be shut down to generate a consistent image in storage. This technique causes significant application downtime. A warm snapshot can be applied only to very specific applications, such as a database that implements a standby mode where all modified data is flushed to storage; in this mode a given application may continue to receive requests, but it does not modify data in the storage until it leaves the standby mode. In most databases, this is known as “quiescent” mode and, for example, a database may enter this mode when it receives a “lock” command. While a consistent database snapshot can be generated using a warm snapshot, any data change that occurs after a snapshot (and before the next one) is at risk, resulting in data loss.
Thus, the problems with the snapshot approaches are well known and can be summarized as follows. First, critical data can change as the snapshot is taken, which results in incomplete updates (e.g., half a transaction) being captured so that, when reintroduced, the data is not fully consistent. Second, changes in data occurring after a snapshot is taken are always at risk. Third, as storage device size grows, the burst of bandwidth required to repeatedly offload snapshots and store the complete snapshot can become impractical. Moreover, the amount of storage resources required to store the snapshots can become impractical, even in small systems without significant data changes. Most importantly, a storage-based snapshot does not capture fine-grained application data and, therefore, it cannot recover fine-grained application data objects without reintroducing (i.e., recovering) the entire backup volume to a new application computer server to extract the fine-grained data object.
There have been many approaches to try to solve these deficiencies. One approach, known as block journaling, records every disk block change as it happens. Block-level journaling is a storage-based approach to data protection. To perform block journaling, a secondary storage must be available. For all primary storage volumes to be protected, an initial image is first copied to the secondary storage. Once the initialization is complete, an agent must be used to capture real-time block changes. The data capturing agent records in real time all the changed blocks. The block journal is written into the secondary storage. In particular, if a given block is modified, a block journal in the secondary storage saves both the original block and the modified block. An analogous solution is block-level mirroring, which differs only in that the original block is overwritten (with the modified block) as the block journal is saved in the secondary storage. While block journaling has advantages over schedule-based tape solutions, it has other problems that make it a less-than-desirable approach. In particular, block-level solutions cannot perform automated recovery with application consistency and zero data loss. In addition, without manual shutdown procedures these solutions cannot protect many types of applications such as email servers and network file servers. Block-level data protection also requires excessive amounts of storage and bandwidth, and it cannot be cost-effectively applied to multi-site disaster recovery. Because block-level protection has to deal with large numbers of raw blocks, its failure modes are likely to be disruptive to primary applications when applied to real-time protection.
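By way of illustration only, the block journaling scheme described above may be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular product: the names `BlockJournalEntry`, `SecondaryStorage`, `JournaledVolume` and `rebuild` are hypothetical, blocks are modeled as an in-memory mapping from block number to bytes, and the capture agent is modeled as a method that intercepts each write.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BlockJournalEntry:
    """One journal record: the block before and after a write (hypothetical layout)."""
    block_no: int
    original: bytes
    modified: bytes


@dataclass
class SecondaryStorage:
    """Holds the initial image plus the block journal (hypothetical)."""
    image: Dict[int, bytes] = field(default_factory=dict)
    journal: List[BlockJournalEntry] = field(default_factory=list)


class JournaledVolume:
    """Primary volume whose writes are captured by an assumed agent."""

    def __init__(self, blocks: Dict[int, bytes], secondary: SecondaryStorage):
        self.blocks = dict(blocks)
        self.secondary = secondary
        # Initialization: copy the initial image to the secondary storage.
        self.secondary.image = dict(self.blocks)

    def write_block(self, block_no: int, data: bytes) -> None:
        # The agent saves both the original and the modified block.
        original = self.blocks.get(block_no, b"")
        self.secondary.journal.append(BlockJournalEntry(block_no, original, data))
        self.blocks[block_no] = data


def rebuild(secondary: SecondaryStorage, upto: int) -> Dict[int, bytes]:
    """Replay the first `upto` journal entries over the initial image."""
    state = dict(secondary.image)
    for entry in secondary.journal[:upto]:
        state[entry.block_no] = entry.modified
    return state
```

In this sketch the journal holds only block numbers and raw bytes, so `rebuild` can stop at any entry but has no way to know whether that entry falls in the middle of an application transaction. This is consistent with the limitation noted above regarding application consistency.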
Yet another approach is known as file journaling. File journal-based technologies are an improvement over block journaling, but they also have fundamental limitations that prevent them from supporting rich data services, global data management, or automated recovery capabilities. File journaling records only file changes, as such changes occur. It preserves all file changes in their exact order and form. While file journaling facilitates database recovery to any point in time with consistency, that reconstruction has to be done manually. This is because file journaling techniques do not capture or store application-relevant events or otherwise preserve knowledge of actual transactions, file types, file relationships or other metadata. The only object that can be recovered is the file. In addition, to preserve the change order, file journaling cannot extract deltas (differences) from the application changes. Thus, although file journaling techniques have advantages over block journaling solutions, they are file system aware only. Finally, as existing file journaling solutions are PC-based and use a standalone SQL database to store data, from an architectural standpoint they are not well-suited to scaling to support heterogeneous, enterprise-wide data management.
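By way of illustration only, an ordered file journal of the kind described above may be sketched as follows. This is a minimal sketch under stated assumptions: the names `FileChange`, `FileJournal`, `record` and `recover_file` are hypothetical, each change is modeled as a byte range written at an offset within a named file, and gaps in a recovered file are zero-filled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileChange:
    """One journaled write, kept in arrival order (hypothetical layout)."""
    seq: int
    path: str
    offset: int
    data: bytes


class FileJournal:
    """Records file changes in the exact order and form in which they occur."""

    def __init__(self) -> None:
        self.entries: List[FileChange] = []

    def record(self, path: str, offset: int, data: bytes) -> None:
        self.entries.append(FileChange(len(self.entries), path, offset, data))

    def recover_file(self, path: str, upto_seq: int) -> bytes:
        """Rebuild one file by replaying its changes through `upto_seq`."""
        content = bytearray()
        for entry in self.entries:
            if entry.seq > upto_seq:
                break
            if entry.path != path:
                continue
            end = entry.offset + len(entry.data)
            if len(content) < end:
                # Zero-fill any gap before applying the write (assumption).
                content.extend(b"\x00" * (end - len(content)))
            content[entry.offset:end] = entry.data
        return bytes(content)
```

In this sketch the unit of recovery is a single file at a chosen sequence number. The journal carries no record of which changes, possibly spanning several files, belong to one application transaction, so selecting a consistent sequence number is left to the operator, as noted above.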
Thus, there remains a long-felt need to provide new and effective solutions to these and other problems in the art.