1. Technical Field
The present invention relates generally to enterprise data protection and data management.
2. Background of the Related Art
Techniques for managing data history in distributed computing systems are known in the art. In particular, traditional content management systems typically manage file history using “forward delta” management, “reverse delta” management, or a combination of both. A forward delta management system maintains an initial baseline of the file as well as a list of deltas (changes to the file) that occur after the baseline is created. In a forward delta management system, deltas are appended sequentially to a delta document. An advantage of such a system is that, as deltas arrive, the system only needs to append them to the end of the delta document. However, when a user tries to access a file (or when a host needs to recover its lost data to a specific point-in-time, version, or the most current point-in-time), the forward delta management system must (at runtime) take the baseline and apply the necessary delta strings “on the fly” to generate the requested point-in-time data. If the list of delta strings is long, the read latency of such an operation may be very long; in addition, the cache required to process the delta strings during the read operation may be unacceptably large.
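The forward delta trade-off described above (cheap appends, expensive reads) can be illustrated with the following minimal Python sketch. It is illustrative only and rests on assumptions not found in the text: the delta format (replace a character range of the file with new text), the names Delta, apply_delta and ForwardDeltaStore, and the in-memory list standing in for the delta document are all hypothetical simplifications, not part of any particular content management system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Delta:
    """Hypothetical delta format: replace data[start:end] with `replacement`."""
    start: int
    end: int
    replacement: str


def apply_delta(data: str, delta: Delta) -> str:
    """Apply one delta to a version of the file, producing the next version."""
    return data[:delta.start] + delta.replacement + data[delta.end:]


@dataclass
class ForwardDeltaStore:
    baseline: str                                      # version 0, never modified
    deltas: list[Delta] = field(default_factory=list)  # stands in for the delta document

    def append(self, delta: Delta) -> None:
        # Writes are cheap: each new delta is appended to the end of the list.
        self.deltas.append(delta)

    def read(self, version: int) -> str:
        # Reads are expensive: the requested version is rebuilt "on the fly"
        # by replaying every delta from the baseline up to `version`.
        if not 0 <= version <= len(self.deltas):
            raise ValueError(f"no such version: {version}")
        data = self.baseline
        for delta in self.deltas[:version]:
            data = apply_delta(data, delta)
        return data

    def latest(self) -> str:
        # Even the most current version requires replaying the whole list.
        return self.read(len(self.deltas))


store = ForwardDeltaStore("hello world")
store.append(Delta(0, 5, "HELLO"))   # version 1: "HELLO world"
store.append(Delta(6, 11, "there"))  # version 2: "HELLO there"
print(store.read(1), "|", store.latest())
```

In this sketch, reading version n costs n delta applications starting from the baseline, which is the growth in read latency (and in working memory for long delta lists) that the paragraph above attributes to forward delta management.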
A reverse delta management system maintains the most current point-in-time data and a list of reverse deltas (an “undo” list) in a delta management file. A reverse delta management system first takes a given forward delta and applies it to the last point-in-time data to generate the most current point-in-time data; it then compares the most current point-in-time data with the last point-in-time data to generate an undo (reverse) delta. This type of system keeps only the most current data file and a list of undo deltas. If the most current data is requested, the data can be retrieved instantly. If, however, data from a previous point-in-time is requested, this system must take the most current data file and apply the necessary undo delta(s) to generate the requested point-in-time data. The baseline copy in this system is the most current point-in-time copy. In many cases, reading previous data may incur significant latency. In addition, the computing power needed for ongoing data updates in such a data management system is substantial, because every update requires both applying a forward delta and generating a reverse delta. This technique also does not support data replication over an unreliable network, as the baseline copy of the data is constantly changing.
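The reverse delta workflow described above (apply the forward delta, then compare the new and previous versions to derive an undo delta) can be sketched in Python as follows. As with any such illustration, it rests on stated assumptions: the range-replacement delta format, the prefix/suffix comparison used in the hypothetical diff() helper, and the names ReverseDeltaStore, apply_delta and Delta are illustrative simplifications, not the method of any specific product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Delta:
    """Hypothetical delta format: replace data[start:end] with `replacement`."""
    start: int
    end: int
    replacement: str


def apply_delta(data: str, delta: Delta) -> str:
    return data[:delta.start] + delta.replacement + data[delta.end:]


def diff(src: str, dst: str) -> Delta:
    """Compare two versions; return one delta that turns `src` into `dst`.

    Assumed comparison: trim the common prefix and common suffix, and
    encode the differing middle section as a single range replacement.
    """
    limit = min(len(src), len(dst))
    prefix = 0
    while prefix < limit and src[prefix] == dst[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and src[-1 - suffix] == dst[-1 - suffix]:
        suffix += 1
    return Delta(prefix, len(src) - suffix, dst[prefix:len(dst) - suffix])


@dataclass
class ReverseDeltaStore:
    current: str                                     # baseline = most current version
    undo: list[Delta] = field(default_factory=list)  # undo[i] turns version i+1 into i

    def append(self, forward: Delta) -> None:
        previous = self.current
        # 1. Apply the incoming forward delta to produce the new current version.
        self.current = apply_delta(previous, forward)
        # 2. Compare new against previous to derive the undo (reverse) delta.
        self.undo.append(diff(self.current, previous))

    def read(self, version: int) -> str:
        if not 0 <= version <= len(self.undo):
            raise ValueError(f"no such version: {version}")
        data = self.current
        # Walk backwards from the newest version, applying undo deltas in turn.
        for delta in reversed(self.undo[version:]):
            data = apply_delta(data, delta)
        return data


store = ReverseDeltaStore("hello world")
store.append(Delta(0, 5, "HELLO"))   # version 1: "HELLO world"
store.append(Delta(6, 11, "there"))  # version 2: "HELLO there"
print(store.current, "|", store.read(0))
```

Here the most current version is returned without any delta processing, but every write pays for both an apply and a comparison, and reading an older version v out of n costs n - v undo applications. This mirrors the write cost and historical read latency noted in the paragraph above.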
When performing incremental data protection, traditional data management systems copy the entire contents of a changed file into a protection repository, where the file history is saved. These systems, however, do not apply any delta management techniques, such as those described above, to manage the file history. Moreover, because these systems are not storage- and bandwidth-efficient, they are not suitable for performing real-time data services.
The traditional content management systems can manage file history, but they are not capable of managing unstructured and dynamic data. Further, a traditional system of this type requires that its data source be well-structured, i.e., that it have directories created and configured in advance. In most cases, a given content management system is designed to manage a specific content type as opposed to dynamic data. Thus, for example, a given source control system may be designed to manage design documents or source code, but that same system cannot manage data that changes constantly. These systems are also not capable of protecting changing data in real time. To the extent they include delta management schemes, such schemes do not enable efficient any-point-in-time data recovery.
There remains a need in the art to provide distributed data management systems that can efficiently manage real-time history of a large amount of unstructured and dynamic data with minimal storage and bandwidth usage.
There also remains a need in the art to provide such a distributed data management system that can perform virtual-on-demand recovery of consistent data at any point-in-time in the past.
The present invention addresses these deficiencies in the art.