1. Field
Embodiments of the invention relate to automatically transitioning between historical and real time data streams in the processing of data change messages.
2. Description of the Related Art
In many cases, two data streams are available for use when capturing data changes from a database management system, file management system, or other data source (“data source”): a real time data stream and a historical data stream (e.g., a stream or collection of log records). The real time data stream contains information about recently performed data changes. The historical data stream contains information about data changes that have occurred over a longer period of time. Often, the real time data stream has attributes that favor its use over the historical data stream. For example, the real time data stream may be available via direct memory reference, while the historical data stream may require access to mass storage devices (e.g., disk or tape). Additionally, the real time data stream usually makes data available to the data capture process (e.g., a data capture program) with minimal delay, while the historical data stream may lag by anywhere from seconds to hours because of delays in writing changes to logging media and because of media sharing characteristics. A data capture program may be described as a program that captures changes that occur against a data source. The captured changes may be replicated (e.g., duplicated to another database) or published as an audit trail.
In an ideal world, the data capture program would process only the real time data stream; however, there are situations in which the data capture program prefers to, or must, process the historical data stream in order to capture all of the data changes that have occurred against the data source. An example of such a situation is one in which the data capture program is terminated while a data source program continues to execute. In this situation, it is desirable to process the historical data stream only until it becomes possible to switch back to the real time data stream. Unfortunately, this switch from using the historical data stream to using the real time data stream is not necessarily a simple process. The switch is complicated by the fact that the two data streams may not be in exactly the same order and may not contain identical data. In addition, switching from the historical data stream to the real time data stream while data changes are being written creates a race condition that must be reconciled to ensure that neither loss nor duplication of the published data occurs.
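By way of illustration only, the following minimal sketch shows how a naive switch at a fixed position in the historical data stream may lose or duplicate changes. The `Change` record, its `seq` field (a hypothetical unique sequence number assumed to be assigned by the data source), and the functions `naive_switch` and `audit` are assumptions introduced for this example and are not part of any particular data source or data capture program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    seq: int       # hypothetical unique sequence number (an assumption)
    payload: str


def naive_switch(historical, real_time, cutover_index):
    """Publish historical records before cutover_index, then the whole real time stream."""
    return historical[:cutover_index] + real_time


def audit(published, expected_seqs):
    """Return (duplicated, lost) sequence numbers for a published stream."""
    seen = [c.seq for c in published]
    duplicated = sorted({s for s in seen if seen.count(s) > 1})
    lost = sorted(set(expected_seqs) - set(seen))
    return duplicated, lost


# The historical stream lags behind: changes 5 and 6 are not yet in the log.
historical = [Change(1, "a"), Change(2, "b"), Change(3, "c"), Change(4, "d")]
# The real time stream holds only recent changes, in a slightly different order.
real_time = [Change(5, "e"), Change(4, "d"), Change(6, "f")]

expected = range(1, 7)

# Switching too early: change 3 appears in neither published portion.
too_early = audit(naive_switch(historical, real_time, 2), expected)

# Switching too late: change 4 is published from both streams.
too_late = audit(naive_switch(historical, real_time, 4), expected)

print("switch after 2 historical records:", too_early)
print("switch after 4 historical records:", too_late)
```

In this sketch, a switch point that avoids both problems does exist (after the third historical record), but it can be identified only by comparing the contents of the two streams, and in a running system it shifts as further changes are written.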
At present, responsibility for performing the switch from the historical data stream to the real time data stream falls to an administrator of the data capture program. The switch usually requires that the data source program be quiesced for some period of time to ensure that the historical data stream is completely consumed before data changes begin flowing from the real time data stream.
Thus, the transition from processing the historical data stream to processing the real time data stream is currently a manual process that requires a period of data source quiescence. There is a need in the art for improved switching between the historical and real time data streams.