Information stored in a data system may be shared with other data systems. To share information, a data mining process may be deployed at a source entity and, correspondingly, an applying process that consumes mined data captured by the data mining process may be deployed at a sink entity. The mined data from the source entity may be propagated (or streamed) to the sink entity by a propagator process. Because the processes involved are deployed on different machines and use various inter-processor and/or inter-process communication mechanisms, such information sharing typically can be supported only by like, if not identical, systems that are developed by the same data system provider. As a result, it is difficult to use such techniques to share information in a general way between heterogeneous systems. For example, different (or heterogeneous) data systems, sourced from different providers, may use divergent hardware and dissimilar operating systems. Such data systems may use incompatible schemes, system designs, internals, or interfaces that can at best inter-operate with each other only in a minimal, rudimentary way, thereby precluding an efficient and robust integration for the purpose of sharing information with each other.
While a transparent gateway may be used between heterogeneous data systems to pass standards-based SQL statements instead of directly streaming related data changes, such SQL statements may have to be reconstructed from logs at a source entity and applied at a sink entity in a rather inefficient manner as compared with the direct streaming that is available between like systems. At least one round trip is required for each transaction transferred between the entities. In addition, use of the transparent gateway is limited to data systems that can reconstruct and understand SQL statements, and therefore is not applicable to non-database systems (e.g., file systems, application servers) that do not use or understand SQL statements.
Another approach for sharing data in heterogeneous data systems is to use a general purpose buffered queue to stage data in a heterogeneous data system, as described in U.S. patent application Ser. No. 11/496,949 (“REPLICATING DATA BETWEEN HETEROGENEOUS DATA SYSTEMS”) filed on Jul. 31, 2006, the entire content of which is hereby incorporated by reference for all purposes as if fully set forth herein. However, this approach may suffer from latch contention among enqueuers and dequeuers, complicated shared memory management, a complicated recovery protocol, and a lack of eager apply (applying transactions before seeing the commit).
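By way of illustration only, the difference between eager apply and applying a transaction only after its commit is seen may be sketched as follows. The change-record layout (transaction identifier, operation, key, value), the function names, and the in-memory dictionary standing in for the sink entity are assumptions made for this sketch and are not taken from the referenced application. The sketch further assumes that concurrently open transactions modify disjoint keys.

```python
from collections import defaultdict

# Sentinel marking "key did not exist before this change" (hypothetical helper).
_MISSING = object()


def commit_time_apply(stream):
    """Stage each transaction's changes and apply them only when its commit arrives.

    Returns the resulting table and the largest number of changes that had to
    be staged at any one time.
    """
    table = {}
    pending = defaultdict(list)  # txn id -> staged (key, value) changes
    max_staged = 0
    for txn, op, key, value in stream:
        if op == "set":
            pending[txn].append((key, value))
            staged = sum(len(changes) for changes in pending.values())
            max_staged = max(max_staged, staged)
        elif op == "commit":
            for k, v in pending.pop(txn, []):
                table[k] = v
        elif op == "rollback":
            pending.pop(txn, None)  # discard staged changes
    return table, max_staged


def eager_apply(stream):
    """Apply each change as it arrives, before the commit is seen.

    Undo records are kept per transaction so that a rollback can restore
    the prior values.
    """
    table = {}
    undo = defaultdict(list)  # txn id -> (key, previous value) records
    for txn, op, key, value in stream:
        if op == "set":
            undo[txn].append((key, table.get(key, _MISSING)))
            table[key] = value
        elif op == "commit":
            undo.pop(txn, None)  # changes are already in place
        elif op == "rollback":
            for k, old in reversed(undo.pop(txn, [])):
                if old is _MISSING:
                    table.pop(k, None)
                else:
                    table[k] = old
    return table


# Hypothetical interleaved stream: t1 commits, t2 rolls back.
stream = [
    ("t1", "set", "a", 1),
    ("t2", "set", "b", 2),
    ("t1", "set", "c", 3),
    ("t2", "rollback", None, None),
    ("t1", "commit", None, None),
]

staged_table, max_staged = commit_time_apply(stream)
eager_table = eager_apply(stream)
```

Under these assumptions both strategies reach the same final state. The commit-time variant must hold every uncommitted change in a staging area until the commit arrives, whereas the eager variant does its work as changes arrive and pays an undo cost only on rollback.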
Therefore, a mechanism that better supports information sharing among heterogeneous data systems is needed.