Big data analytics is a process for examining large data sets to uncover patterns, correlations, trends, and other information. An emerging trend in big data analytics applications is to use a secondary database system to offload large analytic queries from a primary database system in order to boost query processing performance. The secondary database system may act as a query processing system that does not support all features of a standard database system. For example, the secondary database system may not be able to directly receive database statements or data changes from a client application. Rather, the secondary database system may rely on the primary database system to provide the ACID (Atomicity, Consistency, Isolation, Durability) compliance expected of standard database systems.
The secondary database system may store a copy of the data against which queries received at the primary database system are executed. Data changes are received and executed on the primary database system. New or updated data must therefore be added to the database of the secondary database system, and deleted data must be removed from it.
One possible approach is to propagate changes to the secondary database system on an as-needed, on-demand basis, when the changed data is targeted by a query. Although on-demand propagation may spread out the computing cost of propagating changes, it increases query response time, because the pending changes must be propagated before the query can execute.
Another possible approach is to propagate changes as soon as they are received or committed in the primary database system. This yields faster query response times than on-demand propagation, but it imposes high overhead on the database system when the primary database system receives a large volume of changes within a short period of time.
A third possible approach is to schedule change propagation at specific times or at regular intervals. However, this method does not guarantee that the secondary database system will have up-to-date data before a query executes in the secondary database system. Thus, this method does not guarantee that a query will produce accurate results.
For data analytics queries, however, such as those performed for big data analytics, data consistency is required for a query to produce accurate results. Furthermore, since the goal of using a secondary database system is to increase query execution efficiency, change propagation must not significantly degrade query performance. Thus, a change propagation method that maintains both data consistency and query execution efficiency is required.
Additionally, most systems that use a particular propagation method require users to configure the primary and secondary database systems and to select a change propagation method based on the workload and data change pattern the user expects. Once the secondary database system is configured, it may be difficult or impossible to switch to a different propagation method if the actual workload and/or data change pattern differs from what was expected.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.