Many application systems today use various forms of asynchronously updated replicas, that is, stored data derived from some underlying source tables, to improve scalability, availability and performance. The term replica covers both traditional replicated data and data cached by various caching mechanisms. “Asynchronously updated” means that the replica is not updated as part of the database transaction that modifies its source tables, so the state of the replica does not necessarily reflect the current state of the database. Although replicated data greatly enhances performance, in many cases queries are still unnecessarily routed to a backend server because of uncertainty about the currency and consistency requirements of the query, as well as about the currency and consistency of the replicated data.
If an application uses replicas that are not in sync with the source data, it is clearly willing to accept results that are not completely current, but it most likely has some limit on the acceptable “age” of the data. Today, these currency requirements are not declared explicitly anywhere; they can only be inferred indirectly from the properties of the replicas used. The following example illustrates this approach.
An application that queries a replicated table to which updates are propagated every 30 seconds is implicitly saying that it is willing to accept data up to 30 seconds old. If replication is later reconfigured to propagate updates every 5 minutes, someone must determine whether 5-minute-old data still meets the currency requirements of the query. Today's systems cannot assist in making this determination because they have no way of knowing a given query's currency requirements.
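If the query's currency requirement were declared explicitly, the determination in the example above would become a simple comparison. The following is a minimal sketch under simplifying assumptions: the replica is modeled only by its propagation interval, its worst-case staleness is taken to be one full interval (propagation delay ignored), and the names `Replica`, `worst_case_staleness`, `meets_currency` and `QUERY_BOUND_S` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """A replicated table refreshed by periodic update propagation (hypothetical model)."""
    name: str
    propagation_interval_s: float  # seconds between update propagations


def worst_case_staleness(replica: Replica) -> float:
    # Simplifying assumption: just before the next propagation, the replica
    # can lag the source by one full interval (propagation delay ignored).
    return replica.propagation_interval_s


def meets_currency(max_staleness_s: float, replica: Replica) -> bool:
    """True if the replica's worst-case staleness is within the query's declared bound."""
    return worst_case_staleness(replica) <= max_staleness_s


# The query's requirement, stated explicitly instead of implied by the replica choice.
QUERY_BOUND_S = 30.0

before = Replica("orders_replica", propagation_interval_s=30.0)
after = Replica("orders_replica", propagation_interval_s=300.0)  # reconfigured to 5 minutes

print(meets_currency(QUERY_BOUND_S, before))  # True
print(meets_currency(QUERY_BOUND_S, after))   # False
```

With the bound recorded alongside the query, the reconfiguration from 30 seconds to 5 minutes can be flagged mechanically rather than discovered by inspecting the application.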
Data currency requirements are currently expressed implicitly through the choice of data sources for queries. For example, if a query Q1 does not require completely up-to-date data, the application may be designed to submit it to a database server C that stores replicated data instead of to database server B, which maintains the up-to-date state. Another query Q2 accesses the same tables but requires up-to-date data, so the application submits it to server B. These routing decisions are hardwired into the application and cannot be changed without changing the application.
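For contrast with the hardwired routing described above, here is a sketch of what routing could look like if each query declared its maximum acceptable staleness and a separate component chose the server. This is an illustration under stated assumptions, not a described mechanism: the catalog `SERVER_STALENESS_S`, its staleness values, the preference order `PREFERENCE` and the function `route` are all hypothetical.

```python
# Hypothetical catalog of servers and their current worst-case staleness (seconds).
# B maintains the up-to-date state; C stores replicated data.
SERVER_STALENESS_S = {"B": 0.0, "C": 120.0}

# Preference order: try the replica first to offload the backend.
PREFERENCE = ["C", "B"]


def route(max_staleness_s: float) -> str:
    """Return the first server, in preference order, whose staleness meets the bound."""
    for server in PREFERENCE:
        if SERVER_STALENESS_S[server] <= max_staleness_s:
            return server
    raise ValueError("no server can satisfy the currency bound")


print(route(600.0))  # Q1 accepts data up to 10 minutes old -> C
print(route(0.0))    # Q2 requires up-to-date data -> B
```

In this sketch, if server C's refresh schedule changes, only the catalog entry changes; Q1 and Q2 keep their declared bounds and the application code is untouched.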
Recent work has peripherally addressed some of the issues surrounding currency and consistency requirements in database query processing. For example, Epsilon-Serializability allows queries to specify inconsistency bounds, but for an entirely different purpose: the objective is to achieve a higher degree of concurrency by allowing queries to see database states whose divergence, introduced by concurrent update transactions, is bounded.
WebViews suggests algorithms for the online view selection problem that consider a new constraint: the required average freshness of the cached query results. The model of freshness is relatively coarse and its use is purely heuristic, providing no guarantee on the currency and consistency of the result of an individual query.
Recent work on obsolescent materialized views determines whether to use local or remote data by integrating the divergence of local data into the cost model of the database system's query optimizer. The use of currency information is heuristic: the staler the data, the higher the cost added to the plan. A related approach formulates a heuristic function that simply compares a user-specified threshold with a weighted score calculated from a single view. No guarantees are provided on the currency or consistency of the result of an individual query.
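To make the lack of a per-query guarantee concrete, here is a toy staleness-penalized cost model of the general kind described above. It is not the cited work's actual cost formula: the weight `STALENESS_WEIGHT`, the functions `plan_cost` and `choose_source`, and all cost figures are hypothetical, and remote data is assumed to be fully current.

```python
# Toy staleness-penalized cost model (hypothetical weights and costs).
STALENESS_WEIGHT = 0.5  # cost units added per second of staleness


def plan_cost(base_cost: float, staleness_s: float) -> float:
    # Heuristic: the staler the data, the higher the cost added to the plan.
    return base_cost + STALENESS_WEIGHT * staleness_s


def choose_source(local_cost: float, local_staleness_s: float, remote_cost: float) -> str:
    # Remote data is assumed to be current (staleness 0).
    local = plan_cost(local_cost, local_staleness_s)
    remote = plan_cost(remote_cost, 0.0)
    return "local" if local <= remote else "remote"


# Local data 100 s stale: 10 + 0.5 * 100 = 60 <= 100, so local wins.
print(choose_source(10.0, 100.0, 100.0))  # local
# Local data 200 s stale: 10 + 0.5 * 200 = 110 > 100, so remote wins.
print(choose_source(10.0, 200.0, 100.0))  # remote
```

The choice depends only on how the penalty trades off against execution cost. A query that can tolerate at most 30 seconds of staleness would still be answered from 100-second-old local data, because nothing in the model ties the penalty to what the individual query can accept.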