A cache is a collection of data that duplicates original values stored elsewhere or computed earlier, and is used when the original data is expensive to fetch or compute relative to retrieval from the cache. For example, a server-side query cache for a database may store the results of a Structured Query Language (SQL) query received by the server in server memory. Storing the query results in the server-side query cache enables the server to return the results upon receipt of an identical query without re-executing the query against the database. For the query cache to return accurate results, the data relied upon in forming the initial result for the query must not have changed. A server-side query cache may improve data retrieval performance, primarily for read-only or read-mostly data.
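The behavior of a server-side query cache described above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database stands in for the server's database; the class name QueryCache, the table emp, and the sample rows are hypothetical and chosen only for illustration. The sketch invalidates every cached result on any write, which is a deliberate simplification rather than a description of any particular server.

```python
import sqlite3


class QueryCache:
    """Toy server-side query cache keyed by the exact SQL text (hypothetical)."""

    def __init__(self, conn):
        self.conn = conn
        self.results = {}    # SQL text -> list of result rows
        self.executions = 0  # number of queries actually run against the database

    def query(self, sql):
        # An identical query is answered from the cache without re-execution.
        if sql in self.results:
            return self.results[sql]
        self.executions += 1
        rows = self.conn.execute(sql).fetchall()
        self.results[sql] = rows
        return rows

    def write(self, sql, params=()):
        # A change to the underlying data may make cached results inaccurate,
        # so this sketch simply discards all of them.
        self.conn.execute(sql, params)
        self.conn.commit()
        self.results.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
cache = QueryCache(conn)

cache.write("INSERT INTO emp VALUES (?, ?)", (1, "Asha"))
first = cache.query("SELECT id, name FROM emp")   # executed against the database
second = cache.query("SELECT id, name FROM emp")  # served from the cache
cache.write("INSERT INTO emp VALUES (?, ?)", (2, "Mehul"))
third = cache.query("SELECT id, name FROM emp")   # re-executed after invalidation
```

Because the cache and the database sit in the same process in this sketch, the write and the invalidation happen together, which is the property a server-side cache can rely on.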
In one embodiment, the cache is implemented as a daemon process (a process running in the background) on the client, with which the client interacts for caching. In another embodiment, a caching daemon process may run on a middle tier and be shared among multiple clients. The cache can be located per client session; per client process, shared by different sessions in the same process; in shared memory or a daemon process on the client, shared by different client processes on the same machine; or in a daemon on a different machine, shared by different client machines. A client-side query cache can reside in memory, on physical storage accessible by client processes, or both.
A client-side query cache, a query cache in client memory, provides additional benefits over a server-side cache. First, caching on the client eliminates the need to send a request to the server and receive a response from the server in order to retrieve the query results, thereby improving response time. Second, client machines can be added horizontally to provide caching capacity in client memory, reducing the expense of setting up additional servers to support caching of query results. Further, storage on the client side not only places the cached queries closer to the client but also ensures that the queries most relevant to the client are stored at the client.
However, storing query results in a client-side query cache may introduce data consistency problems that are not present with a server-side query cache, and these problems, if left unresolved, produce unexpected results for the user querying the database. In the database, a snapshot, a record of the state of the database, is created when a transaction that changes the state of the database is executed. Snapshots increase monotonically and there is never a regression to an earlier snapshot, which means that later queries see more recent snapshots and never earlier ones. Any statement executed on the database is guaranteed to run against such a consistent snapshot, also known as the execution snapshot, which is guaranteed to include all changes made to the state of the database by all transactions leading up to the snapshot; no changes made to the database after the snapshot will affect the results of a query run against that snapshot. The database guarantees that the results of a query are generated against the snapshot of the database at the time the query is received (the query execution snapshot), and the user expects query results from a cache to maintain this level of transactional consistency. At the server, the server-side query cache can invalidate query results in the cache at the same time that the server receives a transaction that necessitates invalidating them. The client-side query cache, residing on the client, cannot invalidate its contents simultaneously with changes that occur in the database; hence the challenge lies in producing consistent query results with the use of a client-side cache.
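The snapshot guarantees described above can be pictured with a small sketch. It assumes a toy, single-table, in-process model in which a snapshot is just an integer that increases with every committed change; the class ToyDatabase and its methods are hypothetical and do not correspond to any real database interface.

```python
class ToyDatabase:
    """Hypothetical model of monotonically increasing snapshots."""

    def __init__(self):
        self.snapshot = 0        # current snapshot number; it never decreases
        self.versions = {0: ()}  # snapshot number -> rows committed up to it

    def commit(self, row):
        # Each state-changing transaction creates a new, higher snapshot.
        rows = self.versions[self.snapshot] + (row,)
        self.snapshot += 1
        self.versions[self.snapshot] = rows
        return self.snapshot

    def query(self):
        # A statement runs against the snapshot current when it is received.
        snap = self.snapshot
        return snap, self.versions[snap]

    def query_at(self, snap):
        # Results for a given snapshot are unaffected by later changes.
        return self.versions[snap]


db = ToyDatabase()
s0, r0 = db.query()   # nothing committed yet
db.commit("a")
s1, r1 = db.query()   # sees the first change
db.commit("b")
s2, r2 = db.query()   # sees both changes
```

A result cached on the client carries no such guarantee by itself: once a later commit occurs, the cached rows correspond to an earlier snapshot than the one a new query would run against.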
FIGS. 1A-B are block diagrams that illustrate the consistency problems encountered in the described approach with client-side query caches. In FIG. 1A, SQL queries have previously been requested by the Client 100 through the Database Application Programming Interface (API) 102, and the Client-side Cache 104 has stored the query results, depicted as Query Results for Table A 106 and Query Results for Table B 108. The Query Results for Table A 106 and Query Results for Table B 108 in the Client-side Cache 104 reflect the contents of the tables, Table A 110 and Table B 112 respectively, currently in the Database 114 on the Database Server 116. There is a relationship between Table A 110 and Table B 112 (e.g., a trigger) that requires that a portion of the data in Table A 110 be placed in Table B 112. With the Database API 102, in the same transaction that modifies Table A 110, the Client 100 makes a Request to Insert Mehul into Table A 118.
FIG. 1B shows the contents of the Client-side Cache 104 of the Client 100 and the Database 114 after the Database Server 116 has responded to the Request to Insert Mehul into Table A 118. The Database Server 116 has responded to the Request to Insert Mehul into Table A 118 by inserting a second row into Table A 110 (e.g. “2 Mehul mehulB”), and the insertion of the row into Table A 110 has triggered the addition of a second row to Table B 112 (e.g. “2 Mehul”). The Client-side Cache 104 is aware of the request made by the Client 100, as reflected by the contents of the Query Results for Table A 106, but is unaware of the addition to Table B 112, as reflected by the Query Results for Table B 108. Thus, if the Client 100 in FIG. 1B requests the contents of Table B 112, the Client 100 will look up the results in the Client-side Cache 104 and retrieve the Query Results for Table B 108 without the newly added row. The application relying on the data cannot properly handle query results for Table B that are produced from the client-side cache without the newly added row.
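The scenario of FIGS. 1A-B can be reproduced in a short sketch. It assumes an in-memory SQLite database in place of the Database Server 116 and a plain dictionary in place of the Client-side Cache 104. The table and column names, the trigger name copy_to_b, and the contents of the first row are hypothetical; only the second row ("2 Mehul mehulB") comes from the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, name TEXT, login TEXT);
    CREATE TABLE table_b (id INTEGER, name TEXT);
    -- The trigger places part of every new Table A row into Table B.
    CREATE TRIGGER copy_to_b AFTER INSERT ON table_a
    BEGIN
        INSERT INTO table_b (id, name) VALUES (NEW.id, NEW.name);
    END;
    INSERT INTO table_a VALUES (1, 'Row1', 'row1A');
""")

# State of FIG. 1A: the client has cached the results for both tables.
client_cache = {
    "table_a": conn.execute("SELECT * FROM table_a ORDER BY id").fetchall(),
    "table_b": conn.execute("SELECT * FROM table_b ORDER BY id").fetchall(),
}

# The client inserts Mehul into Table A and updates only the cache entry it
# knows about; it has no knowledge of the trigger on the server.
conn.execute("INSERT INTO table_a VALUES (2, 'Mehul', 'mehulB')")
conn.commit()
client_cache["table_a"].append((2, "Mehul", "mehulB"))

# State of FIG. 1B: the cached Table B result lacks the triggered row.
stale_b = client_cache["table_b"]
fresh_b = conn.execute("SELECT * FROM table_b ORDER BY id").fetchall()
fresh_a = conn.execute("SELECT * FROM table_a ORDER BY id").fetchall()
```

Here stale_b is what the client would return from its cache, while fresh_b is what the database would return for the same query.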
Thus, there is a need for a solution that ensures the same level of consistency with a client-side cache as the user expects with the database or with a server-side cache. The solution should both be transparent and guarantee transactional correctness similar to that provided by the database with the use of a snapshot. As another example, query results could be a join of multiple tables, and there is a need to refresh cached results with database changes that affect any of the tables in the query. Additionally, different clients or software running on the server could concurrently make database changes that affect the cached result set, and there is a need to identify all database changes that affect cached result sets on the client. Beyond database changes, user environment settings (e.g. changing the language from French to German) may affect the result set, and there is a need to detect such non-database changes to refrain from returning incorrect results to the application. A change in session or environment settings may indicate a need to invalidate cached result sets or create new cached result sets.
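Two of the needs listed above, tracking every table a cached result depends on and distinguishing results by environment settings, can be sketched as follows. This is only one possible illustration under stated assumptions, not the approach of any embodiment: the class ClientResultCache, its methods, and the "language" setting key are all hypothetical, and the sketch does not address how the client learns of changes made on the server.

```python
class ClientResultCache:
    """Hypothetical cache keyed by SQL text plus environment settings."""

    def __init__(self):
        # (sql, sorted settings) -> (rows, tables the result depends on)
        self.entries = {}

    @staticmethod
    def _key(sql, settings):
        # Including the settings in the key keeps results produced under
        # different environments (e.g. French vs. German) separate.
        return (sql, tuple(sorted(settings.items())))

    def put(self, sql, settings, tables, rows):
        self.entries[self._key(sql, settings)] = (rows, frozenset(tables))

    def get(self, sql, settings):
        entry = self.entries.get(self._key(sql, settings))
        return None if entry is None else entry[0]

    def invalidate_table(self, table):
        # A change to any table in a join invalidates the cached join result.
        stale = [k for k, (_, tables) in self.entries.items() if table in tables]
        for key in stale:
            del self.entries[key]
        return len(stale)


cache = ClientResultCache()
join_sql = "SELECT a.name FROM table_a a JOIN table_b b ON a.id = b.id"
french = {"language": "French"}
german = {"language": "German"}

cache.put(join_sql, french, ["table_a", "table_b"], [("Mehul",)])
hit = cache.get(join_sql, french)            # same query, same settings
miss = cache.get(join_sql, german)           # same query, different settings
dropped = cache.invalidate_table("table_b")  # change to one joined table
after = cache.get(join_sql, french)          # entry is gone
```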
Although embodiments are described in reference to a client-side query cache, it should be noted that the consistent caching implementation can also be used with caches that support other content. For example, the consistent caching implementation can be used to ensure consistent caching of any other type of cached content that may be derived from the result of a database operation.