In a basic database client/server arrangement, one or more database clients can make queries via a network to a back-end database server. The database server executes the queries and returns results to the client that requested them. The clients might be, for example, web servers or application servers or a mix of both. Typically all the clients transmit their queries to a common IP address and port number combination on the database server, and the database server transmits its results back to the IP address that originated each query. Usually an authentication process precedes the queries, in which the client provides the database server with authentication credentials (such as a username and password) and the database server establishes a “connection”, identified by a connection ID, according to which the client can make its queries. A database server in this basic arrangement is often a single computer system, though in some implementations it might actually be a cluster of computer systems. Even such a cluster, however, is arranged so as to appear to the client as a single server reached via a single IP address and port number combination.
Many more sophisticated database client/server arrangements support the concept of master-slave replication to distribute or spread the load of read/write queries among a plurality of servers. Database servers configured for master-slave replication are said to form a cluster, and operate in sync with each other ensuring that the data between all servers in a cluster is current. One of the advantages that master-slave replication offers is that of allowing the application administrators to scale out read performance, without spending excessively on hardware by having to buy a single, more powerful server. The master database server handles all queries that write/manipulate data (insert/update) and the remaining read queries (select) are distributed among the master and the slaves.
Any query or command which does not alter the database in any way, and merely retrieves information stored in the database, is considered a read query, and is typically capable of being processed by both the slave and the master servers, examples of such queries are basic “select” queries, which just retrieve a particular value from a table in the database. Any query or command which may alter or change or delete or add data in the database, or other related metadata like usernames, passwords, configuration etc. is considered a write query, and can only be handled by master servers, examples of such queries are basic “insert”, or “alter” queries which add or modify the data or structure of a table in a database. If a write query were to be sent to a slave server, it will either be ignored, or cause the slave database to have a different state of data than the master, and hence fall out of sync, compromising the data integrity of the cluster.
In a typical Master-Slave database architecture, there are designated master and slave database servers. The master servers are capable of handling both read and write queries, while the slave servers handle the read queries. The most commonly used replication method used in a typical master-slave cluster involves the slave database servers reading the logs of transactions on the master servers that store all SQL statements that manipulated the database and execute them locally to ensure that the master and the slave database servers are in sync. Some replication architectures may also utilize a separate replication manager component which monitors the master servers for changes, and once a change is found, pushes the change to the slave servers, leaving both the master and the slave servers free from the complexity of maintaining sync.
In certain master-slave architectures, the data is stored in a central location (like a SAN), and read from that central location with the aid of a cluster file system. This eliminates the need to perform log based replication, but comes with the additional complexity involved in maintaining a single data store.
An interface between the database servers and user applications, often referred to as database middleware, have been in use for some time to overcome the limits of conventional client/server database systems. One such limit often encountered in a conventional database system is that of the overhead involved in executing a simple read only query. For every query performed by the application, the database server has to process the structured database query, compile it into a form that can be understood by the database storage engine, physically fetch a large block of data from the disk systems where it resides for the sake of consistency and persistence, filter the large block of data, and make sure that only the part that is specified in the query is sent back to the application. This process is complicated, and highly repetitive, and becomes a considerable overhead for applications that access the database system intensively.
To overcome this overhead, some more sophisticated database systems implement a query caching system, either within the client application or within the database servers or both. But to implement query caching in a more basic database system, modifications would be required either in the application or server, or software additions would be required to either the client system, or the database server, or both, to be able to identify the data being requested, and then cache the same. Closed source applications that do not support code editing not even capable of being modified to implement query caching. In addition, even sophisticated database systems which do implement query caching often do so in a sub-optimal manner.
For example, conventional database systems which do implement query caching lack a configuration management tool or transparent layer that, without modifying low level application code or server configuration, permits the application developer or database administrator to specify rules or metadata of queries that are to be cached.
Nor do conventional database systems include a mechanism for clearing or deleting specific cache entries, or groups of cache entries, from the query caching system on demand, unless specifically written for the application, manually by the developers.
Nor do all conventional query caching systems support persistency of cached data. On instances when the cache crashes or the system restarts, since the entire cache system in maintained in volatile memory (RAM) to ensure good performance, when such a system recovers from a crash or a restart, the data cache is cleared. This results in what is called the cold start problem. On clearing the caching system, no data resides in it, so all database queries are served by the database system, which incurs considerable compute time cost. In case of read intensive applications, this becomes a bottleneck which many times results in further database system crashes.
Applicants have identified a need for a query caching system that acts as database middleware and addresses the drawbacks of existing query caching systems. Such a system will considerably reduce the overhead involved in executing queries on database servers thereby reducing the possibility of database servers from becoming bottlenecks in user applications.