A common architecture for computer applications is the client-server architecture. Client-server applications are computer systems in which the functionality of the application is divided between the server and the clients. For example, the client may provide a user interface and the server may provide access to shared resources. Typically the clients and the server execute as separate processes. The clients request the server process to perform actions on their behalf; that is, the clients access shared resources via the server. The server manages the shared resources, and these resources may be termed managed data. To facilitate the execution of actions on behalf of clients, the server needs to maintain control data to manage the execution of those actions. Examples of control data include information used to control concurrency, permissions, and access to the managed data. Typically, control data is transient and is reinitialized at system start; however, parts of the control data can be persistent. In summary, the data manipulated by a server in a client-server system may be divided into two parts: managed data and control data.
A common example of a server used in client-server architectures is a database management system (DBMS). A database is a collection of data items stored in a computer—these data items constitute the managed data in a database management system (DBMS) setting. Multiple users may concurrently access these (managed) data items via clients. The actions that are run on behalf of the clients are called transactions. Transactions may read from the database, write (insert, delete, or update) to the database or both, thus transactions may be made up of many read and write operations. Transactions can not only cause the modification of data items, but also the modification of control data that the DBMS maintains internally to control execution and provide access to the underlying data items. We will frequently provide examples from DBMS. However, it should be noted that the invention presented here has wide applicability and DBMS is only one example application.
Those skilled in the art will recognize that atomicity is a desired behavior of any mission critical client-server system. Atomicity refers to the property that any client request is either fully executed or not executed at all; in other words, either all effects of an action that the client requested are visible to other clients or none of the effects is visible. One example of a client-server system where atomicity is highly desired is a DBMS. Either all effects of a transaction should be visible to other transactions or none of the effects should be visible (this is part of the ACID (Atomicity, Consistency, Isolation, and Durability) properties of transactions). Client requests have an intentional, direct effect on managed data. Control data, however, is changed indirectly: it is changed by the server process running on behalf of the client. Typically the property of atomicity is associated with managed data, and not with control data.
In the art, the techniques to implement atomicity for managed data via logging and recovery are well understood. Write Ahead Logging (WAL) is a well-known example of logging. In this scheme, log records are created to track the changes made to the managed data. The log records include the old copy of managed data as well as the new copy. They also record the beginning and end of client actions. WAL guarantees that log records are persisted to a non-volatile storage medium, such as a disk, prior to persisting the actual managed data. Thus, in case of any failure, the server uses the log records that have been persisted to determine whether a given client action was partially completed or fully completed. The effect of partially completed client action is undone by using the old copy of managed data saved in log records to roll back the state of the managed data to the state it had prior to starting the client action. Similarly, the new copy of managed data saved in log records is used to roll forward the state of the managed data to reflect the changes made by fully completed client actions. In this manner, the server guarantees atomicity of client actions on managed data even in the presence of failures. Rollback and roll-forward together help achieve atomicity in a system.
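The rollback and roll-forward steps described above can be illustrated with a minimal sketch. It is not taken from any particular DBMS. The names LogRecord and recover are hypothetical, the log is held in an ordinary list rather than on non-volatile storage, and the sketch assumes that concurrent client actions do not modify the same data item.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    """One hypothetical log record: marks a begin, an update, or an end."""
    action_id: int
    kind: str                   # "begin", "update", or "end"
    key: Optional[str] = None   # data item touched by an update
    old: Optional[int] = None   # old copy (before-image)
    new: Optional[int] = None   # new copy (after-image)


def recover(data: dict, log: list) -> dict:
    """Return the managed data after roll-forward and rollback.

    Assumes no two concurrent actions modify the same item.
    """
    # An action is fully completed only if its "end" record was persisted.
    completed = {r.action_id for r in log if r.kind == "end"}
    state = dict(data)
    # Roll forward: reapply new copies of fully completed actions, in log order.
    for r in log:
        if r.kind == "update" and r.action_id in completed:
            state[r.key] = r.new
    # Roll back: restore old copies of partially completed actions, newest first.
    for r in reversed(log):
        if r.kind == "update" and r.action_id not in completed:
            state[r.key] = r.old
    return state


# Action 1 completed but its change to "a" had not reached storage;
# action 2 was interrupted after its change to "b" reached storage.
log = [
    LogRecord(1, "begin"),
    LogRecord(1, "update", "a", old=10, new=11),
    LogRecord(1, "end"),
    LogRecord(2, "begin"),
    LogRecord(2, "update", "b", old=20, new=25),
]
on_disk = {"a": 10, "b": 25}
print(recover(on_disk, log))
```

In this example the completed action's change to "a" is reapplied and the interrupted action's change to "b" is undone, so the recovered state reflects either all or none of each action's effects.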
Just as atomicity is a correctness condition for managed data, consistency is a correctness condition for control data. We define a consistent state for the control data as any state in which the control data is not being modified by a client action. Note that at the end of rollback, the control data is, by definition, in a consistent state. When a client action is performed, it may lead to changes in control data, including, but not limited to, access control information, managed data metadata, and concurrency control information. In the presence of client failures, the control data needs to be in a consistent state before the server can successfully roll back the effects of the client actions that were being performed on the managed data.
We define recovery as the process of bringing both the control and managed data to a correct state. That is, recovery involves bringing the control data to a consistent state and maintaining atomicity for the managed data.
Traditionally, in client-server systems such as a DBMS, client requests are executed in a server process separate from the client process itself, and client processes connect to the server via inter-process communication mechanisms such as messages. These configurations are called indirect connections. Such configurations are highly resilient to errors in the clients. Specifically, if a client process dies or exits, the server detects the failure through the lack of communication with the client, cleans up any control data that may have been modified on behalf of the failed client process to reach a consistent state for the control data, and rolls back all incomplete actions being executed on behalf of the failed client process. The crucial benefit of traditional indirect connections is that the failure of a client process cannot interrupt changes that the server makes to its control data. Thus the failure of a client process cannot result in partially modified control data. For example, a typical change in control data can be an insertion into a linked list. With indirect connections, the act of modifying the linked list will not halt in the middle of the change; rather, it will halt before or after the insertion, when the control data is in a consistent state. The modification cannot halt midstream because the server checks whether the client is dead only at these discrete points; if the client is dead, the server can then take further action. In essence, the server process is free to act upon the client failure when it is convenient for it, i.e., when the control data is in a consistent state.
While inter-process communication between client processes and a server process insulates the server process from client failures, it does add a significant performance overhead to each client request. This overhead is undesirable in high performance environments, and is particularly unacceptable for an in-memory DBMS. An in-memory DBMS is a state-of-the-art DBMS that has been designed to fully exploit 64-bit processors, and inexpensive and plentiful main memory to deliver very high performance. In such a system, all data items of a database are in main memory at run time, instead of being on non-volatile memory such as disk.
A common solution for overcoming the overhead of inter-process communication is for some server functionality to be packaged as an executable library that can be linked with the client application, so that the client application and that part of the server execute in a single process. We call this configuration a direct connection, and we call the combined client application and server library a direct connection client-server system. Since there is typically a multitude of clients in a client-server application, it is typical to maintain the control data and some or all of the managed data in a shared memory segment.
In such an environment, the failure of a client process can interrupt the execution of a client action. The consequence of such a client failure is that both the managed data and the control data may be left in inconsistent states. For example, consider again a client modifying a linked list in the control data. In the direct connection model, the client may die or exit in the middle of making the change to the linked list. Thus, the linked list may be left in a circular state, may be truncated, or may be left in some other inconsistent state. If the server tries to roll back from an inconsistent state, it may get stuck in an infinite loop, may leak memory, or may crash. Thus, the fact that the control data might be in an inconsistent state creates a problem for the server attempting to roll back changes made to the managed data by the failed client.
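The linked list example can be made concrete with a small simulation. This is an illustrative sketch only. The names Node, ClientDied, insert_after, and is_consistent are hypothetical, and raising an exception stands in for a client process that dies between two pointer updates; a real process exit would not run any handler at all.

```python
class Node:
    """Element of a doubly linked list held in the control data."""
    def __init__(self, value):
        self.value = value
        self.next = None
        self.prev = None


class ClientDied(Exception):
    """Stands in for a client process exiting in the middle of an operation."""


def insert_after(node, new, die_midway=False):
    # Step 1: link the new node into the forward chain.
    new.next = node.next
    new.prev = node
    node.next = new
    if die_midway:
        # Simulated client failure between the two pointer updates.
        raise ClientDied("client exited between pointer updates")
    # Step 2: repair the back pointer of the old successor.
    if new.next is not None:
        new.next.prev = new


def is_consistent(head):
    """Check that the list has no cycle and that next/prev pointers agree."""
    seen = set()
    node = head
    while node is not None:
        if id(node) in seen:
            return False          # circular list
        seen.add(id(node))
        if node.next is not None and node.next.prev is not node:
            return False          # forward and backward links disagree
        node = node.next
    return True


a, c = Node("a"), Node("c")
insert_after(a, c)                # completes normally: a <-> c
b = Node("b")
try:
    insert_after(a, b, die_midway=True)
except ClientDied:
    pass                          # the list is now partially modified
print(is_consistent(a))
```

After the simulated failure, the forward chain runs a, b, c while the back pointer of c still refers to a, so any code that walks the list backward skips b. This is the kind of partially modified control data that an indirect connection never exposes to the server.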
The problem of dealing with changes to control data by directly connected clients is well recognized in the literature. One solution to this problem is to declare all client connections to the server invalid whenever an unexpected failure occurs in a directly connected client process while it is in the middle of modifying control data. Sometimes critical sections are used to mark the regions that modify control data. These regions may vary in granularity; a simple application of this technique is to declare the whole server executable library a critical section. Because the server is not capable of bringing partially modified control data to a consistent state, this scheme forces all clients to reconnect whenever any client fails while inside the server library. It also makes the system go through its recovery process (which is used to guarantee atomicity in managed data, as explained earlier) and reinitialize the control data. This solution, though effective, is not practical. Consider a large SMP machine with 64 processors, and perhaps 50 client connections to the database. Any single unexpected exit will cause all client connections to be severed. This is a heavy hammer, especially in mission critical applications, which require the same stability guarantees that indirect connections provide, but desire the speed advantages of direct connection client-server systems such as in-memory DBMSs.
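The invalidate-all approach described above can be sketched as follows. This is a simplified model, not the implementation of any real system. The class Server and its methods are hypothetical names, and ordinary Python sets stand in for state that a real system would keep in shared memory.

```python
class Server:
    """Hypothetical bookkeeping for the invalidate-all scheme."""

    def __init__(self):
        self.connections = set()   # ids of currently connected clients
        self.in_critical = set()   # clients currently modifying control data

    def connect(self, client_id):
        self.connections.add(client_id)

    def enter_critical(self, client_id):
        self.in_critical.add(client_id)

    def exit_critical(self, client_id):
        self.in_critical.discard(client_id)

    def on_client_failure(self, client_id):
        """Handle a failed client; return the other clients that were severed."""
        self.connections.discard(client_id)
        if client_id not in self.in_critical:
            # Control data was not being modified: other clients are unaffected.
            return []
        # Control data may be partially modified and cannot be repaired, so
        # every remaining connection is invalidated and control data is reset.
        severed = sorted(self.connections)
        self.connections.clear()
        self.in_critical.clear()
        return severed


server = Server()
for cid in (1, 2, 3):
    server.connect(cid)
server.enter_critical(2)
print(server.on_client_failure(2))   # clients 1 and 3 lose their connections
```

When the whole server library is treated as one critical section, every failure inside the library takes the second branch, which is why a single unexpected exit severs all connections.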
There have been other proposals to address these issues, but they have proven to be only partial solutions. Molesky and Ramamritham have proposed hardware-based cache coherency models that can ensure control-structure coherency even in the presence of node failures. They define a node as a process/memory pair executing a given transaction. To implement their scheme, however, special hardware instructions are required to lock cache lines, and a special cache line structure is needed to tag each line with the correct identifier. These hardware properties are then used to implement a recovery scheme that does not involve shutting down all connections to the database. Even with advances in processor architecture, the proposed requirements have not been generally met in modern processors. Thus, this scheme is not practical to implement today. Other schemes have been proposed that rely on message passing between different processes. However, they have the same performance shortcomings as indirect connections.
Another scheme that can be modified to handle the issue of invalidation is the checkpoint protocol proposed by Neves et al. The chief shortcoming of this protocol is its assumption of an entry-consistent shared memory system. In such a model, all accesses to shared memory are assumed to be protected, and there are no dependencies between the various accesses. This model is impractical for a complex system such as a DBMS, in which multiple segments of the shared memory may be accessed and updated in a dependent fashion as a single unit. Yet another set of schemes has been proposed by Ganesh et al. to reduce the time taken to recover from a failed client, but these schemes fail to achieve consistency in control data.
Thus, there is a need to improve techniques to achieve control data consistency in directly connected client models. An example of such a system is where directly connected client processes execute in the same process as a DBMS and in particular when the DBMS is an in-memory DBMS. These techniques should be widely portable to all hardware platforms—i.e., the techniques should be hardware-neutral—and practical, should achieve control data consistency without sacrificing performance or concurrency, and without large storage requirements.
We have seen earlier that logging techniques are used to track changes to managed data to guarantee the atomicity of client actions. Typically, these techniques are not used to track changes to control data. Control data is mostly transient and exists to assist the server in managing the execution of actions performed on behalf of the clients. Additionally, traditional indirect connections insulate the system from having to deal with partially modified control data, and therefore achieving consistency in control data is not an issue for these traditional systems. For directly connected clients, however, it is paramount to reach a consistent state for the control data; otherwise, all execution has to end.
One could propose to log all changes to the control data to persistent storage, similar to the scheme that was described earlier for managed data. This would require considerably more non-volatile storage, given the volume of log records that would be generated. More importantly, such a system would be much slower because of frequent access to slow non-volatile storage, and the system would be disk-bound. Thus this scheme is not practical.