This invention relates to client-server computer systems. More particularly, this invention relates to systems and methods that enable database applications running on the client to persist across server crashes.
Computer systems occasionally crash. A “system crash” is an event in which the computer quits operating the way it is supposed to operate. Common causes of system crashes include power outage, application operating error, and computer goblins (i.e., unknown and often unexplained malfunctions that tend to plague even the best-devised systems and applications). System crashes are unpredictable, and hence, essentially impossible to anticipate and prevent.
A system crash is at the very least annoying, and may result in serious or irreparable damage. For standalone computers or client workstations, a local system crash typically results in loss of work product since the last save interval. The user is inconvenienced by having to reboot the computer and redo the lost work. For servers and larger computer systems, a system crash can have a devastating impact on many users, including both a company's employees and its customers.
Being unable to prevent system crashes, computer system designers attempt to limit the effect of system crashes. The field of study concerning how computers recover from system crashes is known as “recovery.” Recovery from system crashes has been the subject of much research and development.
Current database systems support fault-tolerance and high availability by recovering quickly from system failures. In general, the goal of redo recovery is to return the computer system after a crash to a previous and presumed correct state in which the computer system was operating immediately prior to the crash. Then, transactions whose continuations are impossible can be aborted.
Much of the recovery research focuses on database recovery for database computer systems, such as network database servers or mainframe database systems. Imagine the problems caused when a large database system having many clients crashes in the midst of many simultaneous operations involving the retrieval, update, and storage of data records. Database system designers attempt to design database recovery techniques that minimize the amount of data lost in a system crash, minimize the amount of work needed following the crash to recover to the pre-crash operating state, and minimize the performance impact of recovery on the database system during normal operation.
While database recovery techniques are helpful for recovering data, the techniques offer no help in recovering applications that are interacting with the database at the time of failure. Currently, such applications either fail, resulting in an application outage, or are forced to cope with database failures, assuming they survive the database crash. The former compromises application availability and can increase operational complexity. The latter either severely restricts application flexibility or increases its complexity.
When an application fails because of a database system crash, organizations responsible for the application need to quickly bring the application back on line. In the enterprise-computing world, time is quite literally money. Database recovery ensures that the database state is consistent. However, an application retaining state across database transactions can have consistency requirements that are not captured at the database transaction boundary. Furthermore, parts of the application state may be lost during a crash. Restoring and continuing application execution is all too frequently a very complex and time-consuming operational problem.
In some system configurations, an application can survive a database system crash. This is the case, for example, when the application executes on a client machine while the database is on a separate server. This arrangement permits the application to include logic to deal with database crashes and hence avoid an application outage. However, handling errors or exceptions is a very difficult part of getting applications right. Dealing with database system failures at the application level is tedious and error-prone, even when the application itself stays alive.
There has been some work in this area. One technique exploits logging and recovery techniques to enable applications to be recoverable. See, e.g., Lomet, D. Application recovery using generalized redo recovery. Int'l. Conference on Data Engineering, Orlando, Fla. (February, 1998); and Lomet, D. and Tuttle, M. Redo recovery from system crashes. VLDB Conference, Zurich, Switzerland (September 1995) 457-468. The focus of this work has been to minimize the impact of providing recovery on the normal operation of the system. In practice, this means minimizing the amount of logging and application checkpointing required. Sometimes, it means making the application an object that can be managed by the database recovery manager.
Other prior work on application fault-tolerance in distributed systems is based on some form of application “installation points” and/or “message logging”. The prior work can be categorized into the following three approaches, all of which incur high normal operation and/or recovery costs: (1) fault-tolerant process pairs, (2) distributed state tracking, and (3) persistent queues.
Another client-server system directed to application recovery is described in U.S. patent Ser. No. 09/033,511, entitled “Client-Server Computer System With Application Recovery of Server Applications and Client Applications”, which was filed Mar. 2, 1998 in the names of David B. Lomet (an inventor in this invention) and Gerhard Weikum. This application is assigned to Microsoft Corporation.
Despite these efforts, there remains a need to improve application recovery techniques in client-server database systems. In particular, there is a need to provide application recovery at modest system implementation cost without requiring modification to the application itself.
This invention concerns a client-server database system that enables persistent client-server database sessions, without modification to a client-side application, database system, or native client-server access mechanisms (called drivers). The client-server database system preserves sessions across a server crash without the client-side application being aware of the outage, thereby making recovery transparent to the application.
In one implementation, the client-server database system has a client computer and a database server computer. The database server computer runs a database server program that handles client queries for data in one or more database tables stored in a stable storage.
The client has a database application to formulate the client queries for the data kept in the tables. One or more client-side database drivers facilitate communication between the database application and the database server program.
The client is further implemented with a driver manager to facilitate communication between the database drivers and the database application. The driver manager wraps the native drivers, intercepting queries passed from the database application to the database drivers and responses returned from the drivers to the application. The driver manager modifies the queries to form modified queries that direct the database server to render result sets produced from processing the queries persistent at the database server.
If the query will result in creation of a result set, for example, the driver manager alters the query to direct the database server to create a result set table in the stable storage and fill the table with the result set obtained from processing the query. In this manner, the result set will persist across a server crash. The driver manager continues to convert statements from the database application for operation on the result set table and reconvert responses from the database server prior to returning them to the database application. The driver manager further tracks application statements and a current result set table location in a log maintained at the client. Thus, the database application is unaware of the actions being taken to make the session persistent.
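The interception and rewriting described above can be illustrated with a minimal sketch. It is not the claimed implementation: Python's built-in sqlite3 module stands in for the native driver and the database server, and the class name PersistentCursor, the rs_N table naming scheme, and the layout of the log entries are all hypothetical choices made for illustration.

```python
import sqlite3


class PersistentCursor:
    """Hypothetical stand-in for the driver manager wrapping a native driver."""

    def __init__(self, conn):
        self._conn = conn
        self._counter = 0
        self._current = None  # log entry for the active result set, if any
        self.log = []         # client-side log of statements and table positions

    def execute(self, sql):
        entry = {"stmt": sql, "table": None, "pos": 0}
        if sql.lstrip().upper().startswith("SELECT"):
            self._counter += 1
            entry["table"] = f"rs_{self._counter}"
            # Modified query: materialize the result set in a stored table.
            self._conn.execute(f"CREATE TABLE {entry['table']} AS {sql}")
            self._current = entry
        else:
            self._conn.execute(sql)
        self._conn.commit()
        self.log.append(entry)

    def fetchone(self):
        # Serve rows from the result set table rather than a volatile cursor.
        if self._current is None:
            return None
        row = self._conn.execute(
            f"SELECT * FROM {self._current['table']} ORDER BY rowid LIMIT 1 OFFSET ?",
            (self._current["pos"],),
        ).fetchone()
        if row is not None:
            self._current["pos"] += 1  # record how far the application has read
        return row
```

The application calls execute and fetchone as it would on an ordinary cursor. The only state that must survive on the client is the log, which names the result set table and the position reached within it.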
Upon recovery following a server crash, the driver manager directs the drivers to reestablish a connection with the database server. The driver manager then finds the persistent result set table and, using data from the log, returns to the operation on the result set table that was in progress just prior to the crash. The driver manager reassociates the application context to the new database session without the database application even knowing that a failure occurred.
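The recovery sequence can be sketched in the same spirit. The example below is illustrative only and makes several assumptions: a file-backed sqlite3 database stands in for the database server and its stable storage, closing the connection stands in for the server crash, and the function names fetch_next, recover, and demo as well as the log entry layout are hypothetical.

```python
import os
import sqlite3
import tempfile


def fetch_next(conn, log_entry):
    """Return the next row of the result set table and advance the logged position."""
    row = conn.execute(
        f"SELECT * FROM {log_entry['table']} ORDER BY rowid LIMIT 1 OFFSET ?",
        (log_entry["pos"],),
    ).fetchone()
    if row is not None:
        log_entry["pos"] += 1
    return row


def recover(db_path, log_entry):
    """Reconnect after a crash and confirm the logged result set table survived."""
    conn = sqlite3.connect(db_path)
    found = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (log_entry["table"],),
    ).fetchone()
    if found is None:
        conn.close()
        raise RuntimeError("result set table did not survive the crash")
    return conn


def demo():
    path = os.path.join(tempfile.mkdtemp(), "server.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE orders (id INTEGER, item TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, "pen"), (2, "ink"), (3, "pad")]
    )
    # The result set was made persistent before the crash.
    conn.execute("CREATE TABLE rs_1 AS SELECT id, item FROM orders ORDER BY id")
    conn.commit()
    log_entry = {"stmt": "SELECT id, item FROM orders ORDER BY id",
                 "table": "rs_1", "pos": 0}
    before = fetch_next(conn, log_entry)
    conn.close()  # stands in for the server crash
    conn = recover(path, log_entry)  # new session, same result set table
    after = [fetch_next(conn, log_entry), fetch_next(conn, log_entry)]
    conn.close()
    return before, after
```

In this sketch the rows read after reconnection continue from the logged position, so the caller sees one uninterrupted result set even though two separate connections served it.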