Computers have become an integral tool in a wide variety of applications, such as finance and commercial transactions, three-dimensional and real-time graphics, computer-aided design and manufacturing, healthcare, telecommunications, and education. Computers are finding new applications as their performance and speed continue to increase while their costs decrease, owing to advances in hardware technology and rapid software development. Furthermore, a computer system's functionality and usefulness can be dramatically enhanced by coupling stand-alone computers together to form a computer network. In a computer network, users may readily exchange files, share information stored in a common database, pool resources, communicate via e-mail, and even hold video teleconferences.
One popular type of network setup is known as "client/server" computing. Basically, users perform tasks through their own dedicated desktop computer (i.e., the "client"). The desktop computer is networked to a larger, more powerful central computer (i.e., the "server"). The server acts as an intermediary between a group of clients and a database stored in a mass storage device. An assortment of network and database software enables communication between the various clients and the server. Hence, in a client/server arrangement, the data is easy to maintain because it is stored in one location and managed by the server; the data can be shared by a number of local or remote clients; the data is easily and quickly accessible; and clients may readily be added or removed.
Although client/server systems offer a great deal of flexibility and versatility, people are sometimes reluctant to use them because of their susceptibility to various types of failures. Furthermore, as computers take on more comprehensive and demanding tasks, the hardware and software become more complex, and hence the overall system becomes more prone to failures. A single server failure may detrimentally affect a large number of clients that depend on that particular server. In some mission-critical applications, computer downtime may have serious implications. For example, if a server were to fail in the middle of processing a financial application (e.g., payroll, securities, bank accounts, electronic money transfers), the consequences may be quite severe. Moreover, customer relations might be jeopardized (e.g., through lost airline, car rental, or hotel reservations; delayed or mis-shipped orders; or lost billing information).
Short of eliminating every failure that might disable the computer system, the goal is to minimize the time required to bring the computer system back on-line after a failure occurs. In other words, it is important to recover from a failure as quickly as possible. It is also highly preferable to ensure that the failure does not cause any crucial data to be lost. One prior art mechanism for accomplishing both of these goals is known as "checkpointing." Basically, checkpointing periodically updates the data stored in the database with committed data held in a volatile cache memory. Because checkpointing keeps the database relatively up to date, less recovery work needs to be done when a system failure occurs.
FIG. 1 is a diagram of a typical prior art computer system having checkpointing. The system may incorporate a number of clients 101-109 (e.g., personal computers, workstations, portable computers, minicomputers, terminals, etc.) which are serviced by one or more servers 110 and 111. Each of the clients interacts with server nodes 110 and 111 through various client programs, known as processes, workers, threads, etc. Server nodes 110 and 111 have their own dedicated main memories 113 and 114, respectively. Data from a database 116 stored in a large, commonly shared storage mechanism, such as a disk array 112, is read into the main memories 113 and 114. Thereby, vast amounts of database data are accessible to either of the servers 110 and 111 for distribution to the various clients 101-109. As users change the data, the modified data is stored back into the main memories 113 and 114 and marked to indicate that it has been changed. Periodically, the marked data is checkpointed back to the database residing in disk array 112. This involves writing all marked data to its corresponding locations in disk array 112. In addition, all changes made after the most recent checkpoint are recorded in a separate log file 115.
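The prior art cycle just described (read pages into main memory, mark them when they are updated, log each change, then periodically write every marked page back to its own disk location) can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions: the class `Server`, its methods, and the use of in-memory dicts and lists to stand in for disk array 112, main memory 113 or 114, and log file 115 are all hypothetical, and truncating the log once a checkpoint completes is an assumed policy rather than one the text specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical model of one server node: a volatile page cache over a shared disk."""
    disk: dict                                  # stands in for database 116 on disk array 112
    memory: dict = field(default_factory=dict)  # stands in for volatile main memory 113 or 114
    dirty: set = field(default_factory=set)     # ids of pages marked as changed
    log: list = field(default_factory=list)     # stands in for log file 115

    def read(self, page_id):
        # Pull the page from the database into main memory on first access.
        if page_id not in self.memory:
            self.memory[page_id] = self.disk[page_id]
        return self.memory[page_id]

    def update(self, page_id, value):
        self.read(page_id)
        self.memory[page_id] = value        # modified data is stored back into main memory
        self.dirty.add(page_id)             # mark the page as changed
        self.log.append((page_id, value))   # record the change made since the last checkpoint

    def checkpoint(self):
        """Write every marked page back to its own disk location; return the number of writes."""
        writes = 0
        for page_id in sorted(self.dirty):
            self.disk[page_id] = self.memory[page_id]  # models one scattered write per page
            writes += 1
        self.dirty.clear()
        self.log.clear()  # assumed policy: logged changes are now reflected in the database
        return writes
```

The count returned by `checkpoint()` is the number of separate, scattered writes a checkpoint issues, which is where this scheme's performance penalty comes from.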
When one of the server nodes 110 or 111 crashes, it loses all data contained in its respective main memory. However, most of the changes to the data have already been copied over to the database during the last checkpoint. Because the database is stored in the nonvolatile disk array 112, that data is not lost, even if power is unexpectedly terminated. Upon recovery, this data is read from the database and stored back into the main memory. Furthermore, the most recent changes made since the last checkpoint are read back from the log file 115 and applied to the main memory.
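Recovery as described above amounts to reloading the checkpointed data and then redoing the logged changes in order. The sketch below is illustrative only: the function `recover` is hypothetical, it assumes the log holds `(page_id, new_value)` pairs in the order the changes were made, and it reloads the whole database eagerly for simplicity, where a real server might instead read pages on demand.

```python
def recover(disk, log):
    """Rebuild a crashed node's main memory: reload checkpointed data, then redo logged changes.

    disk: dict mapping page id -> value, standing in for the nonvolatile database.
    log:  list of (page_id, new_value) pairs recorded since the last checkpoint (assumed format).
    """
    memory = dict(disk)          # read the checkpointed data back from the database
    for page_id, value in log:   # replay the changes made since the last checkpoint, oldest first
        memory[page_id] = value
    return memory
```

Replaying the log in its original order matters: when the same page was changed more than once after the checkpoint, the last logged value is the one that ends up in main memory.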
Although checkpointing addresses the main problems of recovery and data preservation, it nevertheless has several drawbacks. Namely, checkpointing is very costly to implement in terms of processing time. There is a severe performance penalty associated with checkpointing, primarily because the marked records have to be written back to various disk locations in the database. These locations are usually scattered across different physical locations of the disk array 112. Often, thousands of transactions need to be updated during each checkpoint, and each of these transactions typically requires its own separate input/output (I/O) operation to gain access to the desired location. Furthermore, if the page to which the data is to be written back is not currently in the main memory, the page must first be read off the disk; the data must then be merged with that page; and the page must then be written back to the disk. This sequence of events requires two synchronous I/O operations. Thus, it is not uncommon for checkpointing to take half an hour or more to complete. In the meantime, the server is prevented from performing other functions while checkpointing is in progress.
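The cost argument above can be made concrete with a little arithmetic: a dirty page already resident in main memory costs one random write, while a page that must first be read, merged, and written back costs two synchronous I/Os. The helper `checkpoint_io_cost` and its 10 ms per-I/O default are hypothetical illustration values, not measurements from any particular disk array.

```python
def checkpoint_io_cost(dirty_pages, resident_pages, io_ms=10.0):
    """Estimate synchronous I/Os and elapsed time for a scattered, page-by-page checkpoint.

    A dirty page already in main memory costs one random write; a page that must
    first be read from disk, merged, and written back costs two synchronous I/Os.
    io_ms is a hypothetical per-I/O latency, not a measured figure.
    """
    ios = 0
    for page in dirty_pages:
        ios += 1 if page in resident_pages else 2
    return ios, ios * io_ms / 1000.0  # (I/O count, seconds)
```

For example, with 100,000 dirty pages of which 20,000 are not resident, the estimate is 120,000 I/Os, or 1,200 seconds (20 minutes) at the assumed 10 ms per I/O. Because the I/Os are synchronous, the elapsed time grows linearly with the number of scattered pages.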
One approach to lightening the burden imposed by checkpointing is to reduce the number of records updated per checkpoint. This approach has the added benefit of improving the recovery time because data is updated more frequently. However, the disadvantage of this approach is that it requires many more checkpoints to be performed, with shorter time intervals between successive checkpoints.
Another approach is to save all the changes for one large checkpointing operation, which can be performed during off-peak hours. However, the disadvantage of this approach is that, in case of a failure, it takes much longer to recover. Rather than taking minutes, recovery can take hours or even a full day, depending on the size of the memory and when the memory was last checkpointed.
Thus, there is a need in the art for a checkpointing scheme that ideally: (1) can update data extremely quickly so as to minimize the time required to perform the actual checkpointing; (2) has a long time interval between successive checkpoints; and (3) also has a fast recovery time. The present invention provides an elegant solution that satisfies each of these goals. With the present invention, checkpointing can be accomplished much more quickly by performing a few sequential I/O operations rather than thousands of random, scattered I/O operations. Because checkpointing can be performed quickly, more data can be updated at each checkpoint; hence, checkpointing can be performed less frequently. In addition, recovery is much quicker with the present invention because data is read back into the main memory from one or more dedicated checkpoint files with a few sequential read operations performed in parallel. This is much more efficient than the traditional method of performing thousands of unrelated I/O operations.
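For contrast, the general idea of replacing many scattered writes with a few sequential ones can be illustrated by serializing all dirty pages into one contiguous checkpoint file and reading that file back in a single pass. This is a generic sketch, not the method of the present invention: the record layout (`HEADER`, a 4-byte page id followed by a 4-byte length), the function names, and the use of `io.BytesIO` in place of a dedicated checkpoint file are all assumptions, and the parallel reads from multiple checkpoint files are omitted.

```python
import io
import struct

# Hypothetical record layout: little-endian 4-byte page id, 4-byte length, then the page bytes.
HEADER = struct.Struct("<II")


def write_checkpoint_file(dirty_pages, f):
    """Serialize all dirty pages into one contiguous buffer and issue a single sequential write."""
    buf = bytearray()
    for page_id, data in sorted(dirty_pages.items()):
        buf += HEADER.pack(page_id, len(data))
        buf += data
    f.write(bytes(buf))  # one sequential write instead of one random write per page


def read_checkpoint_file(f):
    """Read the whole checkpoint file sequentially and rebuild the page map."""
    raw = f.read()
    pages, offset = {}, 0
    while offset < len(raw):
        page_id, length = HEADER.unpack_from(raw, offset)
        offset += HEADER.size
        pages[page_id] = raw[offset:offset + length]
        offset += length
    return pages


if __name__ == "__main__":
    f = io.BytesIO()
    write_checkpoint_file({7: b"seven", 2: b"two"}, f)
    f.seek(0)
    print(read_checkpoint_file(f))
```

The trade-off in this sketch is that pages land in the checkpoint file rather than at their home locations in the database, so the file must be read back (sequentially) on recovery; the gain is that both the write and the read touch contiguous storage.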