The present invention generally relates to debugging software programs and, more specifically, to techniques for debugging database systems.
In a database system, an area of system memory is allocated and one or more processes are started to execute one or more transactions. The database server communicates with connected user processes and performs tasks on behalf of the user. These tasks typically include the execution of transactions. The combination of the allocated system memory and the processes executing transactions is commonly termed a database "server" or "instance".
Like most software systems, a database server has complicated shared memory structures. A shared memory structure contains data and control information for a portion of a database system. Because of software, hardware, or firmware bugs that may exist in a complex database system, shared memory structures may become logically incorrect. When structures become logically incorrect, the database is likely to fail. Database failure is typically discovered in the following ways: by checking consistency of structures; by verifying certain assumptions; or by running into corrupted pointers. Attempting to process corrupted pointers will lead to a "crash," where normal database operation is no longer possible.
A major responsibility of the database administrator is to be prepared for the possibility of hardware, software, network, process, or system failure. When shared structures are presumed to be corrupted, the best course of action for a database administrator is to cease further processing of the database. If a failure occurs such that the operation of a database system is affected, the administrator must usually recover the database and return the database to normal operations as quickly as possible. Recovery should protect the database and associated users from unnecessary problems and avoid or reduce the possibility of having to duplicate work manually.
Recovery processes vary depending on the type of failure that occurred, the structures affected, and the type of recovery that is performed. If no files are lost or damaged, recovery may amount to no more than rebooting the database system. On the other hand, if data has been lost, recovery requires additional steps in order to put the database back into normal working order.
Once the database is recovered or rebooted, the immediate problem is quickly resolved, but because the root cause is still undetermined and therefore unresolved, the error condition may resurface, potentially causing several additional outages. Therefore, it is still important to diagnose the state of the structures and data surrounding the database failure. Such a diagnosis may provide valuable information that can reduce the chance of failure in the future. As a practical matter, diagnosing the failure may lead to determining which vendor's hardware or software is responsible for the database failure. Such information is valuable for a vendor's peace of mind, if nothing else. Thus, competing with the goal of recovering the database as quickly as possible is the goal of determining why the database system failed in the first place.
Unfortunately, even with traditional techniques of diagnosing a database failure, the system administrator is usually unable to obtain enough clues to determine why the failure happened. A deliberate and thorough diagnosis of the failure may require an unacceptable amount of database downtime. For example, any amount of downtime over 30 minutes may be extremely costly for a database that is associated with a highly active web site. Too much downtime may have unduly expensive business ramifications, such as lost revenue and damage to the reputation of the web site owner.
Another problem with traditional debugging techniques is that they can be intrusive. For example, a database system that supports the Structured Query Language (SQL) may be debugged by compiling SQL statements and running them against the database. The act of compiling and executing the SQL statements changes the state of the database system. Thus, the mere act of diagnosing the problem can easily make the problem worse because diagnosis may involve altering the state of the database. Diagnosing the problem typically involves using debugging software, which calls for peeking and poking into data structures within the complex memory structures of the database system. Although the data structures are best left untouched upon a failure, diagnosing the failure may involve working directly on the same data structures from which data is to be obtained. Nevertheless, it is important to preserve the original data and not change the data from its state at the time of failure. A customer of the database may object to changes to the database, as such changes may jeopardize or even destabilize their database system.
Effective diagnosis, however, requires getting as much information as possible out of the data structures. It may be useful here to refer to Heisenberg's uncertainty principle, which effectively states that the closer an object is analyzed, the more the object materially changes because the mere act of analyzing is intrusive. Applying this principle to the act of diagnosing a database failure, a typical debugging process is naturally intrusive. Thus, it is difficult to be non-intrusive on a database and at the same time obtain a sufficient amount of meaningful data for debugging.
Traditional debugging techniques involve formatting certain parts of the database system and displaying this formatted portion in a human-readable form. This human-readable form can be set aside for later analysis, for example, after the database has been recovered or is no longer down. The entire memory of the database server is not dumped because an average database server is very large, typically between about 200 megabytes and about 100 gigabytes of unformatted binary and data. On the portion of the database that is formatted, an educated guess is made of the key data structures that are potential causes of the problem.
Unfortunately, such a debugging technique provides diagnosis only of the database's end-memory state, which is the state after the database has been shut down. Because the end-memory state is being analyzed separately from the database, the programmer performing the debugging does not have access to the real database and some of the database's persistent structures. Some of these persistent structures could be on disk or, in a multiple node system, on other nodes. For example, in a parallel server configuration, the persistent structures needed for debugging could reside on other servers. Thus, the technique of separately debugging portions of the database prevents the programmer from using the data that can only be obtained from the database itself.
Further, where debug operations are performed on the database while the database is down, multiple programmers cannot each privately diagnose the failure. Rather, the key data structures are typically diagnosed by having one programmer in front of a console inputting debug commands, while other programmers gather around issuing advice. Having multiple programmers individually debug the database is inadvisable using existing debugging techniques because the act of inputting debug commands is intrusive, as mentioned above. Each programmer's work would interfere with the concurrent debugging progress of fellow programmers.
For the foregoing reasons, what is needed is a method of debugging a software program, such as a database system, that is non-intrusive, yet allows for a comprehensive assessment of a failure.
A method and apparatus for debugging a software program is provided that is non-intrusive and allows multiple persons to debug concurrently. In one embodiment, a method comprises preserving a memory state of a portion of a software program, such as a database system. A debug command is received that, when executed, would normally cause modification to targeted data in the preserved portion of the software program. The command is executed by making a copy of the targeted data in the preserved portion of the software program. The copy is modified to generate a modified copy of the targeted data without modifying the data that is in the preserved portion of the software program. In subsequent accesses, the user that issued the debug command accesses the modified copy whenever the user would have otherwise accessed the corresponding preserved portion.
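The copy-then-modify behavior described in this embodiment can be illustrated with a minimal sketch. It is not the claimed implementation: the page size, the function names `execute_debug_write` and `read_through`, and the use of a dictionary keyed by page number to hold the modified copies are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size; the text does not specify one


def execute_debug_write(preserved: bytes, private_pages: dict,
                        addr: int, data: bytes) -> None:
    """Apply a modifying debug command to private copies only.

    `preserved` is the preserved memory image (immutable bytes).
    `private_pages` maps page number -> bytearray copy owned by one user.
    """
    if addr < 0 or addr + len(data) > len(preserved):
        raise ValueError("write outside the preserved memory image")
    for i, value in enumerate(data):
        page_no, offset = divmod(addr + i, PAGE_SIZE)
        if page_no not in private_pages:
            # First modification of this page: copy it from the preserved image.
            start = page_no * PAGE_SIZE
            private_pages[page_no] = bytearray(preserved[start:start + PAGE_SIZE])
        # Modify the copy; the preserved image is never written.
        private_pages[page_no][offset] = value


def read_through(preserved: bytes, private_pages: dict,
                 addr: int, length: int) -> bytes:
    """Read bytes, preferring the user's modified copy where one exists."""
    if addr < 0 or length < 0 or addr + length > len(preserved):
        raise ValueError("read outside the preserved memory image")
    out = bytearray()
    for a in range(addr, addr + length):
        page_no, offset = divmod(a, PAGE_SIZE)
        page = private_pages.get(page_no)
        out.append(page[offset] if page is not None else preserved[a])
    return bytes(out)


# Usage: a write that straddles two pages copies both, leaving the image intact.
preserved_image = bytes(2 * PAGE_SIZE)
user_pages = {}
execute_debug_write(preserved_image, user_pages, PAGE_SIZE - 2, b"\x01\x02\x03\x04")
patched = read_through(preserved_image, user_pages, PAGE_SIZE - 2, 4)
```

In this sketch, subsequent reads by the same user go through `read_through`, so the user sees the modified copy wherever one exists and the preserved data everywhere else.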
In another embodiment, a computer-readable medium is described that stores a preserved portion of a software program and a modified page of memory of the preserved portion. The modified page corresponds to a page in the preserved portion. The modified page has a modification that normally would have been made to the corresponding page in the preserved portion. The preserved portion is unaltered.
Advantageously, the present invention allows multiple persons to concurrently debug a problem of a software program. Each person preferably has their own private view, which consists of (1) copied portions of the software program that reflect modifications made by that person, and (2) the portions of the preserved software program that the person has not modified. Providing a private view to each person allows each person to debug privately, independently, and concurrently with others. Each private view may be extensively explored and modified without affecting the memory state of the software program that existed at the time the software program was shut down. Accordingly, at any time, each private view may be refreshed to the state of the software program that existed at the time of shutdown. Faster diagnosis of the problem may therefore be accomplished because a debugger does not have to peek cautiously and slowly into the inner workings of the software program. Thus, where downtime of a software program must be kept to a minimum, the present invention provides techniques for performing quick and comprehensive diagnostics of the software.
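The notion of independent private views that can be refreshed may be sketched as follows. The class name `PrivateView`, its methods, the user names, and the tiny page size are hypothetical and chosen only to keep the example short; the sketch assumes the preserved state is available as an immutable byte string shared by all views.

```python
class PrivateView:
    """One user's view: private modified pages layered over a shared image."""

    def __init__(self, preserved: bytes, page_size: int = 4096):
        self._preserved = preserved   # shared, never modified
        self._page_size = page_size
        self._pages = {}              # page number -> private bytearray copy

    def write(self, addr: int, data: bytes) -> None:
        if addr < 0 or addr + len(data) > len(self._preserved):
            raise ValueError("write outside the preserved image")
        for i, value in enumerate(data):
            page_no, offset = divmod(addr + i, self._page_size)
            if page_no not in self._pages:
                start = page_no * self._page_size
                self._pages[page_no] = bytearray(
                    self._preserved[start:start + self._page_size])
            self._pages[page_no][offset] = value

    def read(self, addr: int, length: int) -> bytes:
        if addr < 0 or length < 0 or addr + length > len(self._preserved):
            raise ValueError("read outside the preserved image")
        out = bytearray()
        for a in range(addr, addr + length):
            page_no, offset = divmod(a, self._page_size)
            page = self._pages.get(page_no)
            out.append(page[offset] if page is not None else self._preserved[a])
        return bytes(out)

    def refresh(self) -> None:
        """Discard private copies, returning the view to the preserved state."""
        self._pages.clear()


# Two users debug the same preserved image without interfering with each other.
snapshot = b"ABCDEFGH"
alice = PrivateView(snapshot, page_size=4)
bob = PrivateView(snapshot, page_size=4)

alice.write(2, b"xyz")
alice_sees = alice.read(0, 8)    # b"ABxyzFGH"
bob_sees = bob.read(0, 8)        # b"ABCDEFGH"

alice.refresh()
after_refresh = alice.read(0, 8)  # b"ABCDEFGH"
```

Because each view holds only the pages its owner has modified, refreshing is a matter of discarding those copies; nothing in the shared image needs to be restored.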