This invention relates to methods of analysis of defects (debugging) in an operating system for computers, and is especially useful for debugging under a multitasking operating system which supports multiple processes concurrently and lacks support for registering a callback function through the operating system's exception handler.
Debugging an application is an integral part of its development. A considerable amount of resources is spent testing an application before it is considered ready for the market. During testing, it is common to have numerous test cases which are run against the product in order to flush out as many bugs as possible. If a test case fails due to an application exception, it is often difficult to recreate the problem inside a debugger. In order to catch an exception in a debugger, the debugger must be attached to the running process and run along with it in memory. Also, if the application has multiple processes, a debugger must be in control of each of the processes in case one of them encounters an exception. When running a large number of test cases, this quickly becomes a problem.
Unfortunately, program exceptions still sometimes occur on a customer's system. Because of this, remote debugging is also an important part of the software industry. Program exceptions can occur for a number of reasons including operating system bugs, hardware failures, and program defects.
When a program receives an exception at a customer site, it is often on a production system, and getting the problem diagnosed and resolved expeditiously is very important. One of the main inhibitors of quick resolution is the lack of accurate information. Stopping a process in the exact state in which it encountered the exception is very useful, and can be done for multiple processes.
Once a process has been stopped, information such as the virtual memory (including the stack and heap), register dumps and libraries for the process can be dumped into files which can then be packaged up and sent back to the vendor of the product.
Windows has a feature which allows a program to register itself for use with the operating system exception handler. This program will then get called in the event that an exception occurs. The program can then choose to attach a debugger or other program to the process which caused the exception. For operating systems which do not support the registration of a callback function with the operating system, this can be done by using other methods. In the Sun Microsystems Solaris Operating Environment (an operating system), a set of signals or a set of faults can be traced, which has the effect of stopping the process if it encounters one of the signals or faults in the set. This can be done for any number of processes.
When the operating system detects an exception, it stops the process in the exact state that caused the exception. A monitor program such as that of the invention described below can then detect that the process has stopped, dump relevant information, and notify the user that a process has stopped. If the process which has stopped due to an exception is on a development system, a debugger can be attached. If the process resides on a customer system, the monitor program can dump the process's information, which can then be sent to a developer for further study.
By using the operating system's ability to stop a process in the event that it hits an exception, a set of processes can be “watched” or monitored by a single external program. In the event that one of the monitored processes stops due to an exception, the monitor program can then take action. The resulting action of the monitor program is configurable and depends on the specific need of the user.
If the user of the monitor program is a developer, then dumping information for the trapped process may not be necessary. Instead, the monitor program can be run such that it does not dump any information but only notifies the user that a process has stopped. The developer can then attach a debugger to the process and gather information about the exception through the debugger.
Debugging an exception remotely, such as at a customer's site, is often a difficult undertaking. Typically, an application will install a signal handler for each process which can then dump out relevant information such as a stack trace, register dumps, etc. This type of signal handler gets called when certain signals are encountered, usually indicating a program exception. This information is then sent back to the vendor where debugging is attempted. One of the problems with this method is that calling a signal handler changes the process from its original state at the time it received the exception, making some of the information stale. Another problem is that sometimes the stack trace for a process is corrupted and the process is not able to dump relevant information for itself. Since this form of debugging is remote, attaching a debugger is often not possible.
In the case of remote debugging, the monitoring process can initiate dumping of the process's virtual memory, call stack, current registers, etc. This information is dumped into files created under a directory structure which can then be packaged up and sent back to the vendor. The information can include but is not limited to: virtual memory, register dumps, call stack, reason for dumping, and libraries which were loaded during the exception.
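The directory layout described above can be illustrated with a minimal sketch. It assumes the raw bytes for each category have already been captured by some platform-specific means, which is left out here. The function name `dump_process_info`, the `pid_<n>` directory naming, the `.bin` suffix and the `manifest.json` file are all hypothetical choices made for illustration and are not taken from the invention.

```python
import json
from pathlib import Path


def dump_process_info(base_dir, pid, reason, sections):
    """Write one file per information category under <base_dir>/pid_<pid>/.

    `sections` maps a category name (for example "registers" or
    "call_stack") to the raw bytes captured for it. Capturing those bytes
    is platform specific and deliberately outside this sketch.
    """
    dump_dir = Path(base_dir) / f"pid_{pid}"
    dump_dir.mkdir(parents=True, exist_ok=True)

    # One file per category keeps the dump easy to package and inspect.
    for name, data in sections.items():
        (dump_dir / f"{name}.bin").write_bytes(data)

    # A small manifest records why the dump was taken and what it contains.
    manifest = {
        "pid": pid,
        "reason": reason,
        "files": sorted(f"{name}.bin" for name in sections),
    }
    (dump_dir / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return dump_dir
```

Keeping everything for one process under a single directory means the whole dump can be archived and sent back to the vendor in one step.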
Currently, debuggers are designed to control one process and its children. Controlling a related set of processes entails attaching multiple debuggers, one per process, or attaching to the first process and allowing the debugger to follow each child as needed. Even in the case where the debugger follows each child, one debugger is needed for each process, which is expensive in terms of system resources. This method can change the environment enough to prevent the occurrence of the exception of interest.
If the system is remote and the starting of a debugger is not an option, the application has to rely on the signal handlers to dump as much information as possible. As previously mentioned, this method can be unreliable and the information for the process can be somewhat stale.
In contrast to the UNIX operating environment, Windows provides an operating system exception handler with which a debugging utility can register itself so that the debugging utility is notified when an application running under the operating system encounters an exception. An exception can be due to a hardware fault, an illegal machine instruction, or an invalid memory access, any of which can interrupt the operation of the application program.
U.S. Pat. No. 5,526,485 assigned to Microsoft Corporation appears to disclose a debugging system in which the monitor program registers itself with the operating system, to be called by the operating system in response to exceptions generated by an application program running on the operating system. When called in this manner, the monitor program first checks to see if a debugging program is already running. If one is, the monitor program returns and the operating system calls any remaining registered programs, such as the running debugging program. If there are no running debugging programs, the monitor program loads and starts a debugging program to debug the previously loaded application program. This is distinguishable from the present invention which does not require the monitor program to register itself with the operating system, and is not called by the operating system in response to an exception.
Presently, for the UNIX operating system and other UNIX-like or UNIX-derived operating systems such as SOLARIS, AIX, HP-UX, IRIX, etc., there is no support for registering a monitor program utility with the operating system. Accordingly, if an exception occurs in the running of an application or process in these operating systems, only that application or process is notified of the exception. No notification is made to other processes or applications by the operating system itself (except in the case of an attached debugger). Because of this limitation, debugging multiple processes or applications running under these operating systems can be difficult. In the case of a single process or a limited number of processes, it is possible to provide for debugging of the processes when an exception occurs if a debugger is attached to each process. “Attached” in this context is defined as using operating system provided APIs (application programming interfaces) to gain control of the process so that when an exception occurs in the process, the operating system sends a notification to the debugger that an exception has occurred (i.e., that a process to which the debugger is attached has changed state); the debugger can then inform the user of the debugger that an exception has occurred and await further instructions from the user.
As may be appreciated, when a larger number of processes are running simultaneously, the attachment of one debugger to each process results in an unwieldy number of debuggers and an inefficient and time-consuming procedure for the user. The user would either have to attach a debugger to each process, which is time consuming, or set the debugger to spawn a new debugger for each process, resulting in, for example, 400 debuggers for 400 processes.
In order to overcome these difficulties, the following invention was developed. In one embodiment of the invention, only a single monitor program is required to detect whether any process has stopped due to an exception.
One aspect of the invention provides a method of investigating the operation of processes of an application program running on a multitasking operating system of a computer system to determine if any of the processes have stopped for a predetermined exception incident, including: identifying to the operating system a plurality of predetermined exceptions to be investigated; instructing the operating system to identify a process that has stopped when it encountered one of the predetermined exception incidents; scanning the computer system periodically for stopped processes; determining whether a stopped process has been identified as having encountered a predetermined exception incident; and performing a predetermined action if the process has encountered a predetermined exception incident. The scanning is preferably resumed after sending the notification. An application may be provided if necessary to allow the stopping of a process from running if it encounters an exception. The operating system will not leave the process in a stopped state if the process encounters an exception which is not one of the predetermined exceptions.
The predetermined action may preferably include any of the following:
Notifying a user of the exception incident,
Notifying another program of the exception incident,
Starting another program,
Dumping information relating to the exception incident to another program or file; or continuing the process. The other program may preferably be a debugging program.
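The scanning and action steps of this method can be sketched as follows. This is only an illustration under stated assumptions: the process table is supplied by a caller-provided function `read_statuses` (hypothetical; a real monitor would query the operating system), exceptions are represented as plain strings, and the names `ProcessStatus`, `scan_once` and `monitor` are invented for this sketch.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class ProcessStatus:
    pid: int
    stopped: bool
    stop_reason: Optional[str] = None  # e.g. "SIGSEGV"; None while running


def scan_once(statuses: Iterable[ProcessStatus],
              watched: Set[str],
              reported: Set[int],
              action: Callable[[ProcessStatus], None]) -> List[int]:
    """Run one scan pass and act on newly stopped, watched processes."""
    handled = []
    for status in statuses:
        # Skip running processes and ones already acted upon.
        if not status.stopped or status.pid in reported:
            continue
        # Only processes stopped on a predetermined exception qualify.
        if status.stop_reason in watched:
            action(status)
            reported.add(status.pid)
            handled.append(status.pid)
    return handled


def monitor(read_statuses: Callable[[], Iterable[ProcessStatus]],
            watched: Set[str],
            action: Callable[[ProcessStatus], None],
            interval_s: float = 1.0,
            max_scans: Optional[int] = None) -> Set[int]:
    """Scan periodically; scanning resumes after each action is performed."""
    reported: Set[int] = set()
    scans = 0
    while max_scans is None or scans < max_scans:
        scan_once(read_statuses(), watched, reported, action)
        scans += 1
        time.sleep(interval_s)
    return reported
```

The `action` callback stands in for whichever predetermined action is configured: notifying a user, notifying or starting another program, or dumping information. Tracking already reported process IDs prevents the same stopped process from triggering the action on every scan.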
Another aspect of the invention provides a program product for operation on a computer system for investigating the operation of processes of an application program running on a multitasking operating system of the computer system to determine if any of the processes have stopped for a predetermined exception incident, including: a storage medium; a program routine stored on the storage medium for identifying a plurality of predetermined exceptions to be investigated; a program routine for instructing the operating system to identify a process that has stopped when it encountered one of the predetermined exception incidents; a program routine for scanning the computer system periodically for stopped processes; a program routine for determining whether a stopped process has been identified as having encountered a predetermined exception incident; and a program routine for performing a predetermined action if the process has encountered a predetermined exception incident.
The routine for scanning may be adapted to cause resumption of scanning after sending the notification.
If the application program requires, a program routine can be provided for instructing the application program or operating system to stop a process from running if it encounters an exception.
The program product may include a routine for instructing the operating system to continue a process that has stopped without encountering any of the predetermined exceptions.
The predetermined actions preferably include any of the following:
Notifying a user of the exception incident,
Notifying another program of the exception incident,
Starting another program,
Dumping information relating to the exception incident encounter to another program such as a debugging program or to a file; or restarting the process.
Aspects of this invention include a method and apparatus including software for debugging an application or applications designed to run on an operating system such as UNIX which does not support the registering of a callback function through the operating system exception handler.
One embodiment of the method of the invention includes scanning processes of an application that are running on the operating system in a computer system. Under the method of the invention, when a process encounters an exception incident stopping the process from executing in the normal manner intended, the occurrence of the exception is detected by the operating system, which sends the process a notification of the exception incident to which the process may be able to respond if it later resumes operation. The operating system checks an exception list containing predetermined exception incidents and, if the exception encountered by the stopped process is not on the list, attempts to continue the process (in this situation the process may die if it is incapable of handling the exception condition or, if it can handle the exception, it may continue operating, and signal the exception or dump information according to its capability). If the exception is on the exception list, the process will be left in a stopped condition with the exception signal pending. The processes on the operating system are scanned periodically or at user discretion to determine if a process is stopped. If a process is identified which has stopped due to an exception, the monitor program can then send a signal to a user, for instance, informing the user of the stopped process, or to another program to cause dumping of process information to permit debugging.
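The exception-list check described in this embodiment amounts to a small decision rule, sketched below. The function name `outcome`, the string labels it returns, and the `process_has_handler` flag are assumptions made for illustration; in practice the decision is made inside the operating system rather than by user code.

```python
from typing import Set


def outcome(exception: str, exception_list: Set[str], process_has_handler: bool) -> str:
    """Model what happens to a process that encounters `exception`."""
    if exception in exception_list:
        # On the list: the process is left stopped with the signal pending,
        # so a monitor can later find it in the exact state of the exception.
        return "stopped"
    # Not on the list: the operating system attempts to continue the process.
    # The process then either handles the exception itself or dies.
    return "handled" if process_has_handler else "died"
```

For example, with an exception list of `{"SIGSEGV", "SIGBUS"}`, a process receiving `"SIGSEGV"` is left stopped regardless of whether it installed a handler, while a process receiving `"SIGFPE"` continues and its fate depends on its own handler.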
Another embodiment of a method of the invention takes advantage of operating system features to allow a set of processes to stop if they encounter an exception, to scan the set of processes to identify any which have stopped due to an exception, and to dump information related to processes which have stopped. Another aspect of the invention includes three programs to facilitate this method of debugging: (a) a status advisor program which can give the status of a set of processes and inform the user whether each process will stop upon receiving an exception; (b) a suspension program which can cause a set of processes to stop in the event that they are subject to an exception; and (c) a monitor program which monitors these processes and reports when they have stopped. The monitor program also has the capability of dumping the contents of registers, virtual memory, and other pieces of information that would be useful for debugging the process which received an exception. This information is placed into files, making it convenient for the user of the program to package the information and send it back to the vendor of the product.