1. Field of the Invention
The present invention relates generally to the field of computer software and client/server applications. In particular, it relates to operating system software running in a distributed computing environment for managing connections between a client and a server.
2. Discussion of Related Art
In multi-threaded operating systems, threads are generally contained and run within processes. Threads within a process can share resources and memory allocated to the process. With present systems, when a thread executing within a process crashes, the entire process is terminated. That is, all the connection threads within that process are terminated, resulting in an abrupt and damaging closing of connections, typically with end users. This problem occurs even if there is no association or relationship between the thread that crashed and the other threads in the process.
Although the results of a thread crashing can vary, they are nearly always undesirable and can cause significant damage to an end user, such as loss of data. For example, in a PC running under the Windows® operating environment or in a Macintosh® computer running under the Macintosh Operating System, a thread crashing typically causes the entire operating system to shut down or, if the thread crashes on a network server, brings down the entire network. These are undesirable consequences. The fact that an entire multi-threaded process can be terminated because a single thread (out of potentially hundreds of threads in the process) crashes for reasons completely unrelated to the other threads makes the operating system under which the process is running brittle and less stable than would otherwise be desired. Unix-based systems handle thread crashes more smoothly. Although the process running the crashing thread is terminated, the crash does not typically bring down the entire operating system. The system keeps running, although all the connections implemented by threads in that process are still abruptly terminated.
A thread crashes when it receives a critical signal from the operating system. The operating system is typically told to send a critical signal by the computer system hardware. Specific signals have specific meanings in the system. Some are ignored by the process and others are caught and handled by the process. When a signal is ignored by a process that contains the thread that caused the signal, the process dies and a core file, a snapshot of the process at the time of the crash (described below), is made by the operating system. When a signal is caught by the process, the process can handle the signal or, if the signal is a critical signal, the process will shut down. In some cases, the process ignores critical signals because the signals indicate that the internal state of the process has been corrupted. The operating system realizes that the process ignored the signal and shuts down the process. The operating system creates a core file, which contains the state of all the threads in the process when the process was shut down. The core file is essentially a snapshot of the process that can be examined to determine what the crashing thread attempted to do that caused it to crash.
Therefore, it would be desirable to have processes that can handle critical signals directed to a thread in the process without having the entire process, which possibly has other threads running in it, terminate. It would be desirable to allow the other threads in the process to continue functioning and to have only the thread that crashed terminate, with its resources cleaned up in an orderly manner and the end user informed that the connection has been closed because of a particular error.
To achieve the foregoing, and in accordance with the purpose of the present invention, methods, apparatus, and computer-readable media are disclosed that allow threads in a multi-threaded process to continue executing when a single thread within the process receives a critical signal and crashes. In one aspect of the present invention, a method is provided in which a critical signal directed to a particular multi-threaded process, resulting from the execution of a particular thread in the process, is handled by a critical signal thread. The critical signal thread is invoked when the process receives a critical signal and prevents the entire process from shutting down because of one bad thread in the process. The critical signal thread terminates the resources and connections associated with the offending thread. It does this without affecting the performance of other, non-offending threads in the process, thereby preventing the termination of other connections in the process because of illegal or invalid operations of a single thread.
In one embodiment the critical signal thread is initialized by registering particular signals thereby enabling the critical signal thread to detect those signals. In yet another embodiment, a module or function within the critical signal thread called the critical signal handler is invoked to handle registered signals. In yet another embodiment, the critical signal thread reads a signal queue maintained by the operating system in order to process an incoming critical signal.
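As an illustration only, the registration-and-wait arrangement described above can be approximated on a POSIX system in Python. This is a minimal sketch under stated assumptions, not the disclosed implementation: `SIGUSR1` stands in for a critical signal (a genuine fault such as `SIGSEGV` cannot be recovered from safely in Python), the operating system's pending-signal set plays the role of the signal queue, and the names `REGISTERED`, `critical_signal_thread`, `critical_signal_handler`, and `run_demo` are hypothetical.

```python
import signal
import threading

# Assumption: SIGUSR1 stands in for a "critical" signal in this sketch.
REGISTERED = {signal.SIGUSR1}


def critical_signal_handler(signum, handled):
    """Hypothetical handler: record the signal instead of letting the process die."""
    handled.append(signum)


def critical_signal_thread(handled, done):
    """Wait synchronously for a registered signal, then dispatch it to the handler."""
    signum = signal.sigwait(REGISTERED)  # blocks until a registered signal is pending
    critical_signal_handler(signum, handled)
    done.set()


def run_demo():
    handled = []
    done = threading.Event()

    # "Register" the signals by blocking them before the thread is created, so the
    # new thread inherits the mask and the signal is consumed only through sigwait().
    signal.pthread_sigmask(signal.SIG_BLOCK, REGISTERED)

    watcher = threading.Thread(
        target=critical_signal_thread, args=(handled, done), daemon=True
    )
    watcher.start()

    # Send the stand-in signal directly to the watcher thread so the sketch does not
    # depend on which thread the kernel would choose for a process-wide signal.
    signal.pthread_kill(watcher.ident, signal.SIGUSR1)

    done.wait(timeout=5)
    return handled


if __name__ == "__main__":
    print(run_demo())
```

Blocking the signal in every thread and consuming it with `sigwait` in one dedicated thread keeps the handling code out of asynchronous signal context, which is one common way to give a single thread responsibility for signals. Whether the disclosed system uses this mechanism is not stated in the text.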
In another aspect of the present invention, a method of terminating resources and connections associated with an offending thread is described. The critical signal handler closes files opened only by the offending thread and unlocks or marks as unlocked any files locked by the offending thread. In one embodiment an informational message is sent to the client informing the client that the connection has been terminated. In yet another embodiment all references to the offending thread and all memory associated with the offending thread are cleared or deleted. In yet another embodiment an input polling thread contained within the process is instructed to discontinue polling for input events directed to the offending thread. In yet another embodiment a core file of the process is made at the time it receives the critical signal even though the entire process is not shut down.
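The cleanup steps listed above can be sketched as bookkeeping operations on a shared record of the process state. The example below is a hypothetical model rather than the disclosed handler: `ProcessState`, its fields, and `terminate_offending_thread` are names invented for illustration, and files, locks, and client messages are represented by plain Python containers instead of real operating system resources.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessState:
    """Hypothetical bookkeeping a process might keep about its connection threads."""
    file_openers: dict = field(default_factory=dict)   # path -> set of thread ids
    file_locks: dict = field(default_factory=dict)     # path -> locking thread id
    thread_memory: dict = field(default_factory=dict)  # thread id -> private data
    polled: set = field(default_factory=set)           # thread ids polled for input
    outbox: list = field(default_factory=list)         # (thread id, message) pairs


def terminate_offending_thread(state, tid, reason):
    """Release everything tied to thread `tid`; return the paths that were closed."""
    closed = []
    for path, openers in list(state.file_openers.items()):
        if tid not in openers:
            continue
        openers.discard(tid)
        if not openers:                  # opened only by the offending thread
            del state.file_openers[path]
            closed.append(path)

    for path, owner in list(state.file_locks.items()):
        if owner == tid:                 # unlock files locked by the offending thread
            del state.file_locks[path]

    state.outbox.append((tid, f"connection closed: {reason}"))  # inform the client
    state.polled.discard(tid)            # stop polling for this thread's input
    state.thread_memory.pop(tid, None)   # drop references and private memory
    return sorted(closed)


if __name__ == "__main__":
    state = ProcessState(
        file_openers={"a.txt": {1}, "shared.txt": {1, 2}},
        file_locks={"a.txt": 1, "shared.txt": 2},
        thread_memory={1: {"user": "alice"}, 2: {"user": "bob"}},
        polled={1, 2},
    )
    print(terminate_offending_thread(state, 1, "invalid memory access"))
    print(state)
```

Note that a file also opened by another thread stays open, and a lock held by another thread is left untouched, so only the offending thread's connection is affected.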
In another aspect of the present invention, a method of maintaining a multi-threaded process when a thread within the process crashes is described. A data space for each thread in the process is organized such that each data space is substantially independent from the other data spaces. Signals from the operating system directed to an offending thread are processed by executing a crash thread in the process. The offending thread is terminated thereby releasing system resources and clearing connections associated with the offending thread. The method allows the other threads in the multi-threaded process to continue functioning thereby preventing termination of the entire process.
In one embodiment the data space for each thread includes a plurality of private structures internal to the thread that allow the thread to maintain a reduced amount of stale data. In yet another embodiment the multi-threaded process registers critical signals when the process is invoked thereby allowing detection of critical signals by the process. In yet another embodiment the offending thread is terminated by deleting all references in memory to the offending thread, closing all files associated with the offending thread, and terminating all connections associated with the offending thread.
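To make the idea of substantially independent data spaces concrete, the following simulation runs several worker threads, each with its own private data, alongside a crash thread. It is a sketch under loud assumptions: a Python exception stands in for a critical signal, a `queue.Queue` stands in for signal delivery, and `run_workers`, `worker`, and `crash_thread` are hypothetical names. It shows only that one faulting worker can be retired while the others finish their work.

```python
import queue
import threading


def run_workers(n_workers, faulty_id, steps=1000):
    """Run workers with private data; one faults, the rest finish normally."""
    fault_queue = queue.Queue()  # stand-in for signal delivery to the crash thread
    results = {}                 # thread id -> final total for workers that finished
    terminated = []              # thread ids retired by the crash thread
    lock = threading.Lock()

    def worker(tid):
        space = {"total": 0}     # private data space, never shared with other workers
        try:
            for i in range(steps):
                if tid == faulty_id and i == 10:
                    raise RuntimeError("simulated illegal operation")
                space["total"] += i
        except RuntimeError as exc:
            fault_queue.put((tid, str(exc)))  # report the fault; the process lives on
            return
        with lock:
            results[tid] = space["total"]

    def crash_thread():
        while True:
            item = fault_queue.get()
            if item is None:     # sentinel: no more faults to process
                return
            tid, _reason = item
            with lock:
                terminated.append(tid)

    handler = threading.Thread(target=crash_thread)
    handler.start()
    workers = [threading.Thread(target=worker, args=(tid,)) for tid in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    fault_queue.put(None)        # tell the crash thread to stop
    handler.join()
    return results, terminated


if __name__ == "__main__":
    print(run_workers(4, faulty_id=2))
```

Because each worker touches only its own `space` dictionary, discarding the faulting worker leaves no partially updated state behind for the remaining workers.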
In another aspect of the present invention, a computer system is described that has a multi-threaded process capable of executing active connection threads, the system being arranged such that, when a critical signal is generated for an offending thread, other threads continue operating within the process. A critical signal thread detects critical signals generated by the operating system and handles termination operations for an offending thread. This is done without requiring that the entire process be terminated in response to the detected critical signal. A signal handler contained in the critical signal thread deletes references to the offending thread in response to the critical signal and causes operations between the offending thread and files in the computer system to discontinue. This is done while allowing other active connection threads within the process to continue operating.
In one embodiment, the system includes a signal register for registering critical signals, thereby enabling the critical signal thread to detect and process critical signals. In yet another embodiment, the system includes a memory shared by multi-threaded processes in the system, which contains information on each thread in the plurality of threads. In yet another embodiment, the system includes an input polling thread in the process which is instructed by the critical signal thread to discontinue polling for input events directed to the offending thread.
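The instruction to discontinue polling can be illustrated with Python's standard `selectors` module. The sketch below is an assumption-laden illustration, not the disclosed polling thread: `InputPoller`, `watch`, `stop_polling`, `ready_threads`, and `demo` are hypothetical names, local socket pairs stand in for client connections, and the poller is exercised from a single thread for clarity, although its `ready_threads` call is the kind of step a dedicated polling thread would run in a loop.

```python
import selectors
import socket


class InputPoller:
    """Hypothetical poller mapping each watched socket to the thread that owns it."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._sockets = {}  # thread id -> socket being polled

    def watch(self, thread_id, sock):
        self._selector.register(sock, selectors.EVENT_READ, data=thread_id)
        self._sockets[thread_id] = sock

    def stop_polling(self, thread_id):
        """Invoked on behalf of the critical signal thread for an offending thread."""
        sock = self._sockets.pop(thread_id, None)
        if sock is not None:
            self._selector.unregister(sock)

    def ready_threads(self, timeout=1.0):
        """Return the ids of threads whose connections have input waiting."""
        return sorted(key.data for key, _events in self._selector.select(timeout))

    def close(self):
        self._selector.close()


def demo():
    poller = InputPoller()
    pairs = {tid: socket.socketpair() for tid in (1, 2)}
    try:
        for tid, (server_end, client_end) in pairs.items():
            poller.watch(tid, server_end)
            client_end.sendall(b"ping")   # pending input for both connections
        before = poller.ready_threads()
        poller.stop_polling(2)            # thread 2 is treated as the offending thread
        after = poller.ready_threads()
        return before, after
    finally:
        poller.close()
        for server_end, client_end in pairs.values():
            server_end.close()
            client_end.close()


if __name__ == "__main__":
    print(demo())
```

Unregistering only the offending thread's socket means input for the remaining connections continues to be reported, which matches the stated goal of leaving other active connection threads undisturbed.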