Fault-tolerant and high-availability processing systems are known. These systems are used in applications requiring high reliability and low downtime. Exemplary applications include telecommunications applications, such as the switching technology used in wireline and wireless switching applications. Fault-tolerant and high-availability processing systems are typically required to detect when a software process has died (that is, abnormally stopped executing). These systems must also provide some means of recovering from the death of a process.
Some existing fault-tolerant and high-availability processing systems detect the death of a process by polling for process existence. Polling, however, has certain drawbacks, including the time required to poll and the computational expense associated with polling. In certain real-time applications, such as real-time switching applications, detection of and recovery from process death must occur on the order of milliseconds or less. Therefore, polling alone, particularly at the application level, is unacceptable in some cases.
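By way of illustration only, application-level polling for process existence might be sketched as follows. The function names `process_exists` and `poll_until_dead`, the polling interval, and the timeout are hypothetical and are not taken from any particular system. The sketch assumes a UNIX-like host and uses the conventional technique of sending signal 0, which checks for existence without delivering a signal.

```python
import os
import time


def process_exists(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence and permission check only
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True  # note: an exited but un-reaped child (zombie) still counts


def poll_until_dead(pid, interval_s=0.01, timeout_s=5.0):
    """Poll until the process disappears.

    Returns the elapsed seconds at detection, or None on timeout.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout_s:
        if not process_exists(pid):
            return time.monotonic() - start
        time.sleep(interval_s)  # detection latency is bounded below by this
    return None
```

The sketch makes the trade-off visible: a shorter `interval_s` lowers detection latency but costs one system call per monitored process per interval, and a longer one does the reverse.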
Some operating systems provide protocols and facilities for monitoring processes. These operating system facilities are often more attractive than polling alone. UNIX and UNIX-like operating systems typically provide an asynchronous signal that is sent upon the death of a process. However, this signal is sent only to a parent process, and only upon the death of one of its child processes. The parent-child relationship is created when the parent “forks,” or creates, a child process. Because UNIX and UNIX-like operating systems require the parent-child relationship for reception of the death-of-child signal, a single independent monitor for many processes that are not its children cannot rely on this signal. Also, the UNIX and UNIX-like facilities for process monitoring do not notify a child process of the death of its parent. Therefore, while a parent process may fork a child process and rely on the death-of-child signal to monitor the child process, this facility does not allow a child process to monitor the parent because there is no “death-of-parent signal.”
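The parent-side facility described above can be sketched, under stated assumptions, as follows. On UNIX and UNIX-like systems the death-of-child signal is `SIGCHLD`. The names `on_sigchld`, `spawn_and_watch`, and `dead_children` are hypothetical, and the sketch assumes it runs in the main thread of a process on such a system.

```python
import os
import signal
import time

dead_children = {}  # pid -> raw wait status, filled in by the handler


def on_sigchld(signum, frame):
    """Reap every exited child; several deaths may share one signal."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # this process has no children left
        if pid == 0:
            return  # children exist, but none has exited yet
        dead_children[pid] = status


def spawn_and_watch(exit_code=3, wait_s=2.0):
    """Fork a child that exits at once.

    Returns (child pid, exit code observed by the parent, or None on timeout).
    """
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(exit_code)  # child: terminate immediately
        deadline = time.monotonic() + wait_s
        while pid not in dead_children and time.monotonic() < deadline:
            time.sleep(0.01)  # the handler runs asynchronously in this thread
        status = dead_children.get(pid)
        code = os.WEXITSTATUS(status) if status is not None else None
        return pid, code
    finally:
        # Restore whatever handler was installed before the sketch ran.
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGCHLD, restore)
```

Only the process that called `os.fork()` receives the signal, and `os.waitpid` raises `ChildProcessError` for a process that is not a child of the caller, which is why an unrelated monitor process cannot use this mechanism.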
Therefore, a need exists for a method and apparatus for improved process monitoring.