1. Technical Field of the Invention
The present invention relates to software error handling and, in particular, to a method and apparatus for detecting, isolating, analyzing, and recovering from telecommunications application processing errors occurring within telecommunications exchanges.
2. Description of Related Art
Telecommunications switching systems (exchanges) are designed to provide at least the functions necessary to make a simple communications connection between Subscriber A and Subscriber B (commonly referred to as "plain old telephone service" or POTS). Exchanges are now further being designed to provide subscribers with a variety of telecommunications facilities (services and features) in addition to POTS. These facilities include, for example, the popular call waiting and three-party call features used by many subscribers every day.
Telecommunications facilities are provided within exchanges through a combination of hardware and software components. In spite of the demands of the communications subscriber for what would appear to be perpetually available telephone service, hardware failures and/or software errors do sometimes occur within the exchange. Such failures and errors often result in a partial or complete failure of the telecommunications exchange and a termination of communications services. In addressing the issue of exchange failure, service providers have concentrated on the development and installation of fault-tolerant exchange hardware. For example, it is now standard practice to use redundant hardware components in the exchange.
Little emphasis, however, has been placed on addressing exchange failures caused by software errors. One reason for this is that the complex telecommunications facilities software applications running on exchange platforms are often developed by different teams of programmers in distinct, interacting software sections. While each programming team provides for some type of software fault tolerance and error recovery for its section, the error handling and recovery programming developed by one team of programmers for one software section differs from, and often does not coordinate with, the programming developed by other teams of programmers for other software sections. Corresponding types of errors are thus likely to be handled differently or inappropriately by the various software sections, with inconsistent and sometimes disastrous results.
It is vitally important that software errors be quickly detected and responded to in such a way that the errors do not propagate to other parts of the system. It is also important that the system recover from the errors as quickly as possible. Furthermore, it is important that any error handling functionality included in a software system provide a coordinated response to detected errors.