1. Technical Field
The present invention relates to software rejuvenation, and more particularly to a system and method for accelerating software rejuvenation by communicating rejuvenation events.
2. Discussion of Related Art
Large industrial software systems require extensive monitoring and management to deliver expected performance and reliability. Some types of software failures, called soft failures, have been shown to leave the system in a degraded mode, in which the system is still operational but its available capacity has been greatly reduced. Examples of the soft bugs that cause such failures have been documented in several software studies. Soft failures can be caused by the evolution of the state of one or more software data structures during possibly prolonged execution. This evolution is called software aging. Software aging has been observed in widely used software.
An approach to capacity restoration for telecommunications systems that takes advantage of the cyclical nature of telecommunications traffic has been proposed. Telecommunications operating companies understand the traffic patterns in their networks well and can therefore plan to restore their smoothly degrading systems to full capacity in the same way they plan other maintenance activities.
Experience has shown that soft bugs arise from problems with synchronization mechanisms (e.g., semaphores), kernel structures (e.g., file table allocations), database management systems (e.g., database lock deadlocks), and other resource allocation mechanisms that are essential to the proper operation of large multi-layer distributed systems. Because some of these resources are designed with self-healing mechanisms, such as timeouts, some systems may recover from soft bugs after a period of time. For example, in one Java-based e-commerce system, when the soft bug manifested, users complained of very slow response times lasting more than one hour, after which the problem cleared by itself.
A host-based worm disruption system that throttles the rate of outgoing connections from a host has been reported. An approach to virus detection based on inspecting the binary image of a process and executing a pattern-matching algorithm against known virus signatures has also been reported.
When an e-commerce system falls victim to a worm attack, software rejuvenation must be triggered quickly to prevent extensive infection of the e-commerce server and its network neighborhood. Therefore, a need exists for a system and method for accelerating software rejuvenation by communicating rejuvenation events.