1. Field of the Invention
The present invention is related to handling of event notification overflow or potential overflow conditions in computer systems, and more specifically to handling of duplicate events so that additional queue space is not required.
2. Description of Related Art
In large-scale distributed computer systems, such as those using distributed software models to perform tasks, multiple nodes provide independent execution of sub-tasks. Keeping such a system operational requires indication of events occurring at one node that either require a reaction from another node or indicate to the other node that an erroneous operating condition has occurred or that a phase of processing is complete. In particular, event notification and event logging are operations used to indicate system health to system administrators or software applications, including operating system components.
Health monitoring techniques employed in distributed processing systems perform an important function: connections to other nodes must be reliable, and all of the active nodes that have been assigned tasks need to perform those tasks, in order to ensure that all of the processing requirements are met in a timely fashion. The health of a node-based distributed processing system is typically monitored by: 1) a heartbeat messaging system, which passes messages between the nodes and a central monitoring component; and 2) an event notification system that signals interested nodes when events occur on other nodes. Event notification systems in node-based distributed processing systems typically require an interested application (a consumer) to register to receive event notifications either with a centralized event manager, or with the processes or objects that generate the events (an event producer).
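By way of illustration only, the registration model described above may be sketched as follows. The sketch assumes a centralized event manager; the names EventManager, register, notify, and the "disk_failure" event type are hypothetical and are not drawn from any particular system.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical handler signature: (event_type, source_node) -> None
Handler = Callable[[str, str], None]


class EventManager:
    """Illustrative centralized manager with which consumers register interest."""

    def __init__(self) -> None:
        # Maps an event type to the handlers of all registered consumers.
        self._consumers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def register(self, event_type: str, handler: Handler) -> None:
        """Record that a consumer wants notifications of the given type."""
        self._consumers[event_type].append(handler)

    def notify(self, event_type: str, source_node: str) -> int:
        """Deliver one notification per registered consumer; return the count."""
        handlers = self._consumers.get(event_type, [])
        for handler in handlers:
            handler(event_type, source_node)
        return len(handlers)


received = []
manager = EventManager()
manager.register("disk_failure", lambda t, n: received.append(("consumer_a", t, n)))
manager.register("disk_failure", lambda t, n: received.append(("consumer_b", t, n)))

# A single occurrence on one node yields one notification per consumer.
delivered = manager.notify("disk_failure", "node-3")
```

In this sketch a single occurrence produces as many notifications as there are registered consumers, so the notification volume scales with the number of consumers as well as with the number of occurrences.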
Events in such a system may be reported multiple times. For example, an event may be reported for each interested event consumer. With the large number of events that may be generated, in particular where the event itself is triggered multiple times due to a resource change or a hardware or media failure, a large number of duplicate events may be buffered at a node, causing event queue overflow and/or consuming processing bandwidth that could be used to handle other events. While the duplicate events could simply be removed from the queue, information about how many events have occurred and the timing of the events may be lost.
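The two difficulties noted above may be illustrated with a minimal sketch, assuming a fixed-capacity queue that drops events when full and a naive de-duplication that keeps only the first event for each key. The queue capacity, event keys, and function names are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical event shape: (event_key, timestamp)
Event = Tuple[str, float]

QUEUE_CAPACITY = 4  # illustrative size only


def enqueue_all(events: List[Event], capacity: int) -> Tuple[Deque[Event], int]:
    """Append events to a bounded queue; return the queue and the number dropped."""
    queue: Deque[Event] = deque()
    dropped = 0
    for event in events:
        if len(queue) >= capacity:
            dropped += 1  # queue overflow: the event is lost
        else:
            queue.append(event)
    return queue, dropped


def drop_duplicates(events: List[Event]) -> List[Event]:
    """Naive de-duplication: keep only the first event seen for each key."""
    seen = set()
    unique: List[Event] = []
    for key, timestamp in events:
        if key not in seen:
            seen.add(key)
            unique.append((key, timestamp))
    return unique


# Six duplicates of one failure event, followed by one unrelated event.
burst = [("media_failure", float(t)) for t in range(6)] + [("link_up", 6.0)]

queue, dropped = enqueue_all(burst, QUEUE_CAPACITY)
unique = drop_duplicates(burst)
```

In this sketch the duplicates fill the queue, so the unrelated "link_up" event is dropped on overflow. Naive de-duplication avoids the overflow but retains only the first "media_failure" entry, discarding both the count of six occurrences and the timestamps of the later five.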