Computers and computer networks are often multi-component systems that are subject to a variety of hardware and software events, such as errors or failures of different origins. In such systems, it is often desirable to monitor or collect the events that occur in order to diagnose problems associated with the systems. In particular, many computer network systems are very complex and include both hardware and software components for which events are to be monitored. As an example, the computer systems and network components may be critical to important operational functions of high-reliability systems. These and other situations have resulted in the development of tools and techniques for monitoring events such as system or component failures. Two such conventional mechanisms for monitoring events are the Simple Network Management Protocol (SNMP) and “phone home” event reporting systems.
The Simple Network Management Protocol (SNMP) is a protocol, together with its supporting software, in which SNMP agent software operating on one or more computerized devices identifies certain computer and/or network related events and reports these events to a network management server computer system operated by support technicians or other individuals for problem resolution. In particular, SNMP can report hardware and software events such as errors, problems, and pre-determined operational characteristics to the management server. An SNMP agent runs on each system component that is to be monitored in order to collect data regarding that system component. An SNMP server collects the data provided by the agents about the computer systems, networks, software, and other components being monitored, and can send simple commands back to an SNMP agent in order to adjust operational characteristics of the system component. The data to be collected is identified and stored in a management information base (MIB).
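The agent and server roles described above can be illustrated with a minimal in-memory sketch. It does not implement the SNMP wire protocol or use any SNMP library. The class names `Agent` and `Manager`, the component names, and the object names such as `linkStatus` are hypothetical and chosen only for illustration, and the MIB is reduced to a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for an SNMP agent on one monitored component."""
    component: str
    # Toy MIB: maps a (made-up) object name to its current value.
    mib: dict = field(default_factory=dict)

    def get(self, name):
        # Answer the server's read request for one managed object.
        return self.mib[name]

    def set(self, name, value):
        # Apply a server command that adjusts an operational characteristic.
        self.mib[name] = value


@dataclass
class Manager:
    """Hypothetical stand-in for the network management server."""
    agents: list = field(default_factory=list)
    events: list = field(default_factory=list)

    def notify(self, component, description):
        # Record an event that an agent reported to the server.
        self.events.append((component, description))

    def poll(self, name):
        # Collect one managed object from every agent that exposes it.
        return {a.component: a.get(name) for a in self.agents if name in a.mib}


router = Agent("router-1", {"linkStatus": "up", "pollIntervalSeconds": 60})
disk = Agent("disk-array-1", {"temperatureCelsius": 71})
manager = Manager(agents=[router, disk])

# An agent reports an event to the management server.
manager.notify(disk.component, "temperature above threshold")

# The server collects data, then sends a simple command back to one agent.
status = manager.poll("linkStatus")
router.set("pollIntervalSeconds", 30)
```

In this sketch the server both receives agent-initiated reports (`notify`) and pulls values on request (`poll`), which mirrors the two directions of data flow described above. A real deployment would exchange these messages over a network rather than through direct method calls.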
Another system for reporting hardware and software events is referred to as a “phone home” system. This type of system allows a computer system experiencing a problem to dial out over a telephone line to a support server in order to communicate event-related information regarding the problem. Support personnel can monitor the support server in order to diagnose the problem reported using the phone home technique.
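The phone home flow can likewise be sketched in a few lines, under stated assumptions. The function names `build_report` and `phone_home`, the report fields, and the identifiers such as `host-42` are hypothetical. The telephone-line connection is not implemented: it is represented by a caller-supplied `dial_out` function, and a local list stands in for the support server.

```python
import json
from datetime import datetime, timezone


def build_report(system_id, event, detail):
    # Package the event-related information into one serialized message.
    return json.dumps({
        "system": system_id,
        "event": event,
        "detail": detail,
        "reported_at": datetime.now(timezone.utc).isoformat(),
    })


def phone_home(report, dial_out):
    # dial_out is a placeholder for the telephone-line transport
    # (for example, a modem session); it is not implemented here.
    return dial_out(report)


# Stand-in for the support server: it stores whatever reports arrive.
received = []


def fake_dial_out(message):
    received.append(json.loads(message))
    return True


ok = phone_home(
    build_report("host-42", "disk_failure", "drive 3 offline"),
    fake_dial_out,
)
```

Passing the transport in as a function keeps the report format separate from how the call is placed, so the same report could be delivered over a different link without changing how events are described.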