During the past seven decades, electronic computing has evolved from primitive, vacuum-tube-based computer systems, initially developed during the 1940s, to modern electronic computing systems in which large numbers of multi-processor servers, work stations, and other individual computing systems are networked together with large-capacity data-storage devices and other electronic devices to produce geographically distributed computing systems with hundreds of thousands, millions, or more components that provide enormous computational bandwidths and data-storage capacities. These large, distributed computing systems are made possible by advances in computer networking, distributed operating systems and applications, data-storage appliances, computer hardware, and software technologies. Despite all of these advances, however, the rapid increase in the size and complexity of computing systems has been accompanied by numerous scaling issues and technical challenges, including technical challenges associated with communications overheads encountered in parallelizing computational tasks among multiple processors, component failures, and distributed-system management. As new distributed-computing technologies are developed and as general hardware and software technologies continue to advance, the current trend towards ever-larger and more complex distributed computing systems appears likely to continue well into the future.
In modern computing systems, individual computers, subsystems, and components generally output large volumes of status, informational, and error messages that are collectively referred to, in the current document, as “event messages.” In large, distributed computing systems, terabytes of event messages may be generated each day. The event messages are often collected into event logs stored as files in data-storage appliances and are often analyzed both in real time, as they are generated and received, and retrospectively, after the event messages have been initially processed and stored in event logs. Event messages may contain information that can be used to detect serious failures and operational deficiencies before enough failures and system-degrading events accumulate to cause data loss and significant down time. The information contained in event messages may also be used to detect and ameliorate various types of security breaches and issues, to intelligently manage and maintain distributed computing systems, and to diagnose many different classes of operational problems, hardware-design deficiencies, and software-design deficiencies.
In many systems, alerts are generated when certain types of event messages are received by monitoring-and-management systems. The alerts are distributed to personnel responsible for monitoring, managing, and administering the systems, so that failures and system-degrading events are quickly evaluated and addressed. The alerts may be received and displayed on personal computers, laptops, and work stations, but may also be received and displayed on smart phones, tablets, and other types of devices. Alerts may also be distributed to pagers, telephones, and other devices that receive alerts and notify personnel who own and/or use the devices. Although alert distribution is an effective method for quickly notifying personnel and marshalling needed personnel to address system failures and system-degrading events, currently available alert generation and distribution systems lack flexibility and responsiveness to user feedback. For example, it may turn out that a particular high-priority alert is often spuriously generated. In such cases, it would be beneficial for the high-priority alert to be downgraded in priority or disabled altogether, to avoid unnecessary diversion of personnel to respond to spurious alerts. However, in currently available systems, such changes often require high-latency report submission, authorization, and reprogramming or reconfiguration, as a result of which the spurious alert may continue to be generated for days or weeks before the alert can be disabled or reprogrammed. Users of alert-generating systems and subsystems continue to seek alert systems that can be more flexibly and easily modified and managed.