In many network contexts, it is important to maintain reliable, responsive communications with end users. In the case of an Internet Service Provider (ISP), connected clients may notice degraded service due to network device failures within the ISP network. In the case of a data center, an application may execute on one or more servers, and clients communicating with the application may notice degraded service due to network device failures within the data center. Traditionally, network operators or engineers have used device-generated events to debug and resolve network device failures.
However, the number of device-generated events is often too large for a human operator or engineer to analyze manually. Moreover, the information provided in the device-generated events does not necessarily indicate how much impact the events have on end users. To address these issues, some automated techniques have evolved to help network operators and engineers recognize and respond to network failures more quickly. Unfortunately, existing techniques may not adequately prioritize failure events, which makes it more difficult for engineers and operators to quickly remedy network failures, particularly high-severity failures that impact service performance, availability, or security.