Today, computing systems, including network servers, may experience failures or outages in which the servers are unable to communicate with user devices. Such outages may cause several problems, particularly in fields such as banking, e-commerce, health information systems, and nuclear power, where system availability may be crucial. To detect outages, system administrators may manually review application data, may receive complaints from customers after the system experiences an outage, and/or may create robot interactions with the system to simulate users.
However, such systems may log millions or even billions of events each day, making manual review an extremely time-consuming, if not impossible, task. Moreover, in many cases, the system may be unavailable for a lengthy period of time, causing harm to users, before customer complaints are received. Robot interactions may also be time-consuming and difficult to create, which may result in errors or inaccuracies in system outage detection.