With the growing popularity of the Internet, the reliability of applications that provide web-based services over the Internet has become very important to the companies that provide them. In particular, it has become very important to detect, track, and resolve server failures that affect services provided to customers.
The Information Technology Infrastructure Library (ITIL) is a best-practice framework intended to facilitate the delivery of quality information technology (IT) services. An ITIL-compliant system may handle server outages by creating a sequence of records. First, upon receiving an alert of a server outage, a user or administrator of a system employing ITIL may manually create an incident record to document the outage. The user may then attempt a series of quick fixes, such as rebooting the server. If the user cannot fix the server, the user creates a problem record from the incident record. Once the problem with the server is eventually solved, an error record is created from the problem record.
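The record sequence described above can be illustrated with a minimal sketch. It assumes only what the paragraph states: an incident record documents the outage, a problem record is created from the incident record when quick fixes fail, and an error record is created from the problem record once the problem is solved. All class, field, and function names are hypothetical and are not taken from ITIL or from any particular ITIL-compliant product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IncidentRecord:
    """Created manually when an outage alert is received (hypothetical fields)."""
    server: str
    description: str


@dataclass
class ProblemRecord:
    """Created from an incident record when quick fixes do not resolve the outage."""
    incident: IncidentRecord
    attempted_fixes: List[str]


@dataclass
class ErrorRecord:
    """Created from a problem record once the problem has been solved."""
    problem: ProblemRecord
    resolution: str


def escalate(incident: IncidentRecord,
             attempted_fixes: List[str],
             quick_fix_worked: bool) -> Optional[ProblemRecord]:
    """Return a problem record only if the quick fixes failed."""
    if quick_fix_worked:
        return None
    return ProblemRecord(incident=incident, attempted_fixes=list(attempted_fixes))


def close_problem(problem: ProblemRecord, resolution: str) -> ErrorRecord:
    """Create the error record that documents how the problem was solved."""
    return ErrorRecord(problem=problem, resolution=resolution)
```

In this sketch each record keeps a reference to the record it was created from, so an error record can be traced back through its problem record to the original incident. Every field is still supplied by a person, as in the manual process described above.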
The data used to create the incident, problem, error, and other records is typically entered by a user. Thus, the data is usually not completely accurate and generally does not reflect the actual outage of a server. Additionally, it is difficult to determine the cost of a server outage from metrics indicating the number of servers that are down, especially when those metrics are based on data generated manually by one or more users who run a system employing ITIL.