The Information Technology Infrastructure Library (ITIL) and the International Organization for Standardization (ISO) 20000 standard formalize a set of practices for IT Service Management (ITSM). ITSM includes several processes: incident management, which focuses on restoring normal service operation as quickly as possible; problem management, which focuses on finding the root cause of problems and thereby preventing further incidents; change management, which focuses on ensuring that standardized methods and procedures are used for the efficient and prompt handling of all changes; and configuration management, which tracks entities (i.e., configuration items such as a computer, a laptop, a router, a server, or an IT service) along with their properties and relationships in a configuration management database.
These ITSM process areas track incidents, problems, and changes. An incident is defined as an unplanned interruption to a service, a reduction in the quality of a service, or an event that has not yet impacted the service to a customer; a problem is the root cause of one or more related incidents; and a change is a way to resolve a problem. Changes are typically made to resolve incidents and problems, but they may sometimes cause new incidents as well. ITSM systems also link incidents, problems, and changes to the configuration items (CIs) to which they relate. Some ITSM systems may also record outages (i.e., periods during which a service is unavailable) as separate entities and link them to the service CI and to related incidents or problems, making it easier to track outage duration, severity, and business impact.
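The linkage between CIs, incidents, problems, changes, and outages can be pictured as a small data model. The sketch below is a minimal, hypothetical illustration, assuming one record type per ITSM entity; the class and field names (`ConfigurationItem`, `problem_id`, `caused_by_change_id`, `service_ci_id`, and so on) are invented for illustration and are not taken from any particular ITSM product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


# Hypothetical record types; names and fields are illustrative assumptions only.
@dataclass
class ConfigurationItem:
    ci_id: str
    ci_class: str                                            # e.g. "server", "router", "it_service"
    properties: dict = field(default_factory=dict)           # arbitrary CI attributes
    related_ci_ids: List[str] = field(default_factory=list)  # CI-to-CI relationships


@dataclass
class Problem:
    problem_id: str
    ci_id: str                  # CI the problem relates to
    resolved: bool = False


@dataclass
class Change:
    change_id: str
    ci_id: str
    resolves_problem_id: Optional[str] = None   # problem this change is meant to fix


@dataclass
class Incident:
    incident_id: str
    ci_id: str
    problem_id: Optional[str] = None            # underlying problem, once identified
    caused_by_change_id: Optional[str] = None   # change suspected of causing the incident


@dataclass
class Outage:
    outage_id: str
    service_ci_id: str                          # the service CI that was unavailable
    start: datetime
    end: datetime
    severity: int
    incident_ids: List[str] = field(default_factory=list)
    problem_ids: List[str] = field(default_factory=list)

    @property
    def duration(self) -> timedelta:
        # Outage duration is derived from the recorded start and end times.
        return self.end - self.start


# Example: an outage of a hypothetical e-mail service linked to one incident.
outage = Outage("OUT-1", "svc-email",
                datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 30),
                severity=2, incident_ids=["INC-1"])
print(outage.duration)  # 2:30:00
```

Recording the outage as its own entity, rather than only as an attribute of an incident, lets duration and severity be computed and aggregated per service CI.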
Recording outages explicitly and linking them to related incidents and problems can help trace the root cause of Service Level Agreement (SLA) breaches. Correlating incidents with underlying problems, or with changes that may have led to them, helps in categorizing incidents and understanding their root cause. For example, the occurrence of multiple ‘incidents’ related to a common asset (i.e., a CI) may suggest a ‘problem’ with that particular asset, which requires a rectifying ‘change’ such as an update or replacement of the asset. A large number of recurring ‘incidents’ and ‘problems’ across an asset class may point to an underlying defect in that class, suggesting that a bigger ‘change’ may be in order, such as switching to a better-performing equivalent asset from a competitor.
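The escalation from repeated incidents on one asset to a suspected problem, and from many suspect assets in one class to a suspected class-wide defect, can be sketched as a simple counting heuristic. This is a hypothetical illustration only: the function `suggest_problems`, its inputs, and the two thresholds (`per_ci_threshold`, `per_class_threshold`) are assumptions chosen for the example, not a method described in the text.

```python
from collections import Counter, defaultdict


def suggest_problems(incidents, ci_class, per_ci_threshold=3, per_class_threshold=2):
    """Hypothetical correlation heuristic.

    incidents: list of (incident_id, ci_id) pairs.
    ci_class:  dict mapping ci_id -> asset class.
    Returns (CIs that may have a 'problem', asset classes that may have a defect).
    """
    # Count incidents per CI; many incidents on one asset suggest a problem with it.
    per_ci = Counter(ci_id for _, ci_id in incidents)
    suspect_cis = sorted(ci for ci, n in per_ci.items() if n >= per_ci_threshold)

    # Group the suspect CIs by asset class; several suspect assets in the same
    # class suggest a defect in the class itself, i.e. a bigger 'change'.
    cis_by_class = defaultdict(set)
    for ci in suspect_cis:
        cis_by_class[ci_class[ci]].add(ci)
    suspect_classes = sorted(
        cls for cls, cis in cis_by_class.items() if len(cis) >= per_class_threshold
    )
    return suspect_cis, suspect_classes


incidents = [("I1", "srv-1"), ("I2", "srv-1"), ("I3", "srv-1"),
             ("I4", "srv-2"), ("I5", "srv-2"), ("I6", "srv-2"),
             ("I7", "rtr-1")]
ci_class = {"srv-1": "server-model-X", "srv-2": "server-model-X",
            "rtr-1": "router-model-Y"}
print(suggest_problems(incidents, ci_class))
# (['srv-1', 'srv-2'], ['server-model-X'])
```

Fixed count thresholds keep the sketch easy to follow; a real system would more likely weigh incident recency, severity, and CI relationships.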
In existing ITSM systems, correlations are typically made manually and retrospectively. As a result, it can be cumbersome to resolve incidents quickly and to identify problematic incidents that need urgent attention. For example, incoming incidents that are related to an incident that has already breached its SLA are likely to breach their own SLAs as well if the common underlying problem has not been resolved. Identifying related incidents and problems as new incidents arrive is therefore essential for spotting problematic incidents that could breach an SLA.
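The SLA-risk reasoning above can be sketched as a check run when a new incident arrives. This is a minimal, hypothetical sketch: `IncidentRecord` and `at_risk_of_sla_breach` are invented names, and the sketch assumes that two incidents count as "related" when they affect the same CI; the text does not specify how relatedness is determined.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class IncidentRecord:
    incident_id: str
    ci_id: str
    problem_id: Optional[str] = None   # linked underlying problem, if any
    sla_breached: bool = False


def at_risk_of_sla_breach(new: IncidentRecord,
                          history: List[IncidentRecord],
                          problem_resolved: Dict[str, bool]) -> List[str]:
    """Return the unresolved problem IDs that make `new` likely to breach its SLA.

    Assumption: an earlier incident is 'related' if it affects the same CI.
    """
    return sorted({
        old.problem_id
        for old in history
        if old.sla_breached                                   # earlier incident breached its SLA
        and old.ci_id == new.ci_id                            # related via the same CI (assumed)
        and old.problem_id is not None                        # has a linked problem
        and not problem_resolved.get(old.problem_id, False)   # problem still open
    })


history = [IncidentRecord("INC-1", "db-01", "PRB-7", sla_breached=True),
           IncidentRecord("INC-2", "web-03", "PRB-9", sla_breached=True),
           IncidentRecord("INC-3", "db-01", "PRB-8", sla_breached=False)]
problem_resolved = {"PRB-7": False, "PRB-8": False, "PRB-9": True}

print(at_risk_of_sla_breach(IncidentRecord("INC-10", "db-01"), history, problem_resolved))
# ['PRB-7']
```

A non-empty result flags the incoming incident for urgent attention; an empty result means no open, SLA-breaching problem was found under this narrow definition of relatedness.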