Over the past decades, information technology (IT) systems have evolved and grown in complexity. In the past, a company would use a single computer with a single operating system and a small number of programs to meet its computational needs. Nowadays, an enterprise may have hundreds or thousands of computers interconnected over a network, served by multiple servers and multiple databases. Essentially, each layer of the IT system has evolved and become more complex to control and manage. In some cases, multiple servers may be installed with identical software, and load balancers may be used to distribute requests among them. An average business system includes tens or hundreds of thousands of configuration parameters. For example, the Windows OS contains between 1,500 and 2,500 configuration parameters, IBM WebSphere Application Server has about 16,000, and Oracle WebLogic has more than 60,000. If any of these parameters is misconfigured or omitted, the IT system may fail to operate properly.
This dependence of IT systems on their configuration can have serious consequences. For example, in April 2011 Amazon Web Services suffered a severe outage that knocked some of its clients offline for as long as four days. The problem turned out to be caused by a network configuration error made during a network upgrade. In the past, upgrades were rare and were applied slowly to client servers. Nowadays, especially with the help of the Internet, upgrades for some software packages may be released daily and even applied automatically. If a problem arises after an upgrade, most systems cannot present an administrator with a list of the changes made, let alone suggest which changes are the most probable cause of the problem.
It is thus desirable to improve the ability to avoid problems during IT system updates and day-to-day operation, and to reduce the mean time to resolution (MTTR) for the problems that still occur. Preventing problems and reducing the MTTR can help the organization avoid economic damage.
A few companies have developed software products that help system administrators keep track of changes to computer configurations. These products detect granular changes to configuration items (CIs). Typically, such products collect the CIs and store them in a configuration management database (CMDB), so that the current value of a CI can be compared with its prior values or with the corresponding values on similar machines. The products may also bundle CIs into composite CIs to make them easier to visualize, for example by grouping them by type or content. Once the CIs are collected, an IT user (e.g., an engineer or system administrator) may need to analyze hundreds, thousands or even millions of granular changes or groups of changes to find the source of a problem.
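The two comparisons described above (a CI against its own prior value, or against the same CI on a similar machine) and the grouping into composite CIs can be illustrated with a minimal Python sketch. It assumes that each machine's CIs are available as a flat dictionary mapping a CI name to its value. The `ConfigItem` class, the `ci_type` categories and the function names are hypothetical and are not taken from any particular CMDB product.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ConfigItem:
    """One granular configuration item (CI) as a CMDB might store it (hypothetical model)."""
    host: str
    ci_type: str  # hypothetical category, e.g. "os", "app_server", "database"
    name: str
    value: str


def differing_items(a: Dict[str, str], b: Dict[str, str]) -> List[str]:
    """Return the CI names whose values differ between two CI maps.

    The maps may be a prior and a current state of one machine, or two
    similar machines. A CI present in only one map also counts as differing.
    """
    names = a.keys() | b.keys()
    return sorted(n for n in names if a.get(n) != b.get(n))


def group_into_composites(
    items: Iterable[ConfigItem],
) -> Dict[Tuple[str, str], List[ConfigItem]]:
    """Bundle granular CIs into composite CIs keyed by (host, ci_type)."""
    composites: Dict[Tuple[str, str], List[ConfigItem]] = defaultdict(list)
    for item in items:
        composites[(item.host, item.ci_type)].append(item)
    return dict(composites)
```

For example, `differing_items({"max_threads": "200", "log_level": "INFO", "tls": "1.2"}, {"max_threads": "400", "log_level": "INFO"})` flags `max_threads` (different value) and `tls` (missing on the second machine). A set union of the keys is used so that CIs added or omitted on one side are reported alongside changed values, since the text notes that an omitted parameter can be as harmful as a misconfigured one.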
Some software packages record periodic snapshots of a computer or group of computers in the IT system, so that when an application fails, the current state can be compared with prior states to locate changes that may have caused the failure. This method can reduce the number of changes that need to be checked (e.g., to those made between a few specific snapshots). However, it provides little help in pinpointing which change is the root cause of the failure.
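A minimal Python sketch of the snapshot comparison described above follows. It assumes a snapshot is a flat dictionary mapping a configuration key (for example a file path plus parameter name) to its value. The `Snapshot` alias and the `diff_snapshots` function are hypothetical illustrations, not the interface of any specific product.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical snapshot model: configuration key -> value at snapshot time.
Snapshot = Dict[str, str]
Change = Tuple[str, str, Optional[str], Optional[str]]  # (kind, key, old, new)


def diff_snapshots(before: Snapshot, after: Snapshot) -> List[Change]:
    """List every key that was added, removed or modified between two snapshots.

    The result is ordered by key only; nothing here estimates which change
    is most likely to have caused a failure.
    """
    changes: List[Change] = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            changes.append(("added", key, None, after[key]))
        elif key not in after:
            changes.append(("removed", key, before[key], None))
        elif before[key] != after[key]:
            changes.append(("modified", key, before[key], after[key]))
    return changes
```

Diffing the current state against the last snapshot taken before the failure narrows the candidates to the changes made in that interval, but the output is still a flat, unranked list. This is the limitation the paragraph describes: the administrator must still decide which entry is the root cause.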