A primary challenge in building a reliable and secure computer system is managing the persistent state (PS) of the system, which includes all the executable files, configuration settings, and other data that govern how the system functions. Misconfigurations and other PS problems are among the primary causes of failures and security vulnerabilities across a variety of systems, ranging from individual desktop machines to large-scale Internet services. PS problems, along with problems caused by failures in system elements such as hardware components and programming logic, can deleteriously affect the entire system.
The cost of not effectively managing a system's PS is high. For example, because the state is persistent, PS problems can reproduce themselves after a system reboot or an application restart. In addition, the PS drifts at run time due to changes such as patches and application-related updates. Currently, there exists no effective way to close the loop on changes occurring on the system. In such a scenario, if known-problem identification fails, and if a subsequent system reboot or application restart fails to remedy the PS problem, there may be no choice but to manually examine the system to identify the root-cause PS.
Manual investigation of a system to identify the root-cause PS is difficult and expensive due to the large number of potential causes. For example, the set of state that can potentially affect a failing application is huge, and the list of potential root causes can correspondingly include the complete set of state on the system. Furthermore, the situation is worse still if every possible combination of state must be considered as well, in particular where the problem has no single PS root cause.
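The growth in the number of candidates can be illustrated with a short calculation. The following is a minimal sketch only: the function name `candidate_count` and the entry counts are hypothetical and chosen for illustration, and they do not describe any particular system. It counts how many non-empty sets of state entries would have to be considered if a root cause may involve up to a given number of entries in combination.

```python
from math import comb


def candidate_count(n_entries: int, max_size: int) -> int:
    """Count the non-empty subsets of n_entries items that have at most
    max_size members, i.e. the candidate root causes when up to max_size
    state entries may be jointly responsible for a problem."""
    return sum(comb(n_entries, k) for k in range(1, max_size + 1))


# Hypothetical numbers of state entries, for illustration only.
for n in (10, 100, 1000):
    singles = candidate_count(n, 1)   # one entry is the root cause
    pairs = candidate_count(n, 2)     # up to two entries in combination
    triples = candidate_count(n, 3)   # up to three entries in combination
    print(n, singles, pairs, triples)
```

With a single root cause the number of candidates grows only linearly with the number of state entries, but once combinations are allowed it grows polynomially in the combination size, and it reaches 2^n - 1 if any subset of n entries may be responsible.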