A primary challenge in building a reliable and secure computer system is managing the persistent state (PS) of the system, which includes all the executable files, configuration settings, and other data that govern how the system functions. Misconfigurations and other PS problems are among the primary causes of failures and security vulnerabilities across a variety of systems, ranging from individual desktop machines to large-scale Internet services. PS problems, along with problems caused by failures in system elements such as hardware components and programming logic, can adversely affect the entire system.
The cost of not effectively managing a system's PS is high. For example, PS problems can reproduce themselves after a system reboot or an application restart. In such a scenario, if known-problem identification fails, and if a subsequent system reboot or application restart fails to remedy the PS problem, there may be no choice but to manually examine the system to identify a root cause item in the PS.
Manual investigation of a system to identify a root cause item in the PS is difficult and expensive because the number of potential causes is large. For example, the set of items that can potentially affect a malfunctioning application is huge, and the list of potential root causes can correspondingly include every item on the system.