Complex applications, such as managing cloud services, supervising a fulfillment center, managing a grid, advancing science, or treating patients, may all require managing a significant amount of data and well-structured processes. As an example, conformance to service level agreements (SLAs) is a critical requirement for many cloud operations. Such conformance may require continuous monitoring of key performance metrics and a predictive diagnosis capability that detects impending SLA violations, so that operations can circumvent the violations or resolve issues more quickly when violations occur. Such cloud operations may have to monitor, diagnose, and manage millions of hardware and software components, including data centers, networks, server machines, virtual machines, operating systems, databases, middleware, and applications, in private, public, and hybrid clouds of the operators and/or the customers.
Reactive fault detection and manual diagnosis techniques of traditional information technology (IT) operations may be too labor intensive, may require extensive domain expertise, and may respond too late, resulting in disproportionate responses that restart large parts of the system instead of isolating and fixing the faulty components. Such techniques may also be unable to scale properly for the cloud. Effective cloud system operations may require continuous measurement of important vital signs, time-series analytics, multivariate system state models, system response models, predictive anomaly detection, classification based on machine learning, automatic diagnosis and prognosis, decision support, and various control capabilities.