Modern distributed or enterprise systems, such as enterprise information technology (IT) data centers and grid computing systems, are paradigms of distributed computing in which computation and data are distributed across diverse computational and storage elements. These systems provide the compute and storage capabilities for enterprise workloads such as multi-tier applications, desktop applications, and technical computing jobs. System management within such enterprise IT systems involves tasks such as performance management, configuration management, patch management, and problem diagnosis. As referred to herein, and as understood in the art, information technology, or IT, encompasses all forms of technology, including but not limited to the design, development, installation, and implementation of hardware and software information or computing systems and software applications, used to create, store, exchange, and utilize information in its various forms, including but not limited to business data, conversations, still images, motion pictures, and multimedia presentations. IT distributed environments may be employed, for example, by Internet Service Providers (ISPs), web merchants, and web search engines to provide IT applications and services to users.
Enterprise IT systems are increasingly characterized by growing complexity, scale, and heterogeneity of infrastructure and applications. Further, these systems are highly dynamic and subject to frequent changes such as service plug-in/plug-out, workload variations, failures, configuration updates, and application migration. Such changes affect both the runtime operation of the system and the service contracts offered to customers. In reaction to these changes, the infrastructure elements, applications, and system management components in these systems need to be adapted. For example, compute and storage resources may have to be re-allocated, applications may need to be restarted, and monitoring infrastructures may require re-configuration.
Current approaches used by system administrators to manage the aforementioned changes are manual and/or involve a combination of ad-hoc tools and scripts, and they typically require special expertise and detailed actions by the administrators. Consequently, the current approaches are not suitable for large distributed or enterprise systems because of the high human operational costs, broken closed-loop automation, and reduced agility that they would incur at such a scale. Accordingly, while the current approaches may work adequately in small-scale installations, they do not scale well to larger installations, such as typical modern IT systems and the utility systems of tomorrow.