1. Field of the Invention
The present invention relates generally to networked computer systems. More particularly, the present invention relates to software and systems management in networked computer environments.
2. Description of the Related Art
Distributed environments such as clusters of computing systems, data centers, and grid systems involve managing a large number of resources and service components. A typical end service provided to the users of such systems requires the composition of multiple resources and service components, which together deliver the end service that is of interest to the users. Such composition of multiple components requires careful configuration and deployment of the components, so that they interface with each other in a compatible manner and so that the composite service is deployed and initialized, handles the workload submitted by users, handles component-level faults gracefully, and provides robust service despite fluctuations in the workload.
Realizing such composition from components involves orchestration of a large number of heterogeneous resources and service components. Managing the various tasks manually tends to be tedious and error-prone. The complexity increases when resources belong to multiple administrative domains. While grid-based systems can facilitate resource sharing across multiple administrative domains, the grid-based systems are much harder to manage from a system administration point of view. One reason is that the current state of the art in system management technology has not kept pace with the advances in middleware and grid technologies. Some progress has been made in managing single systems or cluster-based systems. Even for such systems, system administrators face a much higher level of complexity when they configure and deploy a new service on an existing infrastructure or manage the lifecycle of the existing service and software stack. The situation is much worse in a complex application environment; for example, an environment involving orchestration of a workflow formed by multiple business processes. In such a case, deployment and lifecycle management solutions need to take an integrated view of the multiple tiers involved, and current system management technologies do not provide the necessary means to build such solutions.
Traditional methods for configuration and deployment of software components rely heavily upon domain experts' knowledge about the component requirements, the availability of middleware and the underlying infrastructure, and the overall IT environment. Using this background knowledge, a system administrator is first required to configure the existing infrastructure and then customize the configuration and deployment steps for a new component, so that the new component can be deployed successfully. In the case of distributed components, such an approach can be time-consuming and error-prone, and does not scale to large installations. Further, such an approach does not lend itself to automation, because system administrators are key components in the deployment workflow.
The ability to configure and manage large installations of systems has been an active area of research within the information technology community. The Local Configuration System (LCFG) is a currently used script-based system that dynamically configures machines based upon configuration information stored in a central database (Anderson, P., “Towards a High-Level Machine Configuration System,” LISA, 1994). The information pertains to the network, system, and services that need to be configured on the system. Smart Framework for Object Groups (SmartFrog) is a known system for specifying the configuration and deployment of distributed applications (Goldsack, P., Guijarro, J., Mecheneau, G., Murray, P., Toft, P., “SmartFrog: Configuration and Automatic Ignition of Distributed Applications,” HP OVUA 2003). SmartFrog provides a language to specify the configuration of applications and the dependencies between them, as well as an infrastructure for processing and deploying applications on distributed systems. The GridWeaver project is exploring the notion of combining LCFG and SmartFrog technologies for configuration management of large systems. The Organization for the Advancement of Structured Information Standards (OASIS) effort is looking at managing distributed resources using Web services. The Grid Forum is attempting to standardize the specification language as part of the Configuration Description, Deployment, and Lifecycle Management (CDDLM) activity.
Some technologies provide means to monitor individual Java™ 2 Platform Enterprise Edition (J2EE) components that are participating in providing a service. Such technologies help system administrators monitor performance, identify failures, and check for performance bottlenecks.