1. Technical Field
The present invention is generally directed to an improved computing system. More specifically, the present invention is directed to an apparatus and method for monitoring the health of systems management software components in an enterprise.
2. Description of Related Art
The management of heterogeneous distributed computer systems is a complex task that can involve various operating systems, distributed network services and system management tasks. International Business Machines Corporation has created a system for centralized control of a distributed environment, which can include mainframes, UNIX or NT workstations, personal computers, and the like. This system is known as the Tivoli Management Environment, of which the Tivoli Management Framework is the base component on which Tivoli applications are built for the management of distributed computing systems. Information about the Tivoli Management Environment and the Tivoli Management Framework can be obtained, for example, from the Tivoli web site at http://www.tivoli.com/support/public/Prodman/public_manuals/td/ManagementFramework3.7.1.html.
The Tivoli Management Environment (TME) framework provides the foundation for managing resources in a distributed environment. The TME framework provides a set of system management services that enable a user to install both the framework and selected applications on multiple heterogeneous systems. Once installed and configured, the framework provides a robust foundation for managing TME resources, policies and policy regions.
A resource, or managed resource, as the term is used in the present application, is any hardware or software entity (machine, service, system or facility) that is represented by a database object. Managed resources are subject to a set of rules and must be of a resource type supported in a policy region. Managed resources include, but are not limited to, managed nodes, task libraries (containers in which an administrator may create and store tasks and jobs), profiles (containers for application-specific information about a particular type of resource), profile managers (containers that hold profiles and link a profile to a set of resources, called “subscribers”), monitors (programs that reside on an endpoint, i.e., a workstation running the Tivoli Management Agent program, and monitor a resource or program, e.g., disk space, processes, or memory), bulletin boards (mechanisms to which notices may be posted so that the framework and applications can communicate with the human administrator), workstations, software, and the like.
A policy is a set of rules that is applied to managed resources. A specific rule in a policy is referred to as a policy method. An example of a policy is that all user accounts must have passwords, and password aging must be enabled. These rules may take the form of software, shell scripts, written procedures and guidelines, and the like.
A policy region is a group of managed resources that share one or more common policies. Policy regions are used to model the management and organizational structure of a network computing environment. The policy region contains resource types and a list of resources to be managed.
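To make the relationships among managed resources, policies, policy methods and policy regions concrete, they can be modeled as plain data structures. The following is a minimal Python sketch under stated assumptions: the class names, the representation of a policy method as a predicate over a resource, and the `audit` helper are hypothetical illustrations and are not part of the Tivoli Management Framework API. The two password rules mirror the example policy given above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class ManagedResource:
    """Any hardware or software entity represented by a database object."""
    name: str
    resource_type: str
    attributes: Dict[str, object] = field(default_factory=dict)


# Hypothetical: a policy method (one rule) is modeled as a predicate on a resource.
PolicyMethod = Callable[[ManagedResource], bool]


@dataclass
class Policy:
    """A set of rules (policy methods) applied to managed resources."""
    name: str
    methods: List[PolicyMethod]

    def violations(self, resource: ManagedResource) -> List[str]:
        # Names of the policy methods that this resource fails.
        return [m.__name__ for m in self.methods if not m(resource)]


@dataclass
class PolicyRegion:
    """A group of managed resources that share one or more common policies."""
    name: str
    resource_types: Set[str]
    policies: List[Policy] = field(default_factory=list)
    resources: List[ManagedResource] = field(default_factory=list)

    def add(self, resource: ManagedResource) -> None:
        # A managed resource must be of a resource type supported in the region.
        if resource.resource_type not in self.resource_types:
            raise ValueError(
                f"{resource.resource_type!r} is not supported in {self.name!r}")
        self.resources.append(resource)

    def audit(self) -> Dict[str, List[str]]:
        # Apply every policy of the region to every resource it contains and
        # report only the resources that violate at least one policy method.
        report: Dict[str, List[str]] = {}
        for r in self.resources:
            failed = [v for p in self.policies for v in p.violations(r)]
            if failed:
                report[r.name] = failed
        return report


# The example policy from the text, expressed as two policy methods.
def has_password(r: ManagedResource) -> bool:
    return bool(r.attributes.get("password"))


def password_aging_enabled(r: ManagedResource) -> bool:
    return bool(r.attributes.get("password_aging"))


if __name__ == "__main__":
    region = PolicyRegion(
        "accounts", {"user_account"},
        [Policy("password_policy", [has_password, password_aging_enabled])])
    region.add(ManagedResource("alice", "user_account",
                               {"password": "x", "password_aging": True}))
    region.add(ManagedResource("bob", "user_account", {"password": "y"}))
    print(region.audit())  # {'bob': ['password_aging_enabled']}
```

In this sketch the policy region, rather than the individual resource, holds the policies, so every resource added to the region is automatically subject to the same rules; this reflects the definition of a policy region as a group of resources sharing common policies.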
The TME framework, in its most basic sense, comprises one or more Tivoli Management Region (TMR) servers and one or more managed nodes. A TMR server is a server that holds or references a complete set of software, including the full object database, for a Tivoli management region. A Tivoli management region is defined as a Tivoli management region server and its associated managed nodes. The TMR server includes the libraries, binaries, data files, and graphical user interfaces needed to install and manage a TME. The TMR server maintains the TMR server database and coordinates all communications with TME managed nodes. The TMR server also performs all authentication and verification necessary to ensure the security of TME data.
A TME managed node runs the same software that runs on a TMR server. Managed nodes maintain their own databases, which can be accessed by the TMR server. When managed nodes communicate directly with other managed nodes, they perform the same communication and/or security operations performed by the TMR server. The primary difference between a TMR server and a managed node is the size of the database maintained.
One configuration of a TME framework uses a two-tiered approach: TMR servers communicating with managed nodes or personal computer managed nodes. FIG. 1A illustrates such a configuration. As shown in FIG. 1A, a single TMR server 110 manages the resources of managed nodes 120–140, which also manage their own resources. Thus, the TMR server 110 maintains a database relating to each of the managed nodes 120–140, and the managed nodes 120–140 each maintain a database relating to their own respective resources.
With such a configuration, operations on each client device, or endpoint, of each managed node 120–140 require a call to the TMR server 110 to update information in the server database. For a large installation, this communication load is substantial. Additionally, operating-system-imposed limits on the number of clients with which a system can communicate at one time restrict the size of a Tivoli Management Region (TMR) to no more than approximately 200 clients.
In another configuration, as shown in FIG. 1B, a three-tiered approach is taken. In this configuration, a TMR server 150 is coupled to gateways 160 and 170, and a managed node 180. With the reduced number of managed nodes in the TMR, the amount of communication with the TMR server is significantly reduced. Endpoints 175, or clients, do not communicate with the TMR server 150, except during the initial login process. All endpoint 175 communications go through the gateway 170. In most cases, the gateway 170 will provide all of the support an endpoint needs without requiring communication with the TMR server 150. In a smaller workgroup-size installation, the gateway 170 may be created on the TMR server 150.
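The difference in TMR server load between the two-tiered configuration of FIG. 1A and the three-tiered configuration of FIG. 1B can be illustrated with a small routing model. The following Python sketch is hypothetical: the `route` and `load` functions and the message kinds `"login"` and `"operation"` are illustrative assumptions, not Tivoli interfaces. It also simplifies the three-tiered case by assuming that a gateway never needs to consult the TMR server after an endpoint's initial login, whereas the text above states only that this holds in most cases.

```python
from collections import Counter
from typing import Iterable, List


def route(kind: str, three_tier: bool) -> str:
    """Return the component that handles an endpoint message of this kind."""
    if not three_tier:
        # Two-tiered (FIG. 1A): every endpoint operation results in a call
        # to the TMR server to update the server database.
        return "tmr_server"
    # Three-tiered (FIG. 1B): only the initial login reaches the TMR server;
    # all other endpoint traffic goes through the gateway (simplifying
    # assumption: the gateway never escalates to the server).
    return "tmr_server" if kind == "login" else "gateway"


def endpoint_session(ops: int) -> List[str]:
    """Hypothetical message stream for one endpoint: a login, then operations."""
    return ["login"] + ["operation"] * ops


def load(messages: Iterable[str], three_tier: bool) -> Counter:
    """Count how many messages each component handles."""
    return Counter(route(kind, three_tier) for kind in messages)
```

For example, 200 endpoints that each log in once and then perform 50 operations generate 10,200 messages to the TMR server under the two-tiered model, but only 200 under the three-tiered model, with the remaining 10,000 handled by the gateway.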
The TME framework provides the ability to subdivide an enterprise network into multiple TMRs and then to connect them with either one-way or two-way connections. Installations composed of managed nodes and personal computer managed nodes often require multiple TMRs for a variety of reasons. Installations using endpoints and endpoint gateways rarely need more than one TMR.
While the Tivoli Management Environment (TME) monitors many aspects of system and network operations, it does not provide a mechanism to monitor itself. Thus, while the TME may be able to handle problems with various components of the systems and networks, errors or failures of the TME itself will not be identified and appropriate corrective action may not be performed until some other dependent component fails. Therefore, it would be beneficial to have an apparatus and method that monitors the health of systems management software components, such as components of the TME, in an enterprise.