1. Field of the Invention
The present invention relates to the management of distributed computer systems, and in particular, to the management of distributed systems which provide remote operation facilities for computer hardware resources.
2. Background and Prior Art
The increasing power and complexity of large computer systems, frequently termed "mainframe computer systems", have resulted in an increase in the complexity of computer system operations. The drive for increased workforce productivity, however, has tended to reduce the number of personnel assigned to the operations task. The proliferation of large computer system complexes, such as those used by airline reservations systems, banking centers, and similar computer-intensive businesses, has also contributed to the need for more effective facilities for the control of hardware system resources.
Large computer systems have traditionally been operated from an attached console accessible to the computer operators in a computer room. Each computer system has a dedicated console. Thus, in a large complex of, for example, six computers, six operator consoles require monitoring. Each of these computer consoles displays messages in the order generated by the computer system. Many of the messages are informational, indicating the status of certain operations on the computer systems. Other messages provide warnings of current or impending problems. Finally, a third class of message requires an operator response, either to a request for action, such as mounting a tape, or to an error detected in the system. It becomes increasingly difficult for an operator to monitor several consoles displaying several different types of messages and to respond effectively to each one.
One solution to the increasing flow of messages is to develop an automated operations facility that is able to interpret and classify messages. These types of facilities can be constructed to segregate messages by message type and to present the operator with those requiring attention. Automated operations facilities of this type are typically constructed using a workstation computer that can be connected directly to the large computer system. The workstation computer contains the necessary programs for receiving, analyzing, and responding to certain messages.
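By way of illustration only, the segregation of messages into the three classes described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a description of any particular automated operations facility: the keyword rules, the `ConsoleMessage` record, and the `classify` and `needs_attention` functions are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum


class MessageClass(Enum):
    INFORMATIONAL = "informational"      # status of operations
    WARNING = "warning"                  # current or impending problems
    ACTION_REQUIRED = "action_required"  # operator response needed


@dataclass
class ConsoleMessage:
    system: str  # hypothetical name of the originating computer system
    text: str    # raw console message text


# Hypothetical keyword rules (an assumption of this sketch). A real facility
# would more likely key on the message identifiers of the monitored system.
ACTION_KEYWORDS = ("MOUNT", "REPLY", "ERROR")
WARNING_KEYWORDS = ("WARNING", "THRESHOLD")


def classify(message: ConsoleMessage) -> MessageClass:
    """Assign a message to one of the three classes by keyword match."""
    upper = message.text.upper()
    if any(keyword in upper for keyword in ACTION_KEYWORDS):
        return MessageClass.ACTION_REQUIRED
    if any(keyword in upper for keyword in WARNING_KEYWORDS):
        return MessageClass.WARNING
    return MessageClass.INFORMATIONAL


def needs_attention(messages):
    """Return only the messages an operator should be shown."""
    return [m for m in messages
            if classify(m) is not MessageClass.INFORMATIONAL]
```

Applied to the merged message streams of several consoles, `needs_attention` would suppress purely informational messages and leave the operator with the warnings and action requests.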
Productivity gains are frequently achieved by centralizing operator resources in a single operations area. This area may be on a different floor or in a different building than the large computers themselves. Centralization requires that remote access and control of the hardware resource be provided. However, remote access creates a series of problems.
The first problem is the need to develop a system which will allow access to the hardware resource independent of the location of that resource. Next, the system must be designed in a way that allows recovery from the failure of any component in the control system. In other words, control system component failure must not cause the failure of control of the larger system. Finally, the control system must be flexible, allowing the addition of controllable resources and individual control points without disrupting the ongoing control activities.
The problem of remote operations and management has been addressed in several ways. In U.S. patent application Ser. No. 07/577,967, filed Sep. 4, 1990, commonly assigned, an automated operations system is described which involves a controller coupled to the processor with remote workstation access for controlling that processor. This configuration provides control but limits remote access and fails to address the problem of control system redundancy and reconfiguration.
U.S. Pat. No. 5,005,122 suggests the use of a client-server model for network management tasks. This system provides for management of a local area network (LAN) through the designation of a single network management node which directs other nodes to perform backup, software distribution, or other network management tasks. While this system provides a means for managing a network, there is no recognition or teaching of the management of a large mainframe hardware resource. In particular, there is no recognition of the requirement to establish fault-tolerant connection facilities between a console client and the hardware resource.
Thus, there remains a technical problem of creating a system for remotely controlling a resource such as a large computer system in a manner that allows remote access, failure recovery, and configuration flexibility. In particular, the system must have a means for establishing the location of the resource to be controlled and for creating a link between a control console and that resource. In addition, the system must be able to recognize and recover from the failure of any control system component. Finally, a system is required which allows dynamic configuration changes to that control system.