Since early computer systems performed only simple tasks, it was unnecessary for them to include more than one input/output unit (I/O unit). For this reason, computer systems such as the EDVAC device of 1948 contained only a single I/O unit. The I/O unit was simply the mechanism the computer system used to communicate with the outside world. Because these early computer systems required relatively little information, the I/O unit could be controlled by the central processing unit (CPU) itself. Over time, the increasingly complex tasks performed by computer systems required access to more and more information, and direct control of the I/O unit by the CPU was no longer practical.
In 1959, Univac introduced its LARC computer system. The LARC computer system included an input/output controller (IOC) that was itself a complete computer. The IOC handled the flow of information to and from the outside world, thereby reducing the work required of the main CPU. Because of the need to handle huge amounts of data in the "information age," modern computer systems usually include several I/O controllers, called input/output processors (IOPs). Like the IOC of the LARC computer system, IOPs are computer systems in their own right. Each IOP is responsible for handling the information flow between the computer system and one or more external devices (i.e., "the outside world").
The wide use of IOPs has greatly improved the overall performance of today's computer systems. Computer systems now communicate with a variety of external devices without overly burdening the main CPU. Examples of devices that are connected via IOPs include terminals, magnetic storage units, optical storage devices, programmable workstations, and other computer systems. To the computer system, each of these external devices represents a resource that can be used to perform a specified task. In many cases, the external devices accomplish the task without extensive intervention by the CPU, which greatly reduces the work required of the CPU.
To further leverage the power of IOPs, individual computer systems can share IOPs and their associated external devices. When these independent computer systems are brought together, they form what is called a clustered computer system. A clustered computer system is to be distinguished from today's parallel processor computer systems. While today's parallel computing systems often contain several CPUs that run a single operating system, the clustered computer systems of the future are made up of self-contained computer systems, each of which runs its own, possibly different, operating system.
However, there are two significant impediments to implementing this clustered approach. The first is the difficulty of coordinating problem ownership (i.e., service and maintenance action) among the sharing computer systems. The second is propagating the current status of the various devices among the sharing computer systems. Simply allowing any computer system to take problem ownership would, at a minimum, cause confusion and would most likely result in redundant and/or conflicting service and maintenance efforts. Equally important is the need to communicate status information to each sharing computer system in a way that is fully informative and timely, yet does not give the systems redundant or conflicting information.
Conventional problem ownership and status propagation schemes were not designed to handle this clustered approach, and they therefore have several shortcomings when applied to it. For example, in the IBM 370 environment, each device is responsible for its own problem ownership and status propagation. Problem ownership is never in doubt, since each device is responsible for its own problems. However, status propagation is left to human intervention (i.e., computer technicians), which is extremely expensive and time consuming. Further, devices typically have different problem ownership schemes and different ways of communicating status information to the technician, so technicians must understand a plethora of different recovery schemes and user interfaces. While this may provide job security for computer technicians, it does little to address the difficulties inherent in the clustered computer systems of the future, in which human intervention is neither contemplated nor preferred. To be successful in the marketplace, the computer systems of today and of the future must be able to propagate status information without relying on expensive human intervention.
Another conventional approach is embodied in the current IBM AS/400 midrange computer. The AS/400 problem ownership and status propagation scheme does provide for status propagation without the expensive human intervention of the 370 environment. Each device attached to an IOP of the AS/400 computer system reports status information directly to the responsible IOP, which in turn reports the status information directly to the main CPU. However, the current AS/400 computer system involves only a single computer, and hence it lacks the ability to coordinate problem ownership among several computer systems. If this scheme were applied to the clustered system environment of the future, redundant and/or conflicting service and maintenance efforts would almost certainly result.