The present invention relates to the field of digital computers and specifically to a number of computers connected together to form a data processing system.
In data processing systems, control functions are typically implemented using a sequence or stream of instructions called a program where the instructions are sequentially executed to carry out desired data manipulations. In the execution of instructions, the computer utilizes many circuits providing data paths that implement the operation of the computer. These data paths include, for example, registers and memory for storing data, control information, and instructions. As each instruction is executed by the computer, the different circuit locations within the computer assume either logical 1 or logical 0 values as a function of the program execution. The inputs and/or outputs from these locations within the computer are generally referred to as the direct (non-redundant) inputs and/or outputs whenever those inputs or outputs have values which in some way are determined directly as a function of the execution of the program by the computer. The circuit locations within a computer are connected by direct (non-redundant) data paths and control paths which are utilized in support of the execution of the program.
At times it is desirable to have a second computer, or other computers, examine or control the logical states of locations within a first computer. One manner of performing such examination or control is to stop the operation of the first computer and allow the second computer to utilize the direct (non-redundant) data and control paths of the first computer. Such operation, however, terminates or slows down execution of the primary programs by the first computer while the second computer executes an alternate instruction stream. If the first computer has an error condition, then frequently that error condition will not only interfere with the execution of the primary program by the first computer, but will also interfere with any attempt by the second computer to determine the source of the error if the second computer utilizes the same circuit connections and paths as the first computer.
In order to avoid the use of the same circuit connections and paths in a computer, some computers have been provided with redundant input and/or output connections and paths to locations within a computer and have provided redundant accessing means for accessing locations through the redundant input and/or output connections and paths. Such a redundant system is described in U.S. Pat. No. 4,244,019 entitled "DATA PROCESSING SYSTEM INCLUDING A PROGRAM-EXECUTING SECONDARY SYSTEM CONTROLLING A PROGRAM-EXECUTING PRIMARY SYSTEM".
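By way of illustration only, the distinction between examining a location over the direct paths and examining it over a redundant path can be sketched as follows. This is a minimal model, not the apparatus of the cited patent: the class `PrimaryComputer`, its latch names, and the methods `direct_read` and `scan_out` are hypothetical names introduced for this sketch.

```python
class PrimaryComputer:
    """Hypothetical model of a first computer with two ways to be examined."""

    def __init__(self):
        self.latches = {"A": 0, "B": 0}   # circuit locations holding 1 or 0
        self.running = True               # True while the primary program executes
        self.instructions_executed = 0

    def step(self):
        """Execute one instruction of the primary program over the direct paths."""
        if not self.running:
            return
        self.latches["A"] ^= 1            # stand-in for a real data manipulation
        self.instructions_executed += 1

    def direct_read(self, name):
        """Examine a location over the direct paths.

        Assumption of this sketch: the primary program must be stopped first,
        because the examiner shares the paths the program itself uses.
        """
        self.running = False
        return self.latches[name]

    def scan_out(self, name):
        """Examine a location over a redundant path, leaving execution untouched."""
        return self.latches[name]
```

In this model a call to `scan_out` returns the state of a location while `running` stays true, whereas `direct_read` returns the same state only after halting the primary program.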
Prior art systems such as the one in U.S. Pat. No. 4,244,019 have provided methods and apparatus for controlling the operation of one computer by another computer in a data processing system in response to human intervention for maintenance or other reasons. Such methods and apparatus have not provided the flexibility and efficiency desirable for more efficient data processing systems.
In U.S. Pat. No. 4,244,019 the data processing system is formed with a first (primary) computer and with a second (secondary) computer. The first computer has instruction execution and processing apparatus operable to execute a first program formed as a first instruction stream. The second computer has instruction execution and processing apparatus for executing a second program formed as a second instruction stream. The second computer is typically a console which is capable, through its own program, of causing the execution of instructions and commands in the first data processing system and of causing the accessing of locations in the first data processing system using redundant connections and paths.
The primary function of the overall computer system in U.S. Pat. No. 4,244,019 is to execute the first (primary) programs in the first computer and the function of the second computer is to assist, control and interrogate the first computer.
Computers from time to time experience errors. In a system of connected computers, such as in U.S. Pat. No. 4,244,019, errors in one computer can cause delays in the operation of the other computers in the system, thereby interfering with the primary function of the overall computer system. When a computer experiences an error, the typical operation is to suspend execution of the instruction stream for that computer so as not to cause any further errors. In such a system, the suspension of operation of a second computer may slow down or stop the operation of a first computer, even when no errors exist in the first computer. When computers have been stopped, it is important to diagnose the problem and restart the computers for error-free operation as quickly as possible. The diagnosis usually involves the scan-out and analysis of the state of many circuit locations in the stopped computer. The restarting of the stopped computer typically commences with an initial program loading (IPL) routine.
IPL routines for computers are well known. Typically, an IPL routine initiates the starting and distribution of clock signals, the resetting and clearing of many locations throughout a computer to establish initial conditions, and the downloading of control or other information and programs from disk or other memory. Each IPL routine is tailored to the particular computer which is being started or restarted.
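By way of illustration only, the three typical IPL steps named above (starting clocks, resetting locations, and downloading programs) can be sketched as follows. The `Computer` model, its field names, and the function `initial_program_load` are hypothetical and introduced for this sketch; an actual IPL routine is tailored to the particular computer and is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Computer:
    """Hypothetical model of a computer that is being started or restarted."""
    clocks_running: bool = False
    # Locations hold arbitrary leftover values before the IPL routine runs.
    registers: dict = field(default_factory=lambda: {"R0": 0xDEAD, "R1": 0xBEEF})
    control_store: bytes = b""


def initial_program_load(computer, image):
    """Run a simplified IPL sequence and return the steps performed, in order."""
    steps = []

    # Step 1: start and distribute the clock signals.
    computer.clocks_running = True
    steps.append("clocks started")

    # Step 2: reset and clear locations to establish initial conditions.
    for name in computer.registers:
        computer.registers[name] = 0
    steps.append("locations reset")

    # Step 3: download control information and programs; here `image` is
    # assumed to be bytes already read from disk or other memory.
    computer.control_store = bytes(image)
    steps.append("program loaded")

    return steps
```

In this model, calling `initial_program_load(Computer(), image)` leaves the clocks running, every register cleared to zero, and the supplied image held in the control store.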
The manual initiation of diagnosis and initial program loading for the restarting of a computer is often inefficient and interferes with or slows down the execution of primary programs in a primary computer. The inefficiency is particularly aggravated in a hierarchical system in which a second computer is controlling a first computer. If the second computer experiences errors when the first computer does not, the second computer's errors frequently interfere with the operation of the first computer, thereby degrading the primary function of the overall system, that is, execution of programs by the primary computer.
In accordance with the above background, there is a need for an improved hierarchy of computers in which the operation of each of the computers leads to a more efficient overall system.