As personal computers and workstations become more and more powerful, makers of mainframe computers, in order to stay viable in the marketplace, have sought to provide features that these smaller machines cannot readily match. One such feature may be broadly referred to as fault tolerance, which means the ability to withstand and promptly recover from hardware faults without the loss of crucial information. The central processing units of mainframe computers typically have error detection circuitry, and sometimes error recovery circuitry, built in at numerous information transfer points in the logic to detect and characterize any fault that might occur.
The CPU(s) of a given mainframe computer comprise many registers logically interconnected so as to execute the repertoire of instructions characteristic of the computer. In this environment, genuinely fault tolerant operation, in which recovery from a detected fault can be instituted at a point in the program immediately preceding the faulting instruction/operation, requires that one or more recent copies of all the software-visible registers be maintained and constantly updated. This is typically done by repeatedly sending copies of the registers (safestore information) to a special, dedicated memory or memory section. In some CPUs, the safestore information is sent via a result bus during periods when the result bus is not otherwise occupied, in order to minimize the number of conductive leads required; this is an important consideration as integrated circuits become ever smaller and yet more complex. Sometimes two safestore memories are provided to receive and alternately store two recent copies of the software-visible registers, one of which is always more recent than the other. When a fault occurs and analysis (performed, for example, by a service processor) determines that recovery is possible, the safestore information is used to reestablish the software-visible registers in the CPU with the contents they held shortly before the fault occurred, so that a restart can be attempted from the corresponding place in program execution.
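The alternating two-bank arrangement described above can be sketched as a small behavioral model. This is a minimal sketch under stated assumptions, not the circuitry of any particular CPU: the register names (`A`, `Q`, `X0`, `X1`, `IC`), the `DualSafestore` class, the per-bank sequence number, and the `complete` flag used to select the newest intact copy are all hypothetical. The `fault_after` parameter simulates a fault that interrupts a copy partway through.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical software-visible register set; the source does not enumerate it.
REGISTERS = ("A", "Q", "X0", "X1", "IC")


@dataclass
class SafestoreBank:
    snapshot: Dict[str, int] = field(default_factory=dict)
    sequence: int = -1       # -1 means the bank has never been written
    complete: bool = False   # False while a copy into this bank is in progress


class DualSafestore:
    """Two banks written alternately; the other bank keeps an older copy."""

    def __init__(self) -> None:
        self.banks = [SafestoreBank(), SafestoreBank()]
        self.next_bank = 0
        self.sequence = 0

    def store(self, registers: Dict[str, int], fault_after: Optional[int] = None) -> None:
        """Copy registers into the next bank; fault_after simulates an interrupted copy."""
        bank = self.banks[self.next_bank]
        bank.complete = False
        for i, name in enumerate(REGISTERS):
            if fault_after is not None and i == fault_after:
                return  # copy interrupted: this bank stays marked incomplete
            bank.snapshot[name] = registers[name]
        bank.sequence = self.sequence
        bank.complete = True
        self.sequence += 1
        self.next_bank ^= 1  # alternate to the other bank for the next copy

    def recover(self) -> Optional[Dict[str, int]]:
        """Return the most recent complete copy, or None if there is none."""
        intact = [b for b in self.banks if b.complete]
        if not intact:
            return None
        return dict(max(intact, key=lambda b: b.sequence).snapshot)


ss = DualSafestore()
ss.store({"A": 1, "Q": 2, "X0": 3, "X1": 4, "IC": 100})
ss.store({"A": 5, "Q": 6, "X0": 7, "X1": 8, "IC": 104})
ss.store({"A": 9, "Q": 9, "X0": 9, "X1": 9, "IC": 108}, fault_after=2)
print(ss.recover())  # → {'A': 5, 'Q': 6, 'X0': 7, 'X1': 8, 'IC': 104}
```

Because the banks are written in turn, a fault during one copy can corrupt at most the bank being written. The model assumes the recovery logic can tell which bank holds a complete copy, a detail the source does not specify.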
Those skilled in the art are aware that the usual provision of safestore capability has certain drawbacks that directly and adversely affect CPU performance. Thus, as higher levels of CPU performance are sought, the performance penalty incurred by incorporating safestore techniques to enhance fault tolerance must be considered more closely. First, even during the execution of simple instructions, when the safestore operation can be interleaved with other processes that do not use the result bus and so adds no extra cycles, some of the registers to be safestored are typically half-word in length and cannot be stored packed into the dedicated memory. As a result, more clock cycles are required both to store the safestore information into the dedicated memory (especially significant, since this is an ongoing procedure) and to recover it (less important, since recovery is necessary only on fault recovery or a process change).
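The packing penalty can be illustrated with a short cycle count. The sketch below assumes, hypothetically, that the path to the dedicated safestore memory moves one full word per clock cycle and that a half-word register stored unpacked still occupies a whole word slot; the register names and widths in `REGISTER_WIDTHS` are illustrative, not taken from any particular CPU.

```python
import math

# Hypothetical register set: (name, width in half-words).
# Illustrative only; the source says just that some registers are half-word.
REGISTER_WIDTHS = [
    ("A", 2), ("Q", 2), ("X0", 1), ("X1", 1), ("X2", 1), ("X3", 1), ("IC", 2),
]


def unpacked_cycles(widths):
    # Each register is stored on its own, so a half-word register still
    # costs one full word transfer (assumed: one word per cycle).
    return sum(math.ceil(w / 2) for _, w in widths)


def packed_cycles(widths):
    # If half-words could be paired into words, only the total size matters.
    return math.ceil(sum(w for _, w in widths) / 2)


print(unpacked_cycles(REGISTER_WIDTHS), packed_cycles(REGISTER_WIDTHS))  # → 7 5
```

Under these assumptions the four half-word registers cost four transfers unpacked but only two if paired, so the same register set needs 7 cycles instead of 5 on every safestore pass.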
Additional drawbacks include: 1) The contents of the accumulator and supplementary accumulator registers in a coprocessor may transiently differ from those of the corresponding registers in the main execution unit, the latter being the ones conventionally sent to safestore. The latest versions of these (and perhaps other) registers must therefore be loaded as single-word stores. 2) Performing the safestore function during the execution of some instructions inherently costs one or two extra cycles, making those instructions correspondingly longer. 3) When the cache is commanded to move the safestore information into cache memory in anticipation of a fault recovery/climb, no other cache commands can be executed during this move time.
While these characteristics are not design errors, their performance penalty is an obstacle to attaining the CPU speed necessary to remain competitive in the market. The subject invention is directed to alleviating certain of the limitations mentioned above.