As personal computers and workstations become more and more powerful, makers of mainframe computers have undertaken to provide features which cannot readily be matched by these smaller machines in order to stay viable in the marketplace. One such feature may be broadly referred to as fault tolerance, which means the ability to withstand and promptly recover from hardware faults without the loss of crucial information. The central processing units of mainframe computers typically have error detection circuitry, and sometimes error recovery circuitry, built in at numerous information transfer points in the logic to detect and characterize any fault which might occur.
The CPU(s) of a given mainframe computer comprise many registers logically interconnected to achieve the ability to execute the repertoire of instructions characteristic of the computer. In this environment, the achievement of genuinely fault tolerant operation, in which recovery from a detected fault can be instituted at a point in a program immediately preceding the faulting instruction/operation, requires that one or more recent copies of all the software visible registers be maintained and constantly updated. This procedure is typically carried out by repeatedly sending copies of the registers (safestore information) to a special, dedicated memory or memory section. In some CPUs, the safestore information is sent via a result bus during periods when the result bus is not otherwise occupied in order to minimize the number of conductive leads required, an important consideration in the use of ever smaller and yet ever more complex integrated circuitry. Sometimes, two safestore memories are provided to alternately receive and temporarily store two recent copies of the software visible registers, one of which is always more recent than the other. When a fault occurs and analysis (performed, for example, by a service processor) determines that recovery is possible, the safestore information is used to reestablish the software visible registers in the CPU with the contents held shortly before the fault occurred so that restart can be tried from the corresponding place in program execution.
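The alternating use of two safestore memories described above may be illustrated by the following minimal software sketch. It is a model for exposition only and not the circuitry of any referenced patent; the names SafestorePair and run_with_recovery, the register names, and the representation of instructions as functions are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SafestorePair:
    """Hypothetical model of two alternating safestore memories."""
    frames: list = field(default_factory=lambda: [None, None])
    newest: int = 1  # index of the most recently written frame

    def save(self, registers: dict) -> None:
        # Overwrite the OLDER frame so the previous copy remains intact
        # while the new one is being written.
        target = 1 - self.newest
        self.frames[target] = dict(registers)
        self.newest = target

    def recover(self) -> dict:
        # Return a copy of the most recent frame.
        frame = self.frames[self.newest]
        if frame is None:
            raise RuntimeError("no valid safestore frame")
        return dict(frame)


def run_with_recovery(program, registers, faulting_step):
    """Run instructions, saving registers before each one (illustrative)."""
    safestore = SafestorePair()
    for step, instruction in enumerate(program):
        safestore.save(registers)      # copy taken before the instruction
        if step == faulting_step:
            registers["A"] = -1        # simulate state corrupted by a fault
            # Reestablish the registers as they were just before the fault.
            return safestore.recover(), step
        instruction(registers)
    return registers, None


def increment_a(regs):
    regs["A"] += 1


restored, restart_step = run_with_recovery(
    [increment_a, increment_a, increment_a], {"A": 0}, faulting_step=2
)
```

In this sketch the restored registers hold the values present immediately before the faulting instruction, so that execution could be retried from restart_step.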
A basic exposition of the storage and use of safestore frames is presented in U.S. Pat. No. 5,276,862, entitled SAFESTORE FRAME IMPLEMENTATION IN A CENTRAL PROCESSOR, by Lowell D. McCulley et al, assigned to the same assignee as the present invention and incorporated by reference herein.
Those skilled in the art are aware of certain drawbacks to the usual provision of safestore capability, which drawbacks directly and adversely affect CPU performance. Thus, as higher levels of CPU performance are sought, the performance penalty resulting from the incorporation of safestore techniques to enhance fault tolerance must be more closely considered. First, even for the execution of simple instructions during which the safestore operation can be interleaved among other processes which do not use the result bus so as to cause no extra cycle time, some of the registers to be safestored are typically half-word in length and cannot be stored packed into the dedicated memory. As a result, more clock cycles are required both for the storage of the safestore information into the dedicated memory (which is especially significant since this is an ongoing procedure) and for the recovery of the safestore information (which is less important since it is necessary only on fault recovery or a process change).
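The cost of storing half-word registers unpacked may be illustrated by a simple count. The following sketch assumes, purely for illustration, that one memory word is transferred per clock cycle; the function name store_cycles and the register counts used are hypothetical and are not taken from any actual CPU.

```python
def store_cycles(full_word_regs: int, half_word_regs: int, packed: bool) -> int:
    """Cycles to transfer a safestore frame, assuming one word per cycle."""
    if packed:
        # Two half-word registers share one memory word (rounded up).
        half_word_slots = (half_word_regs + 1) // 2
    else:
        # Each half-word register occupies a full memory word by itself.
        half_word_slots = half_word_regs
    return full_word_regs + half_word_slots


# Illustrative counts only: 8 full-word and 6 half-word registers.
unpacked = store_cycles(8, 6, packed=False)
packed = store_cycles(8, 6, packed=True)
extra_cycles = unpacked - packed
```

Under these assumptions the unpacked frame requires extra_cycles additional cycles on every store and on every recovery, which is the penalty referred to above.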
Additional drawbacks include: 1) The contents of the accumulator and supplementary accumulator registers in a coprocessor may transiently be different from those of the corresponding registers in the main execution unit, the latter being those conventionally sent to safestore. This requires that the safestore copy of these (and perhaps other) registers be updated with the latest version by means of single word stores. 2) Performing the safestore function during the execution of some instructions inherently costs one or two extra cycles, thus making the duration of these instructions correspondingly longer. 3) When the cache is commanded to recover the contents of the safestore information into cache memory in anticipation of a fault recovery/climb, no other cache commands can be executed by the climb during the time of this move.
While these characteristics are not design errors, their performance penalty is an obstacle to attaining the CPU speed necessary to maintain competitiveness in the market. To a significant extent, these characteristics and their corresponding limitations on performance have been addressed by the inventions described in U.S. Pat. No. 5,553,232, entitled AUTOMATED SAFESTORE STACK GENERATION AND RECOVERY IN A FAULT TOLERANT CENTRAL PROCESSOR, by John E. Wilhite et al and U.S. Pat. No. 5,557,737, also entitled AUTOMATED SAFESTORE STACK GENERATION AND RECOVERY IN A FAULT TOLERANT CENTRAL PROCESSOR, by John E. Wilhite et al, both assigned to the same assignee as the present invention and incorporated by reference herein.
However, in all the foregoing prior art systems, the safestore frame resides in a memory, and more particularly either in a dedicated or partially dedicated memory or in a private or shared cache memory. Therefore, the speed of operations which rely on the availability of a valid safestore frame, such as changing domains and recovering from errors, is inherently limited by the need to recover the safestore frame from a memory. The subject invention is directed to obviating this necessity under certain conditions and accordingly realizing a significant performance increase in a fault tolerant computer system.