In the design of mainframe central processing units, it is highly desirable to provide powerful and reliable error detection and handling features, and this requirement has mandated the provision of various circuits, firmware and software to sense and resolve the diverse types of errors which may occur in operation.
Among the possible error conditions encountered in a mainframe CPU are those in which the basic processing unit (BPU) of the CPU, while performing routine data manipulation such as calculating, simply reaches an incorrect result. It can be shown that employing built-in error detection in the circuitry of a BPU doubles both the number of chip types and the number of chips required, and also necessitates the incorporation of precharge circuit techniques. This not only significantly extends the design effort required to develop a BPU, but also increases the "real estate," or space, occupied by the BPU and its support circuitry and, consequently, that occupied by the CPU.
In the invention disclosed and claimed in U.S. Pat. No. 5,195,101 by Russell W. Guenthner et al. (which is assigned to the same assignee as the present invention), this problem was solved in a CPU incorporating a BPU which included an address and execution (AX) unit, a decimal numeric (DN) unit and a floating point (FP) unit, and also incorporating a cache unit situated logically intermediate the BPU and system memory. The solution was to duplicate each of the AX, DN and FP chips (i.e., to duplicate the BPU) and to perform all BPU data manipulation operations redundantly. The outputs from the duplicate BPUs were placed on respective master (MRB) and slave (SRB) result busses coupled to the cache unit, and the results were compared in the cache unit. If the results were not identical in each byte of the result, the individual chip in the cache unit detecting the no-compare condition issued an error signal, and appropriate steps to remedy or otherwise respond to the error signal could then be undertaken.
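The byte-by-byte comparison of the master and slave results can be illustrated with a minimal software sketch. It is offered only as an aid to understanding and is not the circuitry of the referenced patent; the function name compare_result_buses and the representation of each result bus as a Python bytes value are assumptions made for illustration.

```python
from typing import List


def compare_result_buses(mrb: bytes, srb: bytes) -> List[bool]:
    """Compare master (MRB) and slave (SRB) results one byte lane at a time.

    Returns one flag per byte; True marks a no-compare condition in that
    lane. (Hypothetical helper; the real comparison is done in hardware.)
    """
    if len(mrb) != len(srb):
        raise ValueError("master and slave results must have the same width")
    return [m != s for m, s in zip(mrb, srb)]


def error_signal(mrb: bytes, srb: bytes) -> bool:
    """Raise the error signal if any byte lane fails to compare."""
    return any(compare_result_buses(mrb, srb))
```

In this sketch each list entry plays the role of one cache chip checking its own byte of the result, so a single differing byte is enough to raise the error signal.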
This was a very effective technique, but it left the CPU in a condition from which it was somewhat difficult to restart during error recovery. The BPU would typically have requested a block of memory from the cache unit and, because of the manner in which the result was stored (even if an error was sensed), the requested block was corrupted. Restart, if possible, would therefore have to take place at a previous step in the halted program and/or require access to main memory to obtain an uncorrupted copy of the corrupted block (which, as an additional complication, may already have been properly altered, perhaps many times, before the fault took place). Nonetheless, certain important economies of logic circuitry drove the requirement to store the corrupted block in cache. Under these circumstances, those skilled in the art will appreciate that it would be very advantageous for a CPU to have available a copy of the requested data in the form immediately preceding the fault, such that an attempted restart can take place at the same step at which the fault occurred.
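The restart difficulty described above, and the benefit of retaining a pre-fault copy, can be illustrated with a small software model. This is a sketch under stated assumptions only: the class CacheBlock, its store and restore methods, and the backup field are hypothetical names, and the model does not represent the mechanism of the referenced patent or of any particular hardware.

```python
class CacheBlock:
    """Toy model of a cache block that is written even when a fault is sensed."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.backup = None  # hypothetical copy of the block before the last store

    def store(self, offset: int, master: bytes, slave: bytes) -> bool:
        """Store the master result and report whether the results disagreed."""
        if len(master) != len(slave):
            raise ValueError("master and slave results must have the same width")
        if offset < 0 or offset + len(master) > len(self.data):
            raise IndexError("store falls outside the block")
        self.backup = bytes(self.data)  # keep the pre-store form of the block
        # The result is stored regardless of the comparison outcome,
        # so a faulty result corrupts the block.
        self.data[offset:offset + len(master)] = master
        return master != slave  # True means an error was sensed

    def restore(self) -> None:
        """Return the block to the form it had immediately before the store."""
        if self.backup is None:
            raise RuntimeError("no pre-store copy is available")
        self.data[:] = self.backup


block = CacheBlock(b"\x00\x00\x00\x00")
fault = block.store(1, b"\xAA\xBB", b"\xAA\xBC")  # slave disagrees in one byte
if fault:
    block.restore()  # the faulting step could now be retried on clean data
```

Without the retained copy, the only clean version of the block in this model would have to come from elsewhere, which corresponds to the main-memory access or earlier restart point discussed above.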