One or more aspects of the present invention relate in general to data processing systems, and in particular, to checking correctness of computations of an arithmetic logic unit.
With advances in circuit miniaturization and voltage reduction to save power, the probability of hard or soft errors during the lifetime of a circuit is rapidly increasing. This is disadvantageous for mission-critical workloads and becomes an issue for end users who see their applications fail. Therefore, in arithmetic units for mission-critical workloads, some form of error detection is employed for operations such as addition, subtraction, multiplication, division, square root and convert operations. State-of-the-art solutions to this problem often include duplication of the operation (physically or timewise) with a comparison of both results, or some form of residue checking. Residue checking is performed within a checking flow by performing the same operations on the residue as those performed on the operands of the arithmetic unit. That is, a checking flow is performed in parallel to a data flow within the unit.
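By way of illustration only, the relationship between the data flow and the checking flow may be sketched in software as follows. The sketch assumes integer operands and a check modulus of 3; neither is specified by the text above. The names MODULUS, residue and residue_check are hypothetical and chosen for this example.

```python
MODULUS = 3  # assumed check modulus; the text above does not fix one


def residue(x: int) -> int:
    """Residue of x with respect to the assumed check modulus."""
    return x % MODULUS


def residue_check(op: str, a: int, b: int, result: int) -> bool:
    """Return True if `result` is consistent with `a op b` under residue arithmetic.

    The checking flow applies the same operation to the operand residues
    that the data flow applied to the operands themselves.
    """
    ra, rb = residue(a), residue(b)
    if op == "add":
        expected = (ra + rb) % MODULUS
    elif op == "sub":
        expected = (ra - rb) % MODULUS
    elif op == "mul":
        expected = (ra * rb) % MODULUS
    else:
        raise ValueError(f"unsupported operation: {op}")
    # Compare the residue of the data-flow result with the checking-flow residue.
    return residue(result) == expected


if __name__ == "__main__":
    good = 12345 * 678
    bad = good ^ (1 << 4)  # model a single flipped bit in the data-flow result
    print(residue_check("mul", 12345, 678, good))
    print(residue_check("mul", 12345, 678, bad))
```

With a modulus of 3, a single flipped bit changes the result by a power of two, which is never a multiple of 3, so the residue comparison in this sketch fails for any such fault. Errors that change the result by a multiple of the modulus would pass the check undetected.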
Power consumption of microprocessors, on the other hand, is an important concern. Arithmetic units consume a notable amount of power in microprocessors. Therefore, power-saving techniques are employed to reduce the amount of power consumed by the arithmetic units within the microprocessors. Several problems occur in the conventional residue checking apparatus when power-saving techniques are employed. For example, if a single check is performed, a conventional residue checking apparatus may be inoperable in a power-saving mode because its clocks have been temporarily disabled. The single check also needs to be disabled completely if the checking circuitry has timing problems. In addition, a single point of failure may not be detected. Finally, the conventional residue checking apparatus may not be usable for complex operations within a multi-cycle pass, such as divide, square root, and extended precision operations.
U.S. Pat. No. 8,566,383 B2, which is hereby incorporated herein by reference in its entirety, discloses a distributed residue checking apparatus for a floating point unit having a plurality of functional elements performing floating-point operations on a plurality of operands. The distributed residue checking apparatus includes a plurality of residue generators which generate residue values for the operands and the functional elements, and a plurality of residue checking units distributed throughout the floating point unit. Each residue checking unit receives a first residue value and a second residue value from respective residue generators and compares the first residue value to the second residue value to determine whether an error has occurred in a floating-point operation performed by a respective functional element.
U.S. Pat. No. 8,566,383 B2 further discloses a method of distributed residue checking of a floating point unit having a plurality of functional elements performing floating-point operations on a plurality of operands. The method includes generating residue values for the operands and the functional elements via a plurality of residue generators, distributing a plurality of residue checking units through the floating point unit, and receiving and comparing, via each residue checking unit, a first residue value and a second residue value from respective residue generators to determine whether an error has occurred in a floating-point operation performed by a respective functional element.