The present invention relates generally to computer systems, and more specifically, to matrix and compression-based error detection for a computer system.
A computer system may include multiple copies of the same logic. These copies may have outputs that are supposed to track one another, and the outputs may be compared in order to determine whether there is an error in any of the logic. A Multiple Input Signature Register (MISR) may be used to monitor the logic outputs to determine the presence of errors. A mainframe computer may include chips with MISRs that are used in various manufacturing tests for quality assurance of the hardware. Another use of MISRs is for error checking in processors where data is partitioned across one or more chips. U.S. Pat. No. 5,784,383 (Meaney), which is herein incorporated by reference in its entirety, describes a method that uses a MISR to detect errors across chip boundaries caused by a hardware failure in control logic, even when a processor's error checking code (ECC) does not indicate an error. A MISR on each bus is used to collect a dynamic signature representing all the critical buses on each chip that need to be compared. The MISR state combines present and previous states of these buses, so, under the method of U.S. Pat. No. 5,784,383, the MISR states will differ if one or more bus controls fail. Because an N-bit MISR shifts, comparing a single bit of the MISR each cycle guarantees detection of a problem within N cycles. The method of U.S. Pat. No. 5,784,383 for identifying errors includes accumulating bus signature information, which is a function of the current and previous values of the input bus structures, in order to determine whether the buses are in sync.
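The single-bit comparison property described above may be illustrated with a minimal software sketch. The sketch is not the circuit of U.S. Pat. No. 5,784,383; the `Misr` class, the `cycles_to_detect` helper, the 8-bit width, the feedback tap positions, and the random bus traffic are all assumptions chosen for illustration. Two identical MISR models receive the same bus values except for one corrupted value, and only the top bit of each register is compared per cycle.

```python
import random


class Misr:
    """Hypothetical N-bit MISR model (illustrative only).

    Each cycle the register shifts left by one, the XOR of the tap bits
    enters bit 0, and the bus value is XORed into the shifted state.
    """

    def __init__(self, width, taps):
        self.width = width
        self.mask = (1 << width) - 1
        self.taps = taps  # assumed feedback tap positions
        self.state = 0

    def step(self, bus_value):
        feedback = 0
        for tap in self.taps:
            feedback ^= (self.state >> tap) & 1
        shifted = ((self.state << 1) | feedback) & self.mask
        self.state = shifted ^ (bus_value & self.mask)
        return self.state

    def compare_bit(self):
        # Only the top bit is exposed for comparison each cycle.
        return (self.state >> (self.width - 1)) & 1


def cycles_to_detect(width, taps, fault_cycle, fault_mask, total_cycles, seed=0):
    """Return cycles between a one-cycle bus fault and the first
    mismatch of the compared bit, or None if no mismatch is seen."""
    rng = random.Random(seed)
    good = Misr(width, taps)
    suspect = Misr(width, taps)
    for cycle in range(total_cycles):
        value = rng.getrandbits(width)
        # Corrupt the suspect copy's bus value on exactly one cycle.
        corrupted = value ^ fault_mask if cycle == fault_cycle else value
        good.step(value)
        suspect.step(corrupted)
        if good.compare_bit() != suspect.compare_bit():
            return cycle - fault_cycle
    return None


if __name__ == "__main__":
    for fault_mask in (0x80, 0x10, 0x01):
        latency = cycles_to_detect(8, (7, 5, 4, 3), 10, fault_mask, 40)
        print(f"fault mask {fault_mask:#04x}: detected after {latency} cycle(s)")
```

In this model, once the bus values agree again the difference between the two registers simply shifts toward the compared bit, so a one-cycle fault is seen after at most N-1 further cycles regardless of the taps chosen. A fault that persists over several cycles is not covered by this simple argument.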
Functional testing for verification of functional design of processors is partly done by comparing the actual to expected values of architected registers after simulation of test instruction streams consisting of a few to many instructions. This type of functional verification may be incomplete for cases where an instruction in a test instruction stream updates a register incorrectly due to a functional design problem, but the problem is not detected because the incorrect register value is overwritten by a subsequent instruction in the test instruction stream before being used as a source operand. The comparison of actual to expected results at the completion of the simulation of the entire test instruction stream does not detect this functional error since the incorrect interim register value is not observed and does not affect the final register value. Improvement of the functional verification of processors by having all interim register values of test instruction streams checked, while only comparing actual to expected values at the completion of the simulation of the test instruction stream, is described in U.S. Pat. No. 6,311,311 (Swaney et al.), which is herein incorporated by reference in its entirety.
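The verification gap described above may be illustrated with a small sketch. The toy two-register machine, the `run_stream` helper, the instruction encoding, and the rotate-and-XOR signature are hypothetical and are not taken from U.S. Pat. No. 6,311,311; they only show how an incorrect interim value can be overwritten before the final comparison, and how folding every register write into a running signature makes the error visible with a single end-of-stream comparison.

```python
def run_stream(stream, buggy_index=None):
    """Simulate a toy register machine (illustrative only).

    Returns the final register values and a 32-bit signature that
    accumulates every value written to a register.
    """
    regs = {"r1": 0, "r2": 0}
    signature = 0
    for index, (dest, op, operand) in enumerate(stream):
        if op == "load":
            value = operand
        elif op == "add":
            value = regs[dest] + operand
        else:
            raise ValueError(f"unknown op: {op}")
        if index == buggy_index:
            value ^= 1  # injected functional design error (assumption)
        value &= 0xFFFFFFFF
        regs[dest] = value
        # Fold the write into the signature: rotate left by one bit
        # within 32 bits, then XOR in the written value.
        rotated = ((signature << 1) | (signature >> 31)) & 0xFFFFFFFF
        signature = rotated ^ value
    return regs, signature


# Hypothetical test instruction stream: the first write to r2 is
# overwritten before r2 is ever used as a source operand.
STREAM = [
    ("r1", "load", 5),
    ("r2", "load", 7),
    ("r2", "load", 9),
    ("r1", "add", 3),
]

if __name__ == "__main__":
    expected_regs, expected_sig = run_stream(STREAM)
    actual_regs, actual_sig = run_stream(STREAM, buggy_index=1)
    print("final registers match:", actual_regs == expected_regs)
    print("signatures match:", actual_sig == expected_sig)
```

Here the error injected into the second instruction leaves the final register values unchanged, so a comparison of final registers alone passes, while the accumulated signatures differ.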