The technical field is error correcting code for storage or communications systems.
Communication and storage systems are subject to errors that may affect operation of connected systems. A typical error may result when a particular memory location is exposed to one or more α particles. Such radiation may cause a data bit stored in the memory location to flip from a “1” to a “0.”
Error correcting codes (ECC) are used to enhance reliability and state integrity of communications and storage systems. Error correcting codes are known that will correct a single error, and will detect, but not correct, a double error. Other ECCs will detect and correct multiple errors. For ECC applications, memory array chips may be organized so that errors generated in a chip can be corrected by the ECC.
Correction of single bit errors and detection of double bit errors may be accomplished by use of check bits. A typical ECC implementation appends a number of check bits to each data word. The appended check bits are used by ECC logic circuits to detect errors within the data word. The simplest and most common form of error control is implemented through the use of parity bits. A single parity bit is appended to a data word and assigned to be a 0 or a 1, so as to make the number of 1's in the data word, including the parity bit, even in the case of even parity codes, or odd in the case of odd parity codes.
Prior to transmission of the data word in a computer system, the value of the parity bit is computed at the source point of the data word and is appended to the data word. On receipt of the transmitted data word, logic at the destination point recalculates the parity bit and compares it to the received, previously appended parity bit. If the recalculated and received parity bits are not equal, a bit error has been detected. Use of parity codes has the disadvantage, however, of not being able to correct bit errors and not being able to detect even numbers of bit errors. For example, if a data bit changes from a 0 to a 1 and another data bit changes from a 1 to a 0 (a double bit error), the parity of the data word will not change and the error will be undetected.
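The parity scheme described above can be sketched in a few lines. This is a minimal illustration assuming even parity and an 8-bit data word; the function names `even_parity_bit` and `parity_error_detected` and the sample word are hypothetical and are not taken from any particular system.

```python
def even_parity_bit(data_bits):
    """Return the bit that makes the count of 1s (data plus parity) even."""
    return sum(data_bits) % 2


def parity_error_detected(received_bits, received_parity):
    """Recompute parity at the destination and compare with the received bit."""
    return even_parity_bit(received_bits) != received_parity


# Source side: compute and append the parity bit.
word = [1, 0, 1, 1, 0, 0, 1, 0]
parity = even_parity_bit(word)

# One data bit flips in transit.
single = word.copy()
single[2] ^= 1

# Two data bits flip: one 0 -> 1 and one 1 -> 0.
double = word.copy()
double[1] ^= 1
double[3] ^= 1

print(parity_error_detected(word, parity))    # False: no error
print(parity_error_detected(single, parity))  # True: single error detected
print(parity_error_detected(double, parity))  # False: double error missed
```

The last line shows the limitation noted above: the two flips cancel in the parity sum, so the recalculated and received parity bits still agree.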
By appending additional parity bits to the data word, each corresponding to a subset of data bits within the data word, the parity bit concept may be extended to provide detection of multiple bit errors, or to determine the location of single or multiple bit errors. Once a data bit error has been detected, logic circuits may be used to correct the erroneous bit, providing single error correction.
A well known error correction code is the Hamming code, which may be a SEC-DED code, for example. The ECC appends a series of check bits to the data word as it is stored in memory. Upon a read operation, the retrieved check bits are compared to recalculated check bits to detect and to locate (i.e., correct) a single bit error. By adding more check bits and appropriately overlapping the subsets of data bits represented by the check bits, other error correcting codes may provide for multiple error correction and detection.
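The idea of check bits covering overlapping subsets of data bits can be illustrated with a small single-error-correcting sketch. It assumes a 4-bit data word and three check bits stored alongside it; the subsets in `CHECK_SUBSETS` and the function names are hypothetical choices made for illustration, and the sketch does not provide double error detection.

```python
# Each check bit is the parity of a subset of data-bit indices.
# The subsets overlap so that every data bit has a unique signature.
CHECK_SUBSETS = [(0, 1, 3), (0, 2, 3), (1, 2, 3)]


def check_bits(data):
    """Compute one parity bit per subset of the data word."""
    return [sum(data[i] for i in subset) % 2 for subset in CHECK_SUBSETS]


def read_and_correct(data, stored_checks):
    """Compare recalculated and stored check bits; locate and fix one error.

    Returns (data, description), where description names the faulty bit.
    """
    syndrome = tuple(a ^ b for a, b in zip(check_bits(data), stored_checks))
    if not any(syndrome):
        return data, "no error"
    for i in range(len(data)):
        # A flipped data bit disturbs exactly the check bits that cover it.
        signature = tuple(int(i in subset) for subset in CHECK_SUBSETS)
        if signature == syndrome:
            fixed = data.copy()
            fixed[i] ^= 1
            return fixed, f"data bit {i}"
    # A syndrome with a single 1 points at a stored check bit instead.
    return data, f"check bit {syndrome.index(1)}"


original = [1, 0, 1, 1]
stored = check_bits(original)      # written to memory with the data

corrupted = original.copy()
corrupted[2] ^= 1                  # a single bit flips while stored

print(read_and_correct(corrupted, stored))
```

Because the mismatch pattern between stored and recalculated check bits is different for every bit position, locating the error and correcting it amount to the same step.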
Verifying the correctness of the error correcting code includes two steps: verifying the underlying algorithm of the error correcting code and verifying the implementation of the error correcting code on a hardware device or on a simulation of the hardware device. Current methods for verifying the error correcting code do not link these two steps, and hence do not provide a complete verification. An example of this problem may be shown with respect to linear codes. Linear codes are constructed using properties based on Galois field arithmetic. The proof of the properties in concept may be made within the mathematical framework of Galois fields. Based on this concept, a generator matrix (known as a G matrix), a parity matrix (known as an H matrix), and different syndrome vectors corresponding to various error scenarios are generated, either by hand or by a computer program. A single-error correcting, double-error detecting (SEC-DED) code would have an H matrix in which no two columns are identical and in which the Galois field addition of any two columns is not equal to any column in the H matrix. The mathematical proof of the concept does not detect any error introduced during the generation of the G and H matrices and the syndrome vectors. The G and H matrices and the syndrome vectors are then used in a high-level language to generate the error correcting code circuitry, which may be implemented as a hardware device or a simulation of the hardware device. Verification of the implementation is completed by checking whether the implementation provides expected outputs based on the G and H matrices and the syndrome vectors.
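The column conditions stated above for a SEC-DED H matrix lend themselves to a mechanical check. The following is a minimal sketch, assuming each column of H is packed into an integer with one bit per row, so that Galois field GF(2) addition of two columns is a bitwise XOR. The function name, the example matrix, and the injected transcription error are all hypothetical.

```python
from itertools import combinations


def check_sec_ded_columns(columns):
    """Return a list of violations of the two column conditions.

    Condition 1: no two columns are identical.
    Condition 2: the GF(2) sum (XOR) of any two columns is not a column of H.
    """
    problems = []
    column_set = set(columns)
    for (i, a), (j, b) in combinations(enumerate(columns), 2):
        if a == b:
            problems.append(f"columns {i} and {j} are identical")
        elif (a ^ b) in column_set:
            problems.append(f"sum of columns {i} and {j} is a column of H")
    return problems


# Example 4-row, 8-column H matrix: every column has the top row set,
# so the XOR of any two columns has the top row clear.
good_h = [0b1000 | i for i in range(8)]

# Hypothetical transcription error made while writing out the matrix.
bad_h = good_h.copy()
bad_h[5] = 0b0011

print(check_sec_ded_columns(good_h))       # expected to be empty
print(len(check_sec_ded_columns(bad_h)))   # non-zero: the error is caught
```

A check of this kind addresses only the generated matrix; it does not by itself confirm that the circuitry built from the matrix behaves as intended.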
One problem with this conventional approach comes from errors that may occur during generation of the G and H matrices and the syndrome vectors. Such errors may go undetected because no automated tool exists to directly produce the error correcting code circuitry from the mathematical properties.
A method and an apparatus verify the correctness of the error correcting code algorithm and the correctness of the error correcting code implementation. An error injection module is used to inject random errors into an ECC circuit between an encoder and a decoder. The encoder encodes data bits with check bits to produce an encoded signal. The decoder decodes the encoded signal after modification by the error injection module. The error injection module may inject zero errors. Alternatively, the error injection module may inject a single error or multiple errors. The output of the decoder may include a zero error signal, a single error signal, a multiple error signal, and an error location signal. Other signals are also possible. The output of the decoder is compared to expected values for each signal using a monitoring module. Any differences between the output signals and the expected values may indicate an error in the ECC or in the circuit used to implement the ECC.
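The arrangement of encoder, error injection module, decoder, and monitoring module can be sketched as a small software simulation. This is an illustrative sketch only, assuming a 4-bit data word protected by an 8-bit extended Hamming code and at most two injected errors per trial. All names (`encode`, `decode`, `inject_errors`, `monitor`, `run_trials`), the status strings, and the fixed random seed are assumptions made for the example and do not describe any particular circuit.

```python
import random

DATA_POSITIONS = (3, 5, 6, 7)  # codeword indices that carry data bits


def encode(data):
    """Encoder: 4 data bits -> 8-bit codeword (index 0 is overall parity)."""
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Decoder: return (status, error_location, data)."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status, location = "none", None
    elif overall == 1:
        # Odd overall parity: one bit flipped, and the syndrome is its index.
        status, location = "single", syndrome
        code = code.copy()
        code[location] ^= 1
    else:
        # Even overall parity with a non-zero syndrome: uncorrectable.
        status, location = "multiple", None
    return status, location, [code[p] for p in DATA_POSITIONS]


def inject_errors(code, positions):
    """Error injection module: flip the bits at the given positions."""
    corrupted = code.copy()
    for pos in positions:
        corrupted[pos] ^= 1
    return corrupted


def monitor(data, positions, result):
    """Monitoring module: compare decoder output with expected values."""
    status, location, decoded = result
    if len(positions) == 0:
        return status == "none" and location is None and decoded == data
    if len(positions) == 1:
        return (status == "single" and location == positions[0]
                and decoded == data)
    return status == "multiple"


def run_trials(trials=1000, seed=0):
    """Inject zero, one, or two random errors per trial; collect mismatches."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        data = [rng.randint(0, 1) for _ in range(4)]
        positions = rng.sample(range(8), rng.choice([0, 1, 2]))
        result = decode(inject_errors(encode(data), positions))
        if not monitor(data, positions, result):
            mismatches.append((data, positions, result))
    return mismatches


print(run_trials())  # an empty list means no mismatch was observed
```

Any entry in the returned list would record the data word, the injected error positions, and the decoder output that disagreed with the expected values, which is the information needed to trace the fault to the code or to its implementation. Three or more errors are outside what a SEC-DED code guarantees and are deliberately not injected here.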
The ECC may be verified by implementing the verification apparatus in an actual hardware device. In this embodiment, the error injection module and the monitoring module may be located on the same chip as the decoder and the encoder. Alternatively, the error injection module and the monitoring module may be located on chips separate from the decoder and the encoder. The ECC verification apparatus may also be implemented as a simulation of the actual hardware device or in a formal verification model of the actual hardware.