1. Field of the Invention
This invention relates to computer system reliability and, more particularly, to the detection and correction of errors in packets transmitted within a computer system.
2. Description of the Related Art
Error codes are commonly used in electronic systems to detect and correct errors such as transmission errors or storage errors. For example, error codes may be used to detect and correct errors in information transmitted via a communication link, such as a bus, within a computer system. Error codes may additionally be used to detect and correct errors associated with information stored in the memory or mass storage devices of computer systems. In such systems, error correction bits, or check bits, may be generated for data prior to its transfer or storage. The check bits may then be transmitted or stored with the data. When the data is received or retrieved, the check bits may be used to detect and/or correct errors within the data. The use of error codes within a computer system may increase the reliability of that system by detecting errors as soon as they occur. Similarly, the use of error codes may improve system availability by allowing the system to continue to function despite one or more failures.
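As an illustration of the generate, transmit, and check sequence described above, the following minimal sketch uses a single even-parity check bit. It is an assumed example only: the function names (check_bit, send, receive) are hypothetical, and a single parity bit can detect an odd number of bit errors but cannot correct them.

```python
def check_bit(data_bits):
    """Generate one even-parity check bit for a sequence of 0/1 data bits."""
    return sum(data_bits) % 2


def send(data_bits):
    """Append the check bit so it is transmitted (or stored) with the data."""
    return list(data_bits) + [check_bit(data_bits)]


def receive(word):
    """Split a received word into data and check bit, then regenerate the
    check bit from the data and compare it with the one that was received.

    Returns (data_bits, ok), where ok is False if an error was detected.
    """
    *data, received_check = word
    return data, check_bit(data) == received_check


word = send([1, 0, 1, 1])
data, ok = receive(word)      # no error: ok is True

word[2] ^= 1                  # simulate a single-bit transmission error
data, ok = receive(word)      # the mismatch is detected: ok is False
```

A code of this kind only signals that an error occurred. Locating and correcting the erroneous bit requires additional check bits.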
Errors in transmitted or stored information may be caused by transient conditions such as cross talk or noise encountered within a system. Component failures are another common source of error in electrical systems. Faulty components may include faulty memory chips or faulty data paths provided between devices of a system. For example, faulty data paths may result from faulty pins, faulty data traces, or faulty wires.
Hamming codes are one example of commonly used error codes. The check bits in a Hamming code may each provide the parity for a unique subset of the bits to be protected. If an error occurs (i.e., one or more of the bits unintentionally change state), one or more of the check bits will also change state upon regeneration (assuming the error is within the class of errors covered by the code). By determining which of the regenerated check bits changed state, the location of the error may be determined. For example, if one bit changes state, this bit will cause one or more of the regenerated check bits to change state. Based on which of the check bits change state, the erroneous bit may be identified and the error may be corrected by inverting the erroneous bit.
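The location of a single-bit error from the regenerated check bits can be sketched with a Hamming (7,4) code. This is an assumed illustration, not a description of any particular system: the bit layout (check bits at positions 1, 2, and 4 of a 7-bit word) is the conventional textbook arrangement, and the names encode and correct are hypothetical.

```python
CHECK_POSITIONS = (1, 2, 4)  # assumed layout: check bits at power-of-two positions


def _parity(code, p):
    """Parity of the subset of bits covered by check bit p (excluding p itself).

    Check bit p covers every position whose index has bit p set.
    """
    return sum(code[i] for i in range(1, 8) if i & p and i != p) % 2


def encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    code = [0] * 8                      # index 0 is unused
    code[3], code[5], code[6], code[7] = data
    for p in CHECK_POSITIONS:
        code[p] = _parity(code, p)      # each check bit covers a unique subset
    return code[1:]


def correct(word):
    """Regenerate the check bits and correct a single-bit error, if any.

    Returns (corrected_word, position), where position is the 1-based
    location of the bit that was inverted, or 0 if no error was found.
    """
    code = [0] + list(word)
    position = 0
    for p in CHECK_POSITIONS:
        if _parity(code, p) != code[p]:  # this regenerated check bit changed state
            position += p
    if position:
        code[position] ^= 1              # invert the erroneous bit
    return code[1:], position


sent = encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1                         # corrupt position 5 (list index 4)
fixed, position = correct(received)      # fixed == sent, position == 5
```

Because each position is covered by a distinct combination of check bits, the set of check bits that change state identifies the erroneous position directly. This sketch corrects only single-bit errors; two or more errors in one word fall outside the class of errors this code covers and would be miscorrected.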