The present invention relates generally to integrated circuits (ICs), such as Application Specific Integrated Circuits (ASICs), and more specifically to a High Availability (HA) enhancement for ASICs/ICs whose control and/or configuration registers are susceptible to random errors, achieved through the introduction of Masking Registers and background parity control logic.
Application Specific Integrated Circuits (ASICs) and Integrated Circuits (ICs) are susceptible to random errors that cause flip-flops to change state incorrectly. Random errors are temporary errors; they are eradicated when the IC is reset or when the affected register is written. They can occur, for instance, when an IC is struck by alpha particles or other sub-atomic particles, which can produce an ion-induced change in the logical state of a transistor. Random errors can be harmful to a system, particularly if they occur in sensitive areas of the ASIC/IC where they could cause total system failure. Two such sensitive areas are control flip-flops and configuration flip-flops. These flip-flops become even more susceptible to random errors as IC technology advances and transistor geometries decrease.
State machines are a major concern because random errors in their state flip-flops can cause lock-ups or incorrect behavior from which recovery can be difficult. Single-bit error detection or correction can be achieved through proper state encoding. The recommended HA improvement is to add a parity bit to the state encoding, creating a Hamming distance 2 code, and then to generate an error on bad parity or on any unused state. On highly critical state machines, a Hamming distance 3 encoding can be used to correct single-bit errors.
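The parity-based state encoding described above can be illustrated with a minimal software sketch. It is not taken from the invention itself: the three state names, their 2-bit codes, and the helper names `parity`, `encode`, `decode`, and `min_distance` are all hypothetical, and an actual implementation would be in hardware logic rather than Python.

```python
# Hypothetical 2-bit state codes; code 0b11 is deliberately left unused.
STATES = {"IDLE": 0b00, "LOAD": 0b01, "RUN": 0b10}

def parity(value):
    """Return 1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1

def encode(code):
    """Append an even-parity bit as the least significant bit."""
    return (code << 1) | parity(code)

# Map each valid 3-bit codeword back to its state name.
VALID = {encode(code): name for name, code in STATES.items()}

def decode(word):
    """Return the state name, or None on bad parity or an unused state."""
    if parity(word) != 0:   # odd number of set bits: a bit has flipped
        return None
    return VALID.get(word)  # None for an even-parity but unused codeword

def min_distance(words):
    """Smallest Hamming distance between any two distinct codewords."""
    return min(bin(a ^ b).count("1")
               for i, a in enumerate(words) for b in words[i + 1:])
```

In this sketch every pair of valid codewords differs in at least two bit positions, so any single-bit flip produces a word with bad parity and is reported as an error; the unused codeword passes the parity check but is still rejected by the table lookup. Detection only is shown here; correction would require a distance 3 code.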
Another area of concern is configuration and control flip-flops (bits), which tend to be written at power-up and then not addressed for long periods during normal operation. A failure of a single bit in these flip-flops can cause subtle problems or bring a system down entirely, resulting in lower system availability. Software is very unlikely ever to read the control registers once they are written, because software usually reads only locations that are updated by the ASIC circuitry.
Various publicly available techniques and algorithms can be used to detect or correct errors. However, there is no established technique for implementing these algorithms. Some methods rely heavily on hardware in the form of parity checkers. Others tie up software resources by actively verifying register contents, which can become a problem if a heavily utilized system does not allow time for such checks.
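To make the software cost concrete, the following is a rough sketch of the kind of software-driven verification mentioned above, under the assumption that software keeps a shadow copy of every value it wrote. The register file is simulated as a dictionary, and the name `scrub` and the addresses used are hypothetical.

```python
def scrub(registers, shadow):
    """Compare each register with its shadow copy; rewrite and report mismatches."""
    corrupted = []
    for addr, expected in shadow.items():
        # One register read per configuration register on every pass.
        if registers[addr] != expected:
            corrupted.append(addr)
            # Rewriting the register clears the random error.
            registers[addr] = expected
    return corrupted

# Simulated register file and the shadow copy taken at power-up.
registers = {0x00: 0x1F, 0x04: 0xA5}
shadow = dict(registers)

registers[0x04] ^= 0x08            # simulate a single-bit random error
damaged = scrub(registers, shadow)  # addresses found corrupted on this pass
```

Each pass reads every configuration register, so the work grows with the number of registers and must be repeated for as long as the system runs, which is the burden on a heavily utilized system noted above.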