Redundancy techniques such as duplication and Triple Modular Redundancy (TMR) are commonly used in the design of dependable systems to ensure high reliability, availability and data integrity. TMR is a redundancy scheme used for fault masking. It is described in Von Neumann, J., “Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components,” Automata Studies, Ann. of Math. Studies, no. 34, C. E. Shannon and J. McCarthy, Eds., Princeton University Press, pp. 43-98, 1956. A TMR system uses three implementations (identical or different) of the same logic function, and the outputs of all three implementations are connected to a voter as shown in FIG. 1. Numerous dependable systems use the TMR technique, as described, for example, by Siewiorek, D. P. and R. S. Swarz, Reliable Computer Systems: Design and Evaluation, Digital Press, 1992.
For voting on the outputs of the individual modules, majority voting circuits are generally used in TMR systems. FIG. 2 shows a design of a majority voting circuit. In FIG. 2, Z11, Z12 and Z13 are the outputs corresponding to the bit position Z1 of the three modules of the TMR system as shown in FIG. 1. The corresponding voted output bit of the system is Z1.
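The behavior of a single-bit majority voter can be sketched in software. The following is a minimal illustration only, assuming the usual sum-of-products form of the majority function; it is not a description of the specific gate-level design of FIG. 2, and the function name `majority_vote_bit` is hypothetical.

```python
def majority_vote_bit(z11: int, z12: int, z13: int) -> int:
    """Return the majority value of three single-bit inputs (each 0 or 1).

    Assumed form: Z1 = Z11*Z12 + Z12*Z13 + Z11*Z13, i.e. the output is 1
    whenever at least two of the three module outputs are 1.
    """
    return (z11 & z12) | (z12 & z13) | (z11 & z13)


# One module disagrees with the other two; its value is masked.
print(majority_vote_bit(1, 1, 0))  # 1
print(majority_vote_bit(0, 1, 0))  # 0
```

With a fault in only one module, the two agreeing modules determine the voted bit, which is the fault-masking property of TMR.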
In TMR systems, majority voting is normally performed on a bit-by-bit basis. For a system with n outputs, conventional TMR systems use n single-bit voters. FIG. 3 shows the implementation of such a TMR system with two outputs Z1 and Z2.
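Bit-by-bit voting over n outputs can be modeled as n independent single-bit voters, one per output position. The sketch below assumes each module's outputs are given as a list of bits of equal length; the name `vote_outputs` and the list representation are illustrative choices, not taken from the figures.

```python
from typing import List


def vote_outputs(m1: List[int], m2: List[int], m3: List[int]) -> List[int]:
    """Apply an independent single-bit majority voter to each output position.

    m1, m2 and m3 hold the n output bits of the three modules.
    """
    if not (len(m1) == len(m2) == len(m3)):
        raise ValueError("all three modules must have the same number of outputs")
    return [(a & b) | (b & c) | (a & c) for a, b, c in zip(m1, m2, m3)]


# Two outputs (Z1, Z2), as in a system like that of FIG. 3.
module_1 = [1, 0]
module_2 = [1, 1]
module_3 = [0, 0]
print(vote_outputs(module_1, module_2, module_3))  # [1, 0]
```

Because each position is voted separately, the voted word can differ from the complete output word of two of the three modules, as in the example above.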
The prior art teaches reliability modeling of TMR systems as, for example, in Trivedi, K. S., Probability and Statistics with Reliability, Queuing, and Computer Science Applications, Prentice Hall, Englewood Cliffs, N.J., USA, 1982. For the classical TMR system shown in FIG. 1, the reliability R is given by the following expression:

$$R = R_m^3 + 3R_m^2(1 - R_m)$$
In the above expression, Rm is the reliability of each individual module in the TMR system. The above expression follows from the fact that for the TMR system to produce correct outputs, at least two of the three modules must produce correct outputs.
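The classical expression can be checked numerically. The sketch below assumes, as the classical model does, that the three modules fail independently with the same reliability and that the voter itself is fault-free; the function names are hypothetical. The second function enumerates all eight combinations of working and failed modules and sums the probability of those in which at least two modules are correct.

```python
from itertools import product


def tmr_reliability(rm: float) -> float:
    """Closed-form classical TMR reliability: Rm^3 + 3*Rm^2*(1 - Rm)."""
    return rm**3 + 3 * rm**2 * (1 - rm)


def tmr_reliability_by_enumeration(rm: float) -> float:
    """Sum the probabilities of all module states with at least two correct modules.

    Assumes independent module failures and a perfect voter.
    """
    total = 0.0
    for states in product((True, False), repeat=3):  # True = module correct
        if sum(states) >= 2:
            probability = 1.0
            for ok in states:
                probability *= rm if ok else (1 - rm)
            total += probability
    return total


print(round(tmr_reliability(0.9), 3))                 # 0.972
print(round(tmr_reliability_by_enumeration(0.9), 3))  # 0.972
```

The first term corresponds to all three modules being correct, and the second to the three ways in which exactly two modules are correct and one has failed.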
The classical reliability expression for TMR systems is optimistic because it does not consider common-mode failures (CMFs). Lala observed that one must pay attention to the problem of CMFs in Lala, J. H. and R. E. Harper, “Architectural Principles for Safety-critical Real-time Applications,” Proc. of the IEEE, Vol. 82, No. 1, pp. 25-40, January 1994. CMFs result from failures that affect more than one module of the redundant system at the same time, generally due to a common cause. They can be design faults or operational faults, and operational faults can have external causes (such as EMI and radiation) or internal causes. For example, a radiation source causing multiple-event upsets may lead to the failure of more than one module in a TMR system as taught, for example, by Reed in Reed, R., et al., “Heavy Ion and Proton-Induced Single Event Multiple Upset,” IEEE Trans. on Nuclear Science, Vol. 44, No. 6, pp. 2224-2229, July 1997. Conventional TMR voters have no built-in facility to detect a failure of more than one module and initiate appropriate action.
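The effect of a common-mode failure on a conventional bit-by-bit voter can be illustrated with a small sketch. It assumes module outputs are represented as integers whose bits are the output bits, and that a hypothetical common cause flips the same bit in two modules; the name `vote_word` and the bit patterns are illustrative only.

```python
def vote_word(a: int, b: int, c: int) -> int:
    """Bit-by-bit majority vote over three output words held as integers."""
    return (a & b) | (b & c) | (a & c)


correct = 0b1010

# Hypothetical common-mode failure: the same bit is flipped in two modules.
faulty = correct ^ 0b0001

voted = vote_word(faulty, faulty, correct)
print(format(voted, "04b"))  # 1011
print(voted == correct)      # False
```

The two faulty modules outvote the correct one, and the voter returns only the voted word with no indication that the modules disagreed.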
Accordingly, there is a need for new voter designs for TMR systems that are useful in the context of common-mode and multiple failures that affect multiple modules in a TMR system. More generally, there is a need for new voter designs for modular redundant systems.