In diverse technical areas that rely on reliable communication of signals, such as telephony and data transmission and switching, data processing, and process control, it is common to duplicate--or even more extensively replicate--system components (e.g., control units, circuit packs) in order to achieve fault tolerance, and hence reliability.
The replicated components typically operate in active mode (all components are simultaneously operating in the same state and using the same inputs), in "hot" standby mode (all components are powered up, but are not necessarily in the same state nor using the same inputs), or in "cold" standby mode (the non-active components need not be powered up).
When using standby components, some form of testing of the active component, or error detection in the data stream(s) processed by the active component, is typically used to determine when a switch of system output (a "protection switch") should be made from the active component to a standby component. Irrespective of whether the standby component is hot or cold, however, the switching action conventionally results in a time period during which data is corrupted.
Alternatively, having the replicated components operate in synchronized active mode can prevent data corruption if three or more components are used (e.g., by "voting" to determine the system output). However, such redundancy has drawbacks of its own: the cost of the extra component(s), an increased probability of internal failure (because there is more equipment to fail) with the associated increase in maintenance cost, and the extra space and wiring required to accommodate the extra component(s). Therefore, it would be advantageous to have an arrangement which would use only two replicated active components, but which would retain the ability to prevent data corruption.
Additionally, arrangements such as voting, which operate on the possible output signals themselves in order to determine which one should become the system output, introduce the possibility that the arrangements themselves will corrupt the output data which they are intended to safeguard.
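For illustration only, the kind of voting referred to above can be sketched as a bitwise majority vote over the outputs of three synchronized components. The function name `majority_vote` and the use of integer words to stand for component outputs are assumptions made for this sketch; they are not taken from any particular prior-art arrangement.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise majority of three equally wide output words.

    Each result bit is 1 exactly when at least two of the three
    corresponding input bits are 1, so a fault confined to a single
    component is outvoted by the other two.
    """
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    good = 0b1010          # hypothetical word produced by two healthy components
    faulty = 0b0110        # hypothetical word produced by one failed component
    voted = majority_vote(good, good, faulty)
    print(format(voted, "04b"))
```

Note that the voter sits directly in the output data path: every output bit passes through it, so a fault in the voter itself can corrupt the very data it is meant to protect, and at least three components are needed for a majority to exist.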
Digitized voice is relatively tolerant of data corruption. For low-speed data, if the time during which data is corrupted as a result of protection switching could be made less than a bit time, either error correction schemes or error detection combined with minimal retransmission could be used effectively to prevent corruption. For high-speed data, however, protection switching causes burst errors which make correction schemes impractical and detection schemes less reliable. Further, these burst errors may last long enough to corrupt the data of more than one user. If a burst error is not detected, myriad problems arise. Even when a burst error is detected, retransmission is needed, and it typically must be invoked either manually or by higher layers of data protocol. Thus, with a grade of service that allows error bursts caused by protection switching, the equipment would normally have to be upgraded to operate with the protocol options that automate retransmission. This may be very costly for high-speed data systems. Also, retransmission following a protection switch may cause temporary overload conditions. For these reasons, the prevention of data corruption, rather than the mere curing of corrupted data, is more desirable for high-speed data switching communication services.
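The dependence on data rate described above can be made concrete with a short calculation: the number of bit times spanned by a switching disturbance is the disturbance duration multiplied by the bit rate. The sketch below is illustrative only; the function name `corrupted_bit_times`, the 10-microsecond disturbance, and the two bit rates are hypothetical values chosen for the example, not figures from any actual system.

```python
def corrupted_bit_times(outage_us: int, bit_rate_bps: int) -> float:
    """Number of bit times spanned by a disturbance of outage_us microseconds.

    A result below 1 means the disturbance is shorter than one bit time;
    a result well above 1 means a burst covering many consecutive bits.
    """
    return outage_us * bit_rate_bps / 1_000_000


if __name__ == "__main__":
    OUTAGE_US = 10                   # hypothetical switching disturbance
    LOW_SPEED_BPS = 9_600            # hypothetical low-speed data rate
    HIGH_SPEED_BPS = 155_520_000     # hypothetical high-speed data rate

    print(corrupted_bit_times(OUTAGE_US, LOW_SPEED_BPS))
    print(corrupted_bit_times(OUTAGE_US, HIGH_SPEED_BPS))
```

Under these assumed values, the same disturbance covers under a tenth of a bit time at the low rate but more than fifteen hundred bit times at the high rate, which is the burst-error situation described above.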