1. Field of the Invention
The present invention relates to signal processing, and, in particular, to error protection for the routing of signals through telecommunications systems.
2. Description of the Related Art
A typical telecommunications system has one or more switches that route signals for transmission between pairs of end users of the system. Each switch is able simultaneously to receive incoming signals originating at a plurality of different end users and route those received incoming signals for transmission as outgoing signals to a plurality of different destinations (i.e., various end users). In general, a switch is able to route signals from any given end user to any one or more other given end user(s).
In order to maintain a high quality of communications service, it is very important for the switches of a telecommunications system to operate efficiently and reliably. In the past, telecommunications systems carried only telephony (i.e., voice) signals between telephone end users. At that time, it was acceptable for a switch to fail to operate properly for short periods of time (e.g., up to 60 msec) without adversely affecting the quality of the service provided to the end users, since the human ear can tolerate gaps in telephony service of that duration. As long as the failure was detected and the switchover to redundant protection hardware was completed within 60 msec of the failure, telephony system requirements for fault tolerance would be met. Such a recovery from a switch failure is referred to as hit-less protection switching.
Today, however, telecommunications systems are being used to transmit data signals as well as telephony signals. In such applications, any failure of a switch to operate properly, even an intermittent failure lasting only a very short duration, may result in a loss of data that would be unacceptable to one or more of the end users of the telecommunications system. As such, it is desirable and often mandatory for telecommunications systems to provide fault tolerance with robustness against errors in which no data is lost as a result of (at least) any one single-point failure in a switch.
One way to provide such "error-less fault tolerance" is to buffer enough data on the output side of the switch to provide enough time for fault tolerance processing to detect the occurrence of a failure and switch processing to redundant protection hardware as needed to resume accurate switching operations without losing any data. Unfortunately, as data transmission rates and switch throughput increase, the buffer size and transmission delay required to ensure error-less fault tolerance become cost prohibitive, and the increased buffering adversely increases the latency of the switch. Moreover, typical prior-art fault tolerance schemes will not detect or prevent random errors, such as spurious random bit or symbol errors, from corrupting the routed data.
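As a rough illustration of why output-side buffering scales poorly, the buffer must hold at least the data that arrives during the failure-detection and switchover window. The sketch below assumes that this window is the 60 msec figure cited above for telephony. The function name and the line rates are illustrative assumptions and do not come from any particular system.

```python
def min_buffer_bytes(line_rate_bps: float, recovery_window_s: float) -> float:
    """Bytes arriving on one port during the recovery window (rate x time / 8)."""
    return line_rate_bps * recovery_window_s / 8


# Illustrative (assumed) line rates; 0.060 s is the window cited in the text.
for rate_bps in (155.52e6, 2.48832e9, 9.95328e9):
    megabytes = min_buffer_bytes(rate_bps, 0.060) / 1e6
    print(f"{rate_bps / 1e9:.3f} Gb/s -> {megabytes:.1f} MB per port")
```

Under these assumptions the required buffer grows linearly with line rate, and the same window is added to the latency of every signal passing through the switch.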
The present invention is directed to a technique for providing fault tolerance in transmission equipment, such as a telecommunications switch. The present invention protects the integrity of signal routing operations from one failure (and, depending on the implementation, even more than one failure) within the equipment. Moreover, the present invention enables signal routing operations to proceed without any loss of data, and without requiring substantial buffering of data. Such error-less fault tolerance enables signal routing operations to proceed seamlessly in the event of either intermittent or permanent failures as well as during routine equipment maintenance.
The present invention can be applied to a data transmission system that meets the following two conditions. First, the system is distributed, meaning that transmission processing is spread over several elements such that different elements route different subsets of data, where one or more of the elements may have impaired functionality that needs to be protected. Examples of such elements include integrated circuits, electrical links, circuit packs, optical links, and electrical-to-optical conversion devices. Second, the system has excess capacity, meaning that certain elements are present in the system, but are not used when no impairment has occurred.
For such systems, errorless control coding for equipment protection can be implemented as follows. First, an encoder is added to each signal stream at the input to the equipment and a decoder is added to each signal stream at the output of the equipment. (Note that, depending on the implementation, there may be more than one encoder/decoder for each signal stream or more than one signal stream for each encoder/decoder.) Second, the path of the encoded stream through the equipment is analyzed and the consequence of the failure(s) of any element(s) upon the stream at the decoder input is tabulated. Third, suitable encoding/decoding algorithms are designed to complement the design of the distributed processing in the equipment such that the effects of elemental failures can be errorlessly corrected. Note that iteration between these last two items may be needed for their successful implementation. The choice of coding algorithm and the system design may affect other performance measures such as transmission delay and circuit complexity. In general, the present invention can be used to provide error correction for a single element failure and, depending on the particular coding scheme, possibly multiple (dependent or independent) element failures.
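The second step above, tabulating the consequence of each element failure on the stream at the decoder input, can be sketched as follows. This is a minimal illustration, assuming that a failed element corrupts every codeword symbol it carries and that the code can correct up to t symbol errors per codeword. The function names, element labels, and layouts are hypothetical.

```python
from collections import Counter


def tabulate_single_failures(symbol_to_element):
    """Count, per element, how many codeword symbols its failure would corrupt.

    symbol_to_element[i] names the (hypothetical) element carrying symbol i.
    """
    return dict(Counter(symbol_to_element))


def single_failures_correctable(symbol_to_element, t):
    """True if no single element carries more than t symbols of one codeword."""
    return max(tabulate_single_failures(symbol_to_element).values()) <= t


# Assumed layout A: a 6-symbol codeword spread over 6 elements, one symbol each.
layout_a = ["E0", "E1", "E2", "E3", "E4", "E5"]
# Assumed layout B: the same codeword over only 3 elements, two symbols each.
layout_b = ["E0", "E0", "E1", "E1", "E2", "E2"]

for layout in (layout_a, layout_b):
    print(tabulate_single_failures(layout), single_failures_correctable(layout, t=1))
```

If a layout fails the check, either the distribution of symbols over elements or the correction capability of the code has to change, which is the iteration between the second and third steps noted above.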
According to embodiments of the present invention directed to telecommunications switches, an error control coding (ECC) scheme, such as an appropriate symbolic coding scheme, is applied to the data at the input side of a switch to generate encoded data, which is then transmitted through the distributed switch fabric of the switch. The encoded data carries additional symbols requiring additional switch capacity. The encoded data is then analyzed at the output side of the switch fabric to determine whether any errors occurred during transmission through the switch fabric. In preferred embodiments, error correction is also applied to recover from a single failure (and, depending on the particular coding scheme, possibly multiple dependent or independent failures) in the switch fabric. As such, the present invention is able to provide error-less fault tolerance, where no data is lost, even in the case of multiple failures within the switch fabric.
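The additional switch capacity consumed by the encoded data can be estimated from the code parameters. The sketch below assumes a block code that maps k data symbols to n encoded symbols of the same size; the function name and the example parameters are illustrative only.

```python
def capacity_overhead(k: int, n: int) -> float:
    """Fractional extra capacity needed to carry n symbols per k data symbols."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    return (n - k) / k


# Assumed example parameters: (k, n) pairs with two check symbols each.
for k, n in ((4, 6), (8, 10), (16, 18)):
    print(f"k={k}, n={n}: {capacity_overhead(k, n):.1%} extra capacity")
```

For a fixed number of check symbols, a longer dataword lowers the relative overhead, at the cost of spreading each codeword over more symbols.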
In one embodiment, the present invention is a method for transmitting data streams, comprising the steps of (a) encoding a k-symbol original dataword in an incoming data stream to generate an n-symbol codeword, wherein n is greater than k; (b) slicing each n-symbol codeword into a plurality of codeword slices; (c) routing the codeword slices through distributed transmission equipment to generate a plurality of routed codeword slices; (d) combining the plurality of routed codeword slices to generate an n-symbol routed codeword; and (e) decoding the n-symbol routed codeword to generate a k-symbol routed dataword of an outgoing data stream corresponding to the k-symbol original dataword in the incoming data stream.
In another embodiment, the present invention is an apparatus for transmitting data streams, comprising (a) one or more encoders configured to encode a k-symbol original dataword in an incoming data stream to generate an n-symbol codeword, wherein n is greater than k; (b) one or more slicers configured to slice each n-symbol codeword into a plurality of codeword slices; (c) distributed transmission equipment configured to route the codeword slices to generate a plurality of routed codeword slices; (d) one or more combiners configured to combine the plurality of routed codeword slices to generate an n-symbol routed codeword; and (e) one or more decoders configured to decode the n-symbol routed codeword to generate a k-symbol routed dataword of an outgoing data stream corresponding to the k-symbol original dataword in the incoming data stream.
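The encode, slice, route, combine, and decode steps can be illustrated with a toy example. This is a sketch under stated assumptions, not the coding scheme of the invention: it uses a simple two-checksum code over the integers modulo 257 (n = k + 2) that can correct one wrong symbol, places one symbol in each slice, and models a failed element as one that alters the symbol it carries. All function names are hypothetical.

```python
P = 257  # prime modulus; symbols are integers 0..256 (assumption of this sketch)


def encode(data):
    """k data symbols -> n = k + 2 symbols (two weighted checksums appended)."""
    s0 = sum(data) % P
    s1 = sum((i + 1) * d for i, d in enumerate(data)) % P
    return list(data) + [s0, s1]


def slice_codeword(codeword):
    """One symbol per slice, so each slice can travel through its own element."""
    return [[sym] for sym in codeword]


def route(slices, failed_element=None):
    """Pass slice i through element i; a failed element emits a wrong symbol."""
    return [[(s[0] + 1) % P] if i == failed_element else list(s)
            for i, s in enumerate(slices)]


def combine(slices):
    """Reassemble the routed slices into an n-symbol routed codeword."""
    return [sym for s in slices for sym in s]


def decode(received):
    """Recover the k data symbols, correcting at most one wrong symbol."""
    k = len(received) - 2
    data = list(received[:k])
    s0 = (sum(data) - received[k]) % P
    s1 = (sum((i + 1) * d for i, d in enumerate(data)) - received[k + 1]) % P
    if s0 and s1:
        j = s1 * pow(s0, -1, P) % P - 1  # position of the bad data symbol
        if not 0 <= j < k:
            raise ValueError("more than one symbol in error")
        data[j] = (data[j] - s0) % P
    # If only one syndrome is nonzero, a check symbol was hit; data is intact.
    return data


dataword = [10, 200, 33, 7]
for failed in (None, 0, 3, 5):
    routed = combine(route(slice_codeword(encode(dataword)), failed))
    assert decode(routed) == dataword
```

Because each element carries only one symbol of any codeword, the failure of any single element corrupts at most one symbol, which this toy code corrects without buffering beyond one codeword. The modular inverse via `pow(s0, -1, P)` requires Python 3.8 or later.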