This application relates to signal transmission and data communication techniques and systems, and more particularly, to fault-tolerant communication channel structures for information systems such as digital electronic systems and techniques for implementing the same.
Information systems generally include multiple information devices that are connected through various communication channels so that information can be transmitted from one device to another. Each device may be a receiver which only receives information from one or more other linked devices, a transmitter which only sends information to one or more other linked devices, or a transceiver which can operate as both a receiver and a transmitter. In communication terminology, such an information system is essentially a communication network of communication nodes that are interconnected by hard-wired or wireless communication channels or links, where each node is an information device.
For example, such an information system or communication network may be a general-purpose digital computer system which may include one or more computer processors, certain memory units, and various other devices. The communication channels in such a system often include electronic buses, each of which has a collection of conducting wires for transmitting information in the form of electronic signals. Other forms of communication channels may also be used, such as a wireless radio-frequency link or an optical communication channel which transmits information through one or more optical carriers over an optical fiber link or a free-space optical link. Another example of an information system is a task-specific computer system such as a flight control system for spacecraft or aircraft, which may integrate two or more computer systems, one or more navigation systems, and other devices together to perform complex computations.
One desirable feature of these systems is reliability against one or more faults or failures of nodes and communication channels in the network. One way to achieve such reliability is to make the system “fault-tolerant” so that, in the presence of faults, the system can continue to operate and meet the system specification without failure of the entire system. Such a fault in a node or a communication channel may be caused by software, hardware, or a combination of both.
One conventional fault-tolerant system duplicates all operations in a particular system. For example, each node may be duplicated, and the duplicated nodes are used to perform identical operations. Hence, in one implementation, when one node fails, one or more of the other duplicated nodes can take over. A voting scheme may also be used to produce the output of a node based on the outputs of the corresponding duplicates.
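The voting scheme mentioned above may be illustrated by a minimal sketch. It assumes a strict-majority rule over the outputs of the duplicated nodes, which is only one of several possible voting rules; the function name `majority_vote` is hypothetical and is not taken from any particular system.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value reported by a strict majority of duplicates.

    `outputs` is a list of hashable values, one per duplicated node.
    Returns None when no value is reported by more than half of them.
    (Assumption: a strict-majority rule; other voting rules are possible.)
    """
    if not outputs:
        return None
    # Find the most frequently reported value and how often it occurs.
    value, count = Counter(outputs).most_common(1)[0]
    # Accept it only if more than half of the duplicates agree.
    return value if count > len(outputs) / 2 else None
```

With three duplicates, for example, a single faulty output is outvoted by the two agreeing ones, while three mutually different outputs yield no result.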
Nodes in a communication system may be linked in a number of ways. In one classification, different linking configurations may be divided into one-connected-graph systems and two- or multiple-connected-graph systems. In a one-connected-graph system, such as a string of nodes in a line configuration or certain tree configurations, communication between two nodes can fail due to a single failure in a communication link or node. Hence, a single-point failure in the network can partition the system and isolate one node or a group of nodes from the rest of the system. In a two-connected-graph system, at least two separate communication links or nodes must fail to break the communication between two nodes and cause a partition. A ring with multiple nodes is one example of a two-connected-graph system.
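The distinction between the two classes may be illustrated by a brute-force sketch that removes each link and each node in turn and checks whether the remaining network is still connected. The node labels, the helper names `is_connected` and `single_points_of_failure`, and the four-node line and ring examples are assumptions made purely for illustration.

```python
def is_connected(nodes, links):
    """Return True if every node can reach every other node over `links`."""
    nodes = set(nodes)
    if not nodes:
        return True
    # Build an undirected adjacency map, ignoring links to removed nodes.
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    # Depth-first search from an arbitrary starting node.
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes


def single_points_of_failure(nodes, links):
    """List every single link or node whose failure partitions the network."""
    failures = []
    for link in links:
        remaining_links = [l for l in links if l != link]
        if not is_connected(nodes, remaining_links):
            failures.append(("link", link))
    for node in nodes:
        remaining_nodes = [n for n in nodes if n != node]
        if not is_connected(remaining_nodes, links):
            failures.append(("node", node))
    return failures


# Hypothetical four-node examples: a line (one-connected) and a ring (two-connected).
NODES = ["A", "B", "C", "D"]
LINE = [("A", "B"), ("B", "C"), ("C", "D")]
RING = LINE + [("D", "A")]

line_failures = single_points_of_failure(NODES, LINE)
ring_failures = single_points_of_failure(NODES, RING)
```

In the line example, every link and both interior nodes are single points of failure, whereas the ring has none: after any one link or node fails, the remaining nodes can still communicate along the other direction of the ring.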