Interconnects may be point-to-point interconnects connecting two components together (for example, two components on the same circuit board or two components coupled to two different boards). Interconnects may be bi-directional, that is, usable to transmit signals in both an outgoing and an incoming direction. The interconnect width may be scalable from one bit (that is, a serial interconnect) to multiple bits in parallel. Many different types of components may be connected using interconnects, such as processors, memory bridges, input/output (I/O) hubs, etc. Interconnects may be any type of bus, such as an I/O bus. Interconnects may also be referred to as “links”. Interconnects in use today typically require all of their channels to operate correctly at all times. A failure of any single channel (or component) will typically render the entire interconnect (such as a bus) non-operational before any reconfiguration of the link can be performed.
Generic interconnects connecting two or more components typically include logic (such as bus logic in the case of a bus interconnect) and one or more communication channels in each direction used to carry control information and data. The logic exists on each component of the interconnect between a data link layer and a physical layer. The communication channels are the minimum building blocks for the physical layer. Typical designs for interconnects such as buses provide no redundancy and require all communication channels to be operational. Any single channel failure results in a complete failure of the entire interconnect.
Mechanisms have previously been developed to improve I/O bus reliability and availability. These mechanisms can be divided into two categories: those that resolve intermittent failures and those that use redundancy to resolve hard failures.
One example of an intermittent failure is an alpha particle striking the bus during a data transfer and corrupting the transmitted data. A previously implemented mechanism for resolving intermittent failures such as this is referred to as a “Detect and Retry” mechanism. Typical error detection schemes that may be used as part of such a “Detect and Retry” mechanism include parity, ECC (error correction code), and CRC (cyclic redundancy check) schemes. When error detection is coupled with retry, the I/O bus can ensure correct data transmission despite intermittent errors. A variation of a “Detect and Retry” mechanism includes a reset after the error is detected but before the transmission is retried. Such a modified mechanism can be particularly useful for point-to-point interconnects that use high-speed serial transceivers.
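The “Detect and Retry” idea can be illustrated with a minimal Python sketch. It is not taken from any particular bus implementation: the names FlakyChannel, make_frame, check_frame, and send_with_retry are hypothetical, the channel is a software stand-in that flips one bit in its first few transfers, and CRC-32 from the standard zlib module is assumed as the error detection scheme. The optional reset callback models the variation in which the link is reset before each retry.

```python
import zlib


class FlakyChannel:
    """Hypothetical channel that corrupts its first `bad_sends` transfers."""

    def __init__(self, bad_sends):
        self.bad_sends = bad_sends
        self.sent = 0

    def transfer(self, frame: bytes) -> bytes:
        self.sent += 1
        if self.sent <= self.bad_sends:
            corrupted = bytearray(frame)
            corrupted[0] ^= 0x01  # simulate a transient single-bit upset
            return bytes(corrupted)
        return frame


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (assumed framing)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Return the payload if the CRC matches, otherwise None."""
    payload, received_crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == received_crc else None


def send_with_retry(channel, payload, max_retries=3, reset=None):
    """Send once, then retry up to `max_retries` times on a CRC mismatch.

    Returns (payload, number_of_retries_used). Raises IOError if every
    attempt fails, which is what a hard failure would look like here.
    """
    frame = make_frame(payload)
    for attempt in range(1 + max_retries):
        result = check_frame(channel.transfer(frame))
        if result is not None:
            return result, attempt
        if reset is not None:
            reset()  # optional link reset before the next attempt
    raise IOError("transfer failed after retries; possible hard failure")
```

With a channel that corrupts only its first two transfers, send_with_retry(FlakyChannel(2), b"hello") succeeds on the third attempt. With a channel that corrupts every transfer, the retries are exhausted and an error is raised, which mirrors why retry alone cannot resolve hard failures.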
“Detect and Retry” mechanisms cannot handle hard failures such as a broken wire or an inoperable transceiver. Providing redundancy is the most common scheme used to resolve such hard failures. In response to such a hard failure, redundant I/O buses may be used to transport data, or alternatively, failover channels may be used when one or more channels have failed.
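The failover-channel approach can be sketched in the same illustrative spirit. The FailoverLink class, its lane_map, and the byte-striping scheme below are assumptions made for this example only and do not describe any specific bus. The sketch keeps a mapping from logical lanes to physical channels and remaps a logical lane onto a spare channel when its physical channel is marked as failed.

```python
class FailoverLink:
    """Hypothetical link with `active` working channels and `spares` standby channels."""

    def __init__(self, active, spares):
        # lane_map[i] is the physical channel currently carrying logical lane i
        self.lane_map = list(range(active))
        self.spares = list(range(active, active + spares))
        self.failed = set()

    def mark_failed(self, physical):
        """Record a hard failure and fail over to a spare channel if one is left."""
        self.failed.add(physical)
        if physical in self.spares:
            self.spares.remove(physical)  # a standby channel died; nothing to remap
            return
        if physical in self.lane_map:
            if not self.spares:
                raise RuntimeError("no spare channel left; link is down")
            index = self.lane_map.index(physical)
            self.lane_map[index] = self.spares.pop(0)

    def stripe(self, data: bytes) -> dict:
        """Distribute bytes round-robin over logical lanes, keyed by physical channel."""
        lanes = {physical: bytearray() for physical in self.lane_map}
        width = len(self.lane_map)
        for i, byte in enumerate(data):
            lanes[self.lane_map[i % width]].append(byte)
        return {physical: bytes(buf) for physical, buf in lanes.items()}
```

For a link created as FailoverLink(4, 1), marking physical channel 2 as failed moves logical lane 2 onto spare channel 4, so subsequent data is striped over channels 0, 1, 4, and 3. A second failure of an active channel raises an error because no spare remains, which illustrates the cost of this scheme: the redundant channels sit idle until a failure occurs, and protection ends once they are consumed.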