1. Field of the Invention
The present invention relates to computer buses, and more particularly to a backplane bus in a multiprocessor computer.
2. State of the Art
In computer systems, a bus is commonly used to communicate between logical blocks or modules. The modules connect to a common communications medium, such as a set of wires, or printed circuit board traces. The rules that govern the access of modules to the bus and data transfer constitute the bus protocol. Generally, all modules on a bus must use the same protocol.
In a typical bus implementation, a set of traces is embedded in one or more printed circuit boards. Modules connect to the bus through bus transceivers. Modules connected to a bus may all reside on the same printed circuit board. Alternatively, modules may reside on separate printed circuit boards and be attached, through a series of connectors, to an electro-mechanical structure that incorporates the physical bus medium. The physical bus medium, together with the electro-mechanical structure that incorporates it, is called the backplane bus.
In a multiprocessing computer, multiple processors are provided, each of which performs a portion of an overall computational task. A Symmetric Multi-Processing (SMP) computer is one in which each processor has substantially equal access to system resources in general. Typically in an SMP computer, multiple processor boards, memory boards and I/O boards plug into a common backplane bus to realize a robust, reconfigurable computer system. Processor boards may have multi-level caches, for example a primary on-chip cache, a fast secondary (e.g., SRAM) cache, and a slower tertiary (e.g., DRAM) cache. A cache coherency model is used to update data in the various levels of caches among the various processor boards to ensure that out-of-date data is not used.
Various standards have been developed which define the physical features and protocols of different backplane buses, including, for example, the Pyramid C-Bus, the Intel/Siemens/BiiN AP-Bus, and the IEEE FutureBus/FutureBus+. Generally, the signal lines on standard backplane buses can be partitioned into logical groupings that include a data transfer bus, which includes address and data lines; an arbitration bus, which includes control acquisition lines; and a utility bus, which includes power leads and, on some buses, clock signals, initialization and failure detection lines.
One measure of bus performance is aggregate throughput, i.e., on average, how much data can be transferred across the bus in a given period of time. Throughput is in turn a function of raw bus speed (how fast signals can be driven) and bus utilization (how busy the bus can be kept). Another consideration in assessing bus performance is reliability and fault tolerance. Faults are inevitable in digital computer systems due, at least in part, to the complexity of the circuits and of the associated electromechanical devices, and to programming complexity. Computers and buses may be designed on the one hand to be reliable, or, on the other hand, may be designed to be fault tolerant. In a reliable computer system, faults are detected and operations suspended while the fault is diagnosed and the system is reconfigured to remove the faulty component. In a fault tolerant computer system, redundancy is designed into the system in such a manner that if a component fails, a redundant component is able to take up where the failed component left off without any perceptible delay. Fault tolerant design greatly increases system cost and complexity.
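By way of illustration only, the dependence of aggregate throughput on raw bus speed and bus utilization may be sketched as follows. The function name, its parameters, and the simple multiplicative model are assumptions made for this example; they are not taken from any particular bus standard.

```python
def aggregate_throughput(bus_width_bytes: int, clock_hz: float, utilization: float) -> float:
    """Estimate average bytes per second transferred across a bus.

    Assumed model (hypothetical): one transfer of `bus_width_bytes` per
    clock cycle at most, scaled by the fraction of cycles that actually
    carry data (`utilization`, between 0 and 1).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    # Raw bus speed: bytes per second if every cycle carried data.
    raw_bandwidth = bus_width_bytes * clock_hz
    # Aggregate throughput: raw speed scaled by how busy the bus is kept.
    return raw_bandwidth * utilization


if __name__ == "__main__":
    # Illustrative figures only: an 8-byte-wide bus clocked at 50 MHz, kept 50% busy.
    print(aggregate_throughput(8, 50e6, 0.5))
```

Under this simplified model, doubling either the raw speed or the utilization doubles the throughput, which is why both factors are of interest in bus design.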
Apart from data lines, which may be parity protected, all buses have control lines, errors on which cause unwanted behavior. Most buses are not designed for fault resilience and simply tolerate the possibility of undetected errors on a small number of signals that have no error detection. In bus-based fault-resilient systems, a number of different solutions to the problem of detecting control signal errors have been employed. Voting between multiple sets of lines may be used to provide both error detection and correction. A protocol may be designed that does not have any signals that cannot be parity protected. A sideband signal may be used to compute line status between checking agents. Another approach involves time-based checkpointing, in which a signature register is checked periodically to confirm consistent operation. Each of these measures is relatively complicated and costly.
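Two of the measures mentioned above, parity protection and voting between multiple sets of lines, may be sketched in simplified form as follows. The function names are hypothetical. The use of even parity and of a two-of-three vote over three copies of the lines are assumptions made for this example, and do not describe the mechanism of any particular bus.

```python
def even_parity_bit(bits):
    """Return the parity bit that makes the total number of 1s even."""
    return sum(bits) % 2


def parity_ok(bits, parity):
    """Detect (but not correct) a single-bit error on parity-protected lines."""
    return even_parity_bit(bits) == parity


def majority_vote(a, b, c):
    """Bitwise two-of-three vote across three copies of a set of lines.

    Each argument is an integer whose bits represent the line states seen
    on one copy. A single faulty copy is outvoted, so the error is both
    detected and corrected.
    """
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    p = even_parity_bit(data)
    print(parity_ok(data, p))           # intact lines pass the check
    print(parity_ok([1, 0, 0, 1], p))   # a single flipped bit fails the check
    print(bin(majority_vote(0b1010, 0b1010, 0b0010)))  # faulty third copy is outvoted
```

The sketch also suggests the cost noted above: voting triples the number of lines to be driven and compared, whereas parity adds only one line per protected group but can detect, not correct, an error.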
Despite such complexity, the reliability of existing buses is compromised by the potential for single undetected points of failure. A need therefore exists for a high-reliability SMP backplane bus that is simpler than existing buses but offers comparable performance.