1. Technical Field of the Invention
The present invention relates to bus systems, and, in particular, to a fault-tolerant bus system adapted for use in a computer system.
2. Description of Related Art
Improved system performance is a goal that has been vigorously pursued in the field of computer systems since its very beginnings. Two avenues have been particularly fruitful: modularization of functional subsystems and superior bus design. In personal computers, especially, modularization has resulted in a standardized motherboard having a processor unit, on-board memory, and a host of expansion slots into which are plugged various expansion cards providing such enhanced functionality as telecommunications, disk storage and improved video.
On the other hand, the goal of achieving ever-increasing performance criteria for computer systems has also mandated advanced bus design techniques. As is well known in the art, computer system buses, having a plurality of conductive transmission lines, provide the means for interconnecting a plurality of electronic devices such that the devices may communicate with one another. These buses carry information including address information, control information, and data, in a logical manner as dictated by the design thereof. This logical manner is commonly referred to as the bus protocol.
The computer system buses typically connect master devices such as processors or peripheral controllers, and slave devices such as memory components and bus transceivers. It should be understood herein that it is common in the art to also refer to slave devices as target devices, and accordingly, these two terms are used hereinafter interchangeably. In general, master devices are the initiators of a transaction involving information transfer across a bus to which they are interconnected. Master devices arbitrate to gain control of the bus and an arbiter is typically provided for resolving arbitration contention using one of several known techniques. On the other hand, slave devices typically operate in conjunction with at least one master device, responsive to control signals received therefrom.
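By way of illustration only, the arbitration among master devices described above can be sketched as follows. The sketch assumes round-robin arbitration, which is merely one of the several known techniques and is not asserted to be the technique used by any particular bus. The class name RoundRobinArbiter and its grant method are hypothetical names introduced for this example.

```python
class RoundRobinArbiter:
    """Hypothetical arbiter: grants the bus to one requesting master at a
    time, rotating priority so that no master is starved."""

    def __init__(self, num_masters):
        self.num_masters = num_masters
        # Start as if the last master was just served, so master 0 is
        # first in line on the first arbitration cycle.
        self.last_granted = num_masters - 1

    def grant(self, requests):
        """Return the index of the master granted the bus, or None if no
        master is requesting. `requests` is a set of master indices."""
        # Scan masters in circular order, starting just after the one
        # that was granted most recently.
        for offset in range(1, self.num_masters + 1):
            candidate = (self.last_granted + offset) % self.num_masters
            if candidate in requests:
                self.last_granted = candidate
                return candidate
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(num_masters=3)
    # Masters 0 and 2 contend for the bus on three successive cycles.
    for cycle in range(3):
        print(cycle, arbiter.grant({0, 2}))
```

In this sketch, two masters that request the bus continuously are granted it alternately, which shows how an arbiter may resolve contention without favoring one initiator indefinitely.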
As is well known, high performance systems impose at least two design objectives in relation to buses: higher throughput of information and fault tolerance. Two approaches are typical in attaining the former objective: (i) increasing the bus transmission speed, and (ii) expanding the bus-width, that is, providing additional transmission lines. Fault tolerance may be understood as the property of a robust bus system wherein the negative effects of errors are minimized, if not eliminated. Fault tolerance may also refer to the capability of an extendable bus system that is operable concurrently with both non-extended-bus-compliant devices and extended-bus-compliant devices even during the occurrence of such events as data transmission errors and detection of device-related faults upon initialization. Since inoperable or malfunctioning bus systems would generally cause the computer system in which they are placed to crash, it would be highly beneficial to have a bus system that is fault-tolerant in the sense that the bus system would continue to operate when a device-related fault is detected upon initialization or when an error is encountered during data transmission. Moreover, it can be appreciated that it would be advantageous in a computer system to monitor the occurrences of errors and faults for the purposes of system diagnostics and error recovery management. Accordingly, an efficient error reporting scheme is also a desirable design objective which would lead to improved system performance.
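The effect of the two throughput approaches noted above, namely increasing the transmission speed and expanding the bus-width, can be illustrated with a simple calculation of peak transfer rate. The sketch below is an idealized model that assumes one transfer per clock cycle and ignores protocol overhead. The function name and the clock and width values are illustrative assumptions only.

```python
def peak_throughput_bytes_per_s(bus_width_bits, clock_hz, transfers_per_clock=1):
    """Idealized peak transfer rate: bytes per transfer multiplied by
    transfers per second. Arbitration, wait states and other protocol
    overhead are deliberately ignored."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_hz * transfers_per_clock


if __name__ == "__main__":
    # Illustrative baseline: a 32-bit bus clocked at 33 MHz.
    base = peak_throughput_bytes_per_s(32, 33_000_000)
    # Approach (i): double the transmission speed.
    faster = peak_throughput_bytes_per_s(32, 66_000_000)
    # Approach (ii): double the bus-width.
    wider = peak_throughput_bytes_per_s(64, 33_000_000)
    print(base, faster, wider)
```

Under these assumptions, doubling either the clock rate or the number of data lines doubles the idealized peak rate; sustained throughput in a real system would be lower.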
One of the existing high-performance buses is the 32-bit Peripheral Component Interconnect ("PCI") bus. As is well known, the 32-bit PCI bus is also extendable to accommodate a 64-bit data path, thereby concurrently supporting both 32-bit-compliant devices and 64-bit-compliant devices. The conventional PCI bus provides several advantages such as, for example, high performance, low cost, ease of use, and high reliability.
In spite of these well-known advantages, it is understood that error reporting and management is highly system dependent for a conventional 32-bit PCI bus, thereby limiting the choices available to system designers. For example, the primary goal of such designers is to handle recoverable errors at the hardware level so as to minimize system downtime. Moreover, because the 64-bit PCI bus is a relatively new development, there are at present no known solutions that combine efficient error reporting and fault-tolerant characteristics into a robust error management system therefor.