The present invention relates generally to input/output operations in a computer system, and more particularly to fault isolation in a peripheral component interconnect (PCI) structure.
In many computer systems, peripheral devices such as hard disk drives, speakers, and CD-ROM drives are supported through a standard input/output (I/O) device architecture called Peripheral Component Interconnect (PCI). The PCI architecture supports many complex features, including I/O expansion through PCI-to-PCI bridges; peer-to-peer (device-to-device) data transfers between controlling devices (masters) and responding devices (targets); multi-function devices; and both integrated and plug-in devices.
The PCI architecture also defines standards for the detection and capture of error conditions on a PCI bus and in the devices. While the standard facilities provide error capture capabilities, the number of failure scenarios that may occur is large given the wide range of features allowed by the PCI architecture. Thus, isolating faults to a specific failing component becomes very difficult.
For example, for each transaction that occurs on the PCI bus, there is a master device which controls the transaction, and a target device which responds to the master's request. Since data can flow in either direction (i.e., the master can request a read or write), it is important to know which device was the sender of bad data and which device was the receiver. Also, since errors can flow across PCI-to-PCI bridges, it is important to know whether the fault is located on the near or far side of the bridge.
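As a minimal illustration of why the direction of a transaction matters, the following sketch maps a transaction to its data sender and receiver. The function name `data_roles` and the string labels are hypothetical and chosen only for this example. The only fact relied on is that in a read the data flows from target to master, and in a write it flows from master to target.

```python
def data_roles(master: str, target: str, operation: str) -> dict:
    """Return which device sent the data and which received it.

    Hypothetical helper for illustration only: on a write the master
    drives the data, on a read the target drives the data.
    """
    if operation == "write":
        return {"sender": master, "receiver": target}
    if operation == "read":
        return {"sender": target, "receiver": master}
    raise ValueError(f"unknown operation: {operation!r}")


# The master always controls the transaction, but it is the data
# sender only for writes.
print(data_roles("adapter", "memory", "read"))
# prints {'sender': 'memory', 'receiver': 'adapter'}
```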
Accordingly, a need exists for a failure isolation technique that would operate successfully for the numerous options supported by the PCI architecture, while providing consistent diagnostic information to servicers across a wide variety of hardware platforms.
The present invention meets this need and provides method and system aspects for fault isolation on a PCI bus. In a method aspect, a method is provided for isolating a fault condition on a bus of a computer system, the computer system including an input/output (I/O) subsystem formed by a plurality of I/O devices communicating via the bus. The method includes categorizing the I/O subsystem in a recursive manner, and isolating a source of an error condition within the I/O subsystem. Further, the I/O subsystem communicates via a peripheral component interconnect (PCI) bus.
In a further method aspect, a method for fault isolation for bus errors includes the steps of (a) processing a device error on a PCI bus, and (b) performing ordered categorization of a plurality of input/output devices coupled to the PCI bus. The method further includes (c) determining whether the device error originates from a subordinate branch of the PCI bus, and (d) recursively performing steps (a)-(c) until the PCI bus is categorized.
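Steps (a) through (d) can be sketched as a recursive walk over a tree of buses. This is only an illustrative sketch under stated assumptions, not the claimed implementation: the `Device` and `Bus` classes, the `error_flagged` field (standing in for whatever error status a device reports), and the rule that a flagged bridge with no flagged device beneath it is itself the suspect are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Device:
    """Hypothetical model of one PCI device (not taken from the source)."""
    name: str
    error_flagged: bool = False          # assumed: device reports an error
    subordinate: Optional["Bus"] = None  # set only for PCI-to-PCI bridges


@dataclass
class Bus:
    """Hypothetical model of one PCI bus segment."""
    name: str
    devices: List[Device] = field(default_factory=list)


def isolate_fault(bus: Bus) -> List[str]:
    """Return names of devices suspected as error sources on `bus` and below."""
    suspects: List[str] = []
    # (a), (b): examine the devices on this bus in their listed order
    for dev in bus.devices:
        if not dev.error_flagged:
            continue
        if dev.subordinate is not None:
            # (c): the error may originate from the subordinate branch
            below = isolate_fault(dev.subordinate)  # (d): recurse
            # Assumption: if nothing below is flagged, blame the bridge itself.
            suspects.extend(below if below else [dev.name])
        else:
            suspects.append(dev.name)
    return suspects


far_bus = Bus("bus1", [Device("disk", error_flagged=True), Device("nic")])
root = Bus("bus0", [
    Device("audio"),
    Device("bridge", error_flagged=True, subordinate=far_bus),
])
print(isolate_fault(root))  # prints ['disk']
```

In this example the bridge and the disk both report an error, but the recursion attributes the fault to the disk on the far side of the bridge rather than to the bridge itself.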
In a system aspect, a computer system for isolating a fault condition on a bus includes a processing mechanism, and an input/output mechanism coupled to the processing mechanism. The input/output mechanism comprises a plurality of input/output devices and bridges coupled to a PCI bus and communicating according to a PCI standard. In addition, the system includes a fault isolation mechanism within the processing mechanism for identifying a source of an error condition in the input/output mechanism. Further, the fault isolation mechanism performs categorization of the input/output mechanism in a recursive manner.
With the present invention, a fault isolation technique provides more specific identification of an error source in a PCI bus architecture. The fault isolation technique greatly reduces the ambiguity of error occurrence when the numerous options supported by the PCI architecture are utilized in a given system. Further, by relying on the standard features of the PCI architecture, the fault isolation technique is readily applicable to a variety of system arrangements. These and other advantages of the aspects of the present invention will be more fully understood in conjunction with the following detailed description and accompanying drawings.