1. Technical Field
The present invention relates generally to the field of computer architecture and, more specifically, to methods and systems for handling input/output bus errors.
2. Description of Related Art
By definition, a logically partitioned (LPARed) system is one in which multiple operating systems (OSs) or multiple instances (multiple copies of the OS loaded into memory) of the same OS can be running on the system simultaneously. It is a requirement that all errors, both hardware and software, be isolated to the partition or partitions that are affected by the particular error.
For input/output (I/O) subsystems, this requirement is difficult to meet, because I/O bus architectures are not designed to isolate errors between I/O adapters (IOAs) so that one IOA does not "see" errors occurring on a different IOA. Thus an error occurring in a single IOA may cause an error that cannot be isolated, with existing architectures, to a single partition. For example, on a peripheral component interconnect (PCI) bus, the System Error (SERR) signal is a generic, shared signal used to report an event that cannot be handled by the device or the device driver. If one IOA activates SERR, the system cannot determine which IOA activated it. In such situations, where the error is not isolated, the hardware has to ensure that all partitions see the same error. However, this requirement is contrary to the definition and intent of logical partitioning.
To make matters worse, in a tree-structured system such as a PCI bus, errors can propagate up the tree. Also, the store-and-forward nature of write operations can cause a non-recoverable error that needs to be isolated to a particular partition.
One solution that addresses the PCI problem is to assign all IOAs under one PCI Host Bridge (PHB) to a single logical partition. However, this results in a granularity that is too coarse to be useful to the user. Ideally, the user should be able to assign the IOA in each individual slot to a different partition, regardless of which PHB the IOA falls under. Therefore, a method and system that isolates errors generated by one IOA, preventing them from affecting any partition other than the partition to which that IOA is assigned, is desirable.
The present invention provides a method, system, and apparatus for isolating an input/output (I/O) bus error, received from an I/O adapter, from the other I/O adapters that may be in different partitions within a logically partitioned data processing system. In one embodiment, the logically partitioned data processing system includes a system bus, a processing unit, a memory unit, a host bridge, a plurality of terminal bridges, and a plurality of input/output adapters. The processing unit, the memory unit, and the host bridge are all coupled to each other through the system bus. Each of the plurality of terminal bridges is coupled to the host bridge through a first bus. Each of the input/output adapters is coupled to one of the plurality of terminal bridges through one of a plurality of second buses, such that each input/output adapter corresponds to a single terminal bridge. Each of the input/output adapters is assigned to one of a plurality of logical partitions within the data processing system. Each of the terminal bridges isolates errors received from a respective one of the input/output adapters from other input/output adapters, some of which may be within a different one of the plurality of logical partitions.
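The topology described above can be illustrated with a minimal software sketch. It is only a model under stated assumptions and not the hardware implementation: the class names, the `frozen` flag, and the functions `report_error`, `affected_partitions`, and `shared_signal_partitions` are hypothetical and do not appear in the description. The sketch contrasts a shared error signal, where every partition on the bus must see the error, with one terminal bridge per adapter, where only the owning partition is affected.

```python
from dataclasses import dataclass, field


@dataclass
class IOAdapter:
    name: str
    partition: str  # logical partition that owns this adapter


@dataclass
class TerminalBridge:
    """Hypothetical model of a bridge with exactly one adapter behind it."""
    adapter: IOAdapter
    frozen: bool = False  # assumed "isolated" state, for illustration only

    def report_error(self) -> str:
        # Contain the error at this bridge instead of forwarding a shared
        # signal up the tree; return the one partition to be notified.
        self.frozen = True
        return self.adapter.partition


@dataclass
class HostBridge:
    bridges: list = field(default_factory=list)

    def attach(self, adapter: IOAdapter) -> TerminalBridge:
        # Each adapter gets its own terminal bridge (one-to-one).
        bridge = TerminalBridge(adapter)
        self.bridges.append(bridge)
        return bridge

    def affected_partitions(self) -> set:
        # Only partitions whose own bridge has isolated an error.
        return {b.adapter.partition for b in self.bridges if b.frozen}


def shared_signal_partitions(adapters) -> set:
    """Contrast case: with a shared error signal the source adapter is
    unknown, so every partition with an adapter on the bus sees the error."""
    return {a.partition for a in adapters}


if __name__ == "__main__":
    phb = HostBridge()
    b0 = phb.attach(IOAdapter("ioa0", "LPAR1"))
    b1 = phb.attach(IOAdapter("ioa1", "LPAR2"))
    b0.report_error()
    print(sorted(phb.affected_partitions()))
    print(sorted(shared_signal_partitions([b.adapter for b in phb.bridges])))
```

In this model, an error on `ioa0` marks only its own terminal bridge as isolated, so the adapter owned by the second partition is untouched, whereas the shared-signal contrast function returns both partitions.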