The present invention relates to techniques for acquiring RAS (reliability/availability/serviceability) data useful for fault recovery in an information processing system having a bus bridge which is connected between a plurality of buses for data transfer therebetween, for example, between a primary bus and a secondary bus, such as between a host bus and a PCI bus or between a processor bus and an I/O bus of a server system.
A main portion of a conventional information processing system having a bus bridge connected between a host bus and a PCI bus of a server system will be described with reference to FIG. 4.
Referring to FIG. 4, reference numeral 1 represents a host bus, and reference numeral 2 represents a PCI bus widely used with personal computers nowadays. Reference numeral 3 represents a bus bridge which is connected between the host bus 1 and PCI bus 2 to transfer data therebetween. This bus bridge 3 is constituted of a host bus control circuit 31 for controlling the host bus 1, a PCI bus control circuit 32 for controlling the PCI bus 2, and a host bus-PCI bus I/F control circuit 33.
Symbol s11 represents a transfer start request signal which is used for sending a data transfer start request from the host bus 1 to the host bus-PCI bus I/F control circuit 33. Symbol s12 represents a transfer completion signal indicating a data transfer completion. Symbol s13 represents a transfer address signal indicating a data transfer partner. Symbol s14 represents transfer data. Symbol s21 represents a transfer start request signal which is used for sending a data transfer start request from the PCI bus 2 to the host bus-PCI bus I/F control circuit 33. Symbol s22 represents a transfer completion signal indicating a data transfer completion. Symbol s23 represents a transfer address signal indicating a data transfer partner. Symbol s24 represents transfer data.
In such an information processing system, data at some address on the PCI bus 2 requested from the host bus 1 side is read in the following manner. First, in response to the request from the host bus 1, the host bus control circuit 31 enables the transfer start request signal s11 to request the host bus-PCI bus I/F control circuit 33 to transfer data at an address indicated by the signal s13. This request is received by the host bus-PCI bus I/F control circuit 33 and thereafter executed for the PCI bus 2. After the data transfer is completed by the PCI bus 2, the host bus-PCI bus I/F control circuit 33 sends the transfer completion signal s12 to the host bus control circuit 31 to terminate the data transfer.
A data transfer start request from the PCI bus 2 side to the host bus 1 is executed in the following manner. Namely, in response to the request from the PCI bus 2, the PCI bus control circuit 32 enables the transfer start request signal s21 to request the host bus-PCI bus I/F control circuit 33 to transfer data at an address indicated by the signal s23. This request is received by the host bus-PCI bus I/F control circuit 33 and thereafter executed for the host bus 1. After the data transfer is completed by the host bus 1, the host bus-PCI bus I/F control circuit 33 sends the transfer completion signal s22 to the PCI bus control circuit 32 to terminate the data transfer.
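The two transfer sequences described above can be sketched as a small software model. This is only an illustrative simulation, not the circuit itself: the class names `Bus` and `BusBridge`, the method names, the `trace` list, and the example addresses and data values are all assumptions introduced here. Only the signal names s11 to s14 and s21 to s24 and the order of the steps come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Bus:
    """A bus modeled as a flat address-to-data map (hypothetical simplification)."""
    name: str
    memory: dict = field(default_factory=dict)


class BusBridge:
    """Toy model of bus bridge 3; _transfer plays the role of I/F control circuit 33."""

    def __init__(self, host_bus, pci_bus):
        self.host_bus = host_bus
        self.pci_bus = pci_bus
        self.trace = []  # records the order in which signals are used

    def _transfer(self, start, addr_sig, data_sig, done, target, address):
        self.trace.append(start)       # transfer start request (s11 or s21)
        self.trace.append(addr_sig)    # transfer address (s13 or s23)
        data = target.memory[address]  # request is executed for the target bus
        self.trace.append(data_sig)    # transfer data (s14 or s24)
        self.trace.append(done)        # transfer completion (s12 or s22)
        return data

    def read_from_host_side(self, address):
        """Host bus 1 requests data located on PCI bus 2."""
        return self._transfer("s11", "s13", "s14", "s12", self.pci_bus, address)

    def read_from_pci_side(self, address):
        """PCI bus 2 requests data located on host bus 1."""
        return self._transfer("s21", "s23", "s24", "s22", self.host_bus, address)


# Example values are arbitrary and chosen only for illustration.
host = Bus("host bus 1", {0x1000: 0xAB})
pci = Bus("PCI bus 2", {0x2000: 0xCD})
bridge = BusBridge(host, pci)
value = bridge.read_from_host_side(0x2000)
```

Note that in this model, as in the conventional bridge, every transfer is initiated from one of the two bridged buses; there is no third path into the bridge.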
In the conventional information processing system described above, the bus bridge 3 is provided with only a bus bridging function of transferring data between a primary bus and a secondary bus, in this example, between the host bus 1 and the PCI bus 2. Therefore, in order to acquire RAS data in the bus bridge 3 or RAS data of I/O devices and the like on the secondary bus, that is, the PCI bus 2 connected to the bus bridge 3, it is necessary to interrupt the program presently run by a processor of the system and execute a RAS data acquisition program. It is not easy, therefore, to acquire RAS data.
Furthermore, if the program run by the processor enters an infinite loop or the processor falls into an inoperable state such as a hang-up, the processor cannot execute the RAS data acquisition program and the RAS data acquisition itself is impossible.
RAS data is useful for finding the cause of an abnormal operation of a system. If RAS data is difficult or impossible to acquire, it takes a long time to recover from a fault of the system. A solution to this problem has long been desired.
The present invention has been made under the above-described circumstances, and an object of the invention is to provide a RAS data acquisition circuit, and an information processing system with this circuit, which make it easy to find the cause of an abnormal operation of the system and to recover from a fault of the system.
The above object can be achieved by an information processing system having a bus bridge connected between a plurality of buses for data transfer therebetween, wherein the bus bridge is provided with a RAS data acquisition bus operating independently from the plurality of buses and a RAS data acquisition unit for acquiring RAS data in the bus bridge or RAS data of a processor or an I/O device on a bus connected to the bus bridge, in response to a command supplied from an external circuit via the RAS data acquisition bus.
The information processing system is also provided with an information transmitting unit for transmitting the RAS data acquired by the RAS data acquisition unit to the external circuit via the RAS data acquisition bus.
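The arrangement of the RAS data acquisition unit and the information transmitting unit can be sketched as follows. This is a minimal model under stated assumptions: the command strings "BRIDGE" and "DEVICE:<name>", the class names `RasAcquisitionUnit` and `RasBus`, and the status fields are hypothetical and are not specified in the description, which says only that a command arrives from an external circuit via the RAS data acquisition bus and that the acquired RAS data is returned over the same bus.

```python
class RasAcquisitionUnit:
    """Acquires RAS data in the bridge or from a device on a connected bus."""

    def __init__(self, bridge_status, device_status):
        self.bridge_status = bridge_status  # RAS data held in the bus bridge
        self.device_status = device_status  # RAS data of devices, keyed by name

    def acquire(self, command):
        # Hypothetical command format: "BRIDGE" or "DEVICE:<name>".
        target, _, name = command.partition(":")
        if target == "BRIDGE":
            return dict(self.bridge_status)
        if target == "DEVICE":
            return dict(self.device_status[name])
        raise ValueError(f"unknown RAS command: {command!r}")


class RasBus:
    """Side-band path that is independent of the buses being bridged."""

    def __init__(self, unit):
        self.unit = unit

    def send_command(self, command):
        ras_data = self.unit.acquire(command)  # acquisition unit handles the command
        return ras_data                        # transmitted back to the external circuit


# Example status values are arbitrary and chosen only for illustration.
unit = RasAcquisitionUnit(
    bridge_status={"parity_error": False},
    device_status={"io_device_0": {"state": "busy"}},
)
ras_bus = RasBus(unit)
bridge_data = ras_bus.send_command("BRIDGE")
device_data = ras_bus.send_command("DEVICE:io_device_0")
```

The external circuit in this sketch never touches the host bus or the PCI bus; it interacts only with `RasBus`, which mirrors the independence of the RAS data acquisition bus from the bridged buses.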
RAS data of a processor or an I/O device on a bus is acquired via the RAS data acquisition bus, which operates independently from the plurality of buses to and from which data is transferred via the bus bridge. It is therefore possible to acquire RAS data easily without interrupting a program presently run by a processor of the system, and to know the operation state of the I/O device in real time.
Further, the RAS data acquisition bus is provided outside of the bus bridge and the RAS data can be acquired without the help of the processor of the system. Therefore, RAS data can be acquired even if the program run by the system enters an infinite loop or the system falls into an inoperable state such as a hang-up. Accordingly, it is easy to find the cause of an abnormal state of the system and to recover from a fault of the system.
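The difference between the conventional path and the independent path when the processor hangs can be illustrated with a short sketch. All names here (`Processor`, `acquire_via_processor`, `acquire_via_ras_bus`, the register fields) are hypothetical, and the hang is modeled simply as a flag that makes the processor unable to run the RAS data acquisition program.

```python
class Processor:
    """Processor that must run the RAS data acquisition program itself."""

    def __init__(self):
        self.hung = False  # True models an infinite loop or a hang-up

    def run_ras_program(self, registers):
        if self.hung:
            raise TimeoutError("processor cannot run the RAS data acquisition program")
        return dict(registers)


def acquire_via_processor(processor, registers):
    """Conventional path: acquisition depends on the processor."""
    return processor.run_ras_program(registers)


def acquire_via_ras_bus(registers):
    """Independent path: the bridge's RAS registers are read without the processor."""
    return dict(registers)


# Example register contents are arbitrary and chosen only for illustration.
registers = {"last_address": 0x2000, "bus_error": True}
cpu = Processor()
cpu.hung = True

try:
    conventional = acquire_via_processor(cpu, registers)
except TimeoutError:
    conventional = None  # no RAS data is obtained while the processor is hung

sideband = acquire_via_ras_bus(registers)
```

With the processor hung, the conventional path yields nothing, while the independent path still returns the register contents needed to diagnose the fault.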