1. Technical Field
The present invention relates to error analysis in information processing systems. More specifically, it relates to the isolation of faulty peripheral component interconnect (PCI) adapters on a PCI bus during input/output sub-system initialization.
2. Description of the Related Art
When a failure occurs on a PCI bus after system start-up but before machine check handling has been enabled, it is desirable to determine automatically which adapter is responsible for the fault condition. This is difficult because, prior to enabling machine check handling, the error condition will checkstop the system. Since there is no scan-out capability on the remote I/O drawers where the PCI devices are located, it is not possible to scan out error registers for interrogation. A conventional service procedure is based on treating every bus adapter as suspect. The system is first reduced to its minimum configuration; thereafter, the adapter cards are tried one at a time until the failure recurs.
Such a scheme for recreating an error condition in order to identify the faulty adapter is problematic. The procedure often induces additional errors because adapter cards are physically plugged and unplugged. Further, such a sequential procedure adds considerable time to any error repair scenario.
Checkpointing during system startup to determine faulty components is a procedure known in the art. Typically, in a checkpoint procedure, a copy of a program or of the state of a computer system is saved periodically so that, if a failure occurs, execution can be restarted from the last saved checkpoint. This invention uses the concept of checkpoints to save the last PCI address to which access was attempted during the PCI configuration cycle, in order to identify the probable source of failure. In addition, progress codes are presented by the initial program load read only storage (IPLROS) firmware to indicate the progress of the boot sequence. The progress code indicates that the PCI bus was being configured, and the checkpoint is used to identify the probable source of the failure.
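The general checkpoint-and-restart procedure described above can be sketched as follows. This is only an illustration of the concept, not the firmware of the invention; the function `run_with_checkpoints`, the dictionary `checkpoint` (standing in for persistent storage), and the simulated failure are all hypothetical names chosen for this sketch.

```python
def run_with_checkpoints(items, checkpoint, fail_at=None):
    """Sum `items`, saving progress in `checkpoint` after every item.

    `checkpoint` is a plain dict standing in for persistent storage
    (an assumption of this sketch). `fail_at` simulates a failure
    just before the item at that index is processed.
    """
    start = checkpoint.get("next_index", 0)
    total = checkpoint.get("total", 0)
    for i in range(start, len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at index {i}")
        total += items[i]
        # Save state only after the step has completed successfully.
        checkpoint["next_index"] = i + 1
        checkpoint["total"] = total
    return total


checkpoint = {}
data = [10, 20, 30, 40]
try:
    run_with_checkpoints(data, checkpoint, fail_at=2)
except RuntimeError:
    pass  # The checkpoint still holds the state reached after index 1.

saved = dict(checkpoint)  # snapshot of the checkpoint left by the failed run

# The restart resumes from the last saved checkpoint, not from the start.
result = run_with_checkpoints(data, checkpoint)
```

In this conventional form the checkpoint is written after a step succeeds, so it identifies the last completed operation. The approach described in this document differs in that the address is saved before the access is attempted, so the saved value names the operation that was in flight when the failure occurred.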
Commonly assigned co-pending application Ser. No. 08/829,088 entitled “A Method and System for Fault Isolation for PCI Bus Errors” teaches a mechanism for identifying a source of an error condition in the I/O mechanism.
U.S. Pat. No. 5,815,647 to Buckland et al. provides a system which allows a user to identify which of a plurality of feature cards has issued an error signal.
IBM Technical Disclosure Bulletin, Vol. 37, No. 08, page 619, discloses a recursive algorithm for initializing error handling logic for a PCI system.
None of these references provides for saving an address indicator prior to accessing that address.
Thus, it is desirable to have a fast, reliable technique for identifying faulty components that prevent a system from completing system start-up and entering its diagnostic routines.
It is further desirable to isolate and diagnose errors in a manner that eliminates the possible introduction of further error conditions.
The present invention overcomes the shortcomings of the prior art by providing a shared mailbox space in memory for use by a service processor during the PCI bus and adapter initialization sequence. The address of an adapter is placed in the shared memory space before an attempt to access that adapter is made. If an error occurs during the access attempt, the service processor retrieves the address saved in the shared mailbox and immediately performs its error isolation procedure for determining the slot at fault. In this way, the adapter card causing an I/O subsystem failure, rather than the entire I/O subsystem, may be analyzed.
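The save-before-access sequence summarized above can be modeled in a few lines. The sketch below is a simplified simulation under stated assumptions, not the actual firmware or service processor code: the `Mailbox` class, the `configure_bus` and `isolate_slot` functions, the example addresses, and the address-to-slot table are all hypothetical, and a Python exception stands in for the checkstop that would halt a real system.

```python
class FaultDuringAccess(Exception):
    """Stand-in for the checkstop caused by a faulty adapter (simulated)."""


class Mailbox:
    """Hypothetical shared memory cell readable by the service processor."""

    def __init__(self):
        self.last_address = None


def configure_bus(addresses, mailbox, access):
    """Initialization side: record each address BEFORE accessing it."""
    for address in addresses:
        mailbox.last_address = address
        access(address)


def isolate_slot(mailbox, slot_map):
    """Service processor side: map the saved address to a slot number."""
    return slot_map.get(mailbox.last_address)


# Assumed example topology: three adapters, the second one is faulty.
slot_map = {0x0800: 1, 0x1000: 2, 0x1800: 3}
faulty_address = 0x1000


def access(address):
    # Simulated configuration access that fails on the faulty adapter.
    if address == faulty_address:
        raise FaultDuringAccess(hex(address))


mailbox = Mailbox()
suspect_slot = None
try:
    configure_bus(sorted(slot_map), mailbox, access)
except FaultDuringAccess:
    # The mailbox still names the adapter that was being accessed.
    suspect_slot = isolate_slot(mailbox, slot_map)
```

Because the address is written before the access, the mailbox remains valid even when the access never returns, which is the property that lets a single slot be examined instead of the whole I/O subsystem.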