1. Technical Field
The present invention relates generally to an improved data processing system, and in particular to a method, system, and computer product for handling errors in a data processing system. Still more particularly, the present invention provides a method, system, and computer product for self-diagnosing remote input/output (I/O) enclosures with enhanced field replacement unit (FRU) callouts.
2. Description of Related Art
A multiprocessor data processing system is a data processing system that contains multiple central processing units. This type of system allows for logical partitioning in which a single multiprocessor data processing system may run as if the system were two or more independent systems. In such a system, each logical partition represents a division of resources in the system and operates as an independent logical system. Each of these partitions is logical because the division of resources may be physical or virtual. For example, a multiprocessor data processing system may be partitioned into multiple independent servers, in which each partition has its own processors, main storage, and input/output devices.
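As an illustration only, the division of resources described above might be sketched as follows. The `Partition` type, the `partition_system` helper, and the round-robin assignment are hypothetical and are not taken from any particular system; real partitioning may also share resources virtually, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One logical partition: an independent slice of system resources."""
    name: str
    processors: set = field(default_factory=set)
    memory_mb: int = 0
    io_devices: set = field(default_factory=set)


def partition_system(processor_ids, total_memory_mb, io_devices, count):
    """Divide processors, memory, and I/O devices among `count` partitions.

    Assumption: a simple round-robin split of physical resources.
    """
    partitions = [Partition(f"LPAR{i}") for i in range(count)]
    for i, cpu in enumerate(sorted(processor_ids)):
        partitions[i % count].processors.add(cpu)
    for i, dev in enumerate(sorted(io_devices)):
        partitions[i % count].io_devices.add(dev)
    for p in partitions:
        p.memory_mb = total_memory_mb // count
    return partitions
```

Each resulting partition owns its own processors, main storage, and I/O devices, so it can operate as if it were an independent server.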
Many systems include multiple remote input/output (RIO) subsystems, each of which includes a bridge or some other interface that connects the subsystem with other portions of the data processing system through a primary or main input/output hub. Each of these remote I/O subsystems is also referred to as a “RIO drawer”. A RIO drawer may include peripheral components such as hard disk drives, tape drives, or graphics adapters.
RIO drawers are typically physically separated from the processors and memory components of the computer. The RIO drawers and their components are connected to the main computer by RIO network cables, which allow the I/O devices contained within the RIO drawers to function with the remainder of the computer as if they were on the system bus.
A service processor or partition may be used during a diagnostic test to detect any failures that occur in the remote drawers. When an error is detected, a service call is made that identifies each field replacement unit (FRU) that must be replaced in order to clear the error. For systems that offer JTAG access to the RIO drawers, the FRU callout may be performed using the JTAG links. However, some systems, such as the IBM eServer pSeries Regatta 690 and the IBM eServer pSeries and iSeries Squadrons systems, products of International Business Machines Corporation in Armonk, N.Y., do not have JTAG access to the RIO drawers. In these systems, RIO links connect the central electronics complex (CEC) to a host of I/O devices and provide the communication paths from the processors in the CEC to the I/O drawers. However, some chip registers on the I/O drawers, such as debug and performance registers, are not accessible using the RIO links. Thus, if an I/O error occurs in the RIO drawers of a system that does not have JTAG access, the system may not be able to read all of the registers required to make a complete diagnosis of the problem. Consequently, a complete FRU callout to correct errors on the RIO drawer may not be possible. In addition, a diagnosis of the I/O failure may be unobtainable if the system is in a checkstop state and the RIO link is broken.
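The limitation described above can be illustrated with a minimal sketch. The `Register` type, the register names, and the rule that a JTAG path reaches every register are assumptions made for illustration; they do not describe the actual hardware interfaces of the systems named above.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    RIO = "rio"    # readable over the RIO link
    JTAG = "jtag"  # readable only through JTAG (e.g. debug/performance registers)


@dataclass(frozen=True)
class Register:
    name: str
    access: Access


def readable_registers(registers, has_jtag, rio_link_up):
    """Return the registers the host can read in the given system state."""
    readable = []
    for reg in registers:
        if reg.access is Access.RIO and rio_link_up:
            readable.append(reg)
        elif has_jtag:
            # Assumption: a JTAG path reaches every register,
            # regardless of the state of the RIO link.
            readable.append(reg)
    return readable


def diagnosis_is_complete(required, has_jtag, rio_link_up):
    """A complete FRU callout needs every required register to be readable."""
    readable = readable_registers(required, has_jtag, rio_link_up)
    return set(required) <= set(readable)
```

Under these assumptions, a system without JTAG access cannot read the hypothetical debug register even when the RIO link is up, and it can read nothing at all once the RIO link is broken.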
Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions for diagnosing failures on RIO enclosures with greater granularity to provide complete FRU callouts.