1. Technical Field
The present invention relates generally to an improved data processing system, and in particular to a method, system, and computer product for handling errors in a data processing system. Still more particularly, the present invention provides a method, system, and computer product for using an alternative path to capture failure data from input/output (I/O) drawers.
2. Description of Related Art
A multiprocessor data processing system is a data processing system that contains multiple central processing units. This type of system allows for logical partitioning in which a single multiprocessor data processing system may run as if the system were two or more independent systems. In such a system, each logical partition represents a division of resources in the system and operates as an independent logical system. Each of these partitions is logical because the division of resources may be physical or virtual. For example, a multiprocessor data processing system may be partitioned into multiple independent servers, in which each partition has its own processors, main storage, and input/output devices.
Many systems include multiple remote input/output subsystems in which each subsystem includes a bridge or some other interface to connect the subsystem with other portions of the data processing system through a primary or main input/output hub. Each of these remote I/O subsystems is also referred to as a “RIO drawer”. Each of these RIO drawers may include peripheral components, such as hard disk drives, tape drives, or graphics adapters.
RIO drawers are typically physically separated from the processors and memory components of the computer. The RIO drawers and their components are connected to the main computer using RIO network cables which allow the I/O devices contained within the RIO drawers to function with the remainder of the computer as if they were on the system bus.
Some existing systems, such as the IBM eServer pSeries Regatta 690 and the IBM eServer pSeries and iSeries Squadrons systems, products of International Business Machines Corporation in Armonk, N.Y., do not have JTAG access to the RIO drawers. Instead, these systems use RIO cables to access the remote I/O drawers. RIO links are used to connect the central electronics complex (CEC) to a host of I/O devices. These links provide communication paths from the processors in the CEC to the I/O drawers. When an I/O error occurs, a kernel debugger (KDB) or hypervisor may only access the I/O failure information through the RIO cables.
A problem with allowing access to the I/O drawers only through the RIO cables is that, if an I/O error occurs in the drawers and the RIO path is not functional, it may be difficult or even impossible to access the register information in the remote I/O drawers. As a result, the system or CEC may not be able to read all of the required registers to make a complete diagnosis of the I/O failure, as there is no way to dump ring buffer data from the chips on the I/O drawers. This ring buffer data may provide a hardware or software developer with the data needed to diagnose a field failure. In addition, when using the KDB/hypervisor RIO path to read the I/O drawers, a read to an invalid address in the I/O drawer causes the KDB session to fail, and may cause the entire system to fail as well. Thus, if an I/O error occurs in the RIO drawers in a system that does not have JTAG access, the system using only RIO links may not be able to read all of the required registers to make a complete diagnosis of the problem, and any attempt to do so may cause the system to checkstop. A system developer must therefore be careful to avoid generating an illegal address.
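By way of illustration only, the care a developer must take to avoid generating an illegal address over the RIO path may be sketched as a guard that checks each address against a table of known-valid register ranges before any read is issued. The address ranges, the names RegisterRange, VALID_RANGES, is_valid_address, and guarded_read, and the read_fn callback are all hypothetical and are not taken from any actual KDB or hypervisor interface. Such a guard does not help when the RIO path itself is not functional.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class RegisterRange:
    """An inclusive range of addresses assumed to be safe to read."""
    start: int
    end: int


# Hypothetical address map for one I/O drawer; real ranges would come
# from the hardware documentation for the chips in that drawer.
VALID_RANGES = (
    RegisterRange(0x1000, 0x10FF),
    RegisterRange(0x2000, 0x203F),
)


def is_valid_address(addr: int,
                     ranges: Sequence[RegisterRange] = VALID_RANGES) -> bool:
    """Return True only if addr falls inside one of the known-valid ranges."""
    return any(r.start <= addr <= r.end for r in ranges)


def guarded_read(addr: int,
                 read_fn: Callable[[int], int],
                 ranges: Sequence[RegisterRange] = VALID_RANGES) -> Optional[int]:
    """Call read_fn(addr) only for a valid address.

    An invalid address is rejected without calling read_fn, so the
    underlying read path is never exercised with an illegal address.
    """
    if not is_valid_address(addr, ranges):
        return None
    return read_fn(addr)
```

In this sketch, read_fn stands in for whatever routine actually performs the read over the RIO link; a caller receiving None knows the address was rejected before any hardware access was attempted.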
Therefore, it would be advantageous to have an improved method, system, and computer product for aiding in the debugging of an I/O failure.