1. Technical Field
The present invention relates in general to maximizing fault isolation during computer system initialization and in particular to a system and method for system initialization to maximize fault isolation using JTAG.
2. Description of the Related Art
Analyzing and debugging errors and failures is often difficult in large, complex computer systems such as the International Business Machines (IBM) RS6000 workstation. Such systems are distributed across so many key chips, components, and sub-systems that a failure or error occurring in one of them is not recognized by the others. Debugging becomes particularly difficult when the error or failure occurs during the power-up or initialization phase of the computer system, because not all isolation mechanisms are enabled yet.
The entire computer system is not promptly halted when such a failure or error occurs during system initialization. Thus, the computer system may continue to initialize, set up, operate, and execute even though an error or failure has occurred in at least one of its chips, components, or sub-systems. Also, such present computer systems do not provide an easy way to identify, locate, and debug an error or failure that occurs during system initialization, or to determine its source at the time of system initialization.
Additionally, Joint Test Action Group (JTAG) architectures and features on chips are well known in the art. JTAG architectures and features provide a secondary or ancillary backdoor into the chips. Through JTAG architecture, access is provided to registers on the chip. These registers are used to stop clocks, enable and disable output drivers, and raise and lower fencing logic. In addition, JTAG is used to read error registers in the event of an error.
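The register operations listed above can be pictured with a minimal sketch. The bit names, bit positions, and the `Chip` class below are hypothetical and chosen only for illustration; the text does not specify any register layout or access routine.

```python
from enum import IntFlag


class JtagControl(IntFlag):
    """Hypothetical control-register bits; the text gives no bit layout."""
    CLOCKS_STOPPED = 0x1
    DRIVERS_ENABLED = 0x2
    FENCE_RAISED = 0x4


class Chip:
    """Toy model of a chip whose registers are reachable through JTAG."""

    def __init__(self, name):
        self.name = name
        self.control = JtagControl(0)
        self.error_register = 0  # read back after an error is reported

    def jtag_set(self, bits):
        # Set control bits, e.g. stop clocks or raise a fence.
        self.control |= bits

    def jtag_clear(self, bits):
        # Clear control bits, e.g. disable drivers or lower a fence.
        self.control &= ~bits

    def jtag_read_errors(self):
        # Return the current error register contents.
        return self.error_register


chip = Chip("memory_controller")
chip.jtag_set(JtagControl.CLOCKS_STOPPED | JtagControl.FENCE_RAISED)
chip.jtag_clear(JtagControl.DRIVERS_ENABLED)
```

In this sketch the backdoor is simply a second set of methods on the chip object; real JTAG access would shift data through a scan chain rather than call methods.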
All key chips in such complex computer systems (e.g. the RS6000 workstation) include bi-directional checkstop and attention logic, which are well known in the art. A checkstop is a fatal error that must be handled as quickly as possible. An example of such a fatal error is a parity error, which triggers a checkstop so that the error is handled immediately. Other IBM systems have used checkstop to freeze the state of every processor in a multiprocessor system. An attention may be a secondary error or a less serious condition that may not even be an error, and it does not necessarily have to be handled as quickly as possible. If a secondary error or less serious condition has occurred, an attention is triggered by the respective processor, chip, component, sub-system, etc.
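The distinction between the two signals can be sketched as follows. Only the parity-error example and the freezing of processor states come from the description above; the condition names, the `FATAL_CONDITIONS` set, and the `handle` function are assumptions made for illustration.

```python
from enum import Enum


class Signal(Enum):
    ATTENTION = "attention"
    CHECKSTOP = "checkstop"


# Hypothetical classification table; only "parity_error" is named in the text.
FATAL_CONDITIONS = {"parity_error"}


def classify(condition):
    """Return CHECKSTOP for fatal conditions and ATTENTION for anything else."""
    if condition in FATAL_CONDITIONS:
        return Signal.CHECKSTOP
    return Signal.ATTENTION


def handle(condition, processor_states):
    """Freeze every processor on a checkstop; leave states alone on an attention."""
    signal = classify(condition)
    if signal is Signal.CHECKSTOP:
        for name in processor_states:
            processor_states[name] = "frozen"
    return signal
```

For example, calling `handle("parity_error", states)` on a dictionary of running processors marks all of them as frozen, while a non-fatal condition leaves them running and returns an attention that can be serviced later.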
However, an attention or checkstop architecture has not been used during the initialization and set up process of an entire computer system, particularly a complex computer system. Also, no attention or checkstop tree architecture exists for an entire computer system in which the tree can be traversed to efficiently isolate and identify an error or failure and its location during system initialization. Thus, the initialization process of a complex computer system may be so complicated or involved that detection of an error or failure (i.e., triggering of an attention or checkstop) is impossible or impractical.
It is therefore advantageous and desirable to break down and set forth a sequence of steps for the initialization process or procedure of a complex computer system in order to accommodate detection of errors and failures and triggering of a respective attention or checkstop for and during the system initialization, set up, or power on. It is further advantageous and desirable to provide a system and method for maximizing fault isolation using JTAG during initialization or set up or power on of an entire and overall computer system, particularly a complex computer system.
It is therefore one object of the present invention to break down and set forth a sequence of steps for the initialization process or procedure of a complex computer system in order to accommodate detection of errors and failures and triggering of a respective attention or checkstop for and during the system initialization, set up, or power on.
It is another object of the present invention to provide a system and method for maximizing fault isolation using JTAG during initialization or set up or power on of an entire and overall computer system, particularly a complex computer system.
The foregoing objects are achieved as is now described. A sequenced initialization is used to maximize detection of errors and failures and the triggering of respective attention signals. A number of computer devices, each having a JTAG interface, are provided together with an attention distribution sub-system and a service processor. Each computer device that is inaccessible during a built-in self test (BIST) is coupled to an error register bit. The computer devices are reset. Functional clocks of the computer devices are disabled. Output drivers of the computer devices are disabled. Fences for the computer devices are put up so that the inputs are in a known state and the computer devices driving those inputs have no effect. BISTs for the computer devices are performed as the devices are released from reset. It is determined when the BISTs are complete and whether each BIST has passed. The following tasks are performed via JTAG: attention signals for the computer devices are raised when the BISTs are completed; functional clocks for the computer devices are started; the computer devices are configured for system operation when the BISTs are complete; and output drivers of the computer devices are enabled by using JTAG accesses to the computer devices. Fences of the computer devices are dropped after the output drivers have been enabled. The computer devices are released from reset modes. A fault detection signal is triggered if a fault was detected by any computer device. The fault is driven to the attention distribution sub-system, and an attention is sent out to all other computer devices by the attention distribution sub-system during the sequenced initialization. The service processor determines whether to continue with the sequenced initialization depending on the fault determined.
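The ordering of the summarized steps can be illustrated with a minimal sketch. The `Device` class, its fields, the preset BIST outcome, and the `halt_on_any_fault` policy are all assumptions introduced here; the sketch also collapses the two reset releases into one and omits the configuration step, so it is a simplified model rather than the claimed method.

```python
class Device:
    """Toy computer device; fields mirror the states named in the summary."""

    def __init__(self, name, bist_passes=True):
        self.name = name
        self.bist_passes = bist_passes  # assumption: BIST outcome is preset
        self.in_reset = False
        self.clocks_running = True
        self.drivers_enabled = True
        self.fenced = False
        self.attention_seen = False


def halt_on_any_fault(faulted_names):
    """Hypothetical service processor policy: continue only if nothing failed."""
    return not faulted_names


def sequenced_init(devices, should_continue=halt_on_any_fault):
    """Run the steps in order and return the names of devices whose BIST failed."""
    # Reset, stop functional clocks, disable drivers, and put up fences.
    for d in devices:
        d.in_reset = True
        d.clocks_running = False
        d.drivers_enabled = False
        d.fenced = True

    # Run BIST and collect failures.
    faulted = [d for d in devices if not d.bist_passes]

    # Attention distribution: every other device sees the attention.
    if faulted:
        for d in devices:
            if d not in faulted:
                d.attention_seen = True

    names = [d.name for d in faulted]
    if not should_continue(names):
        return names  # devices stay fenced with drivers off

    # Start clocks, enable drivers, then drop fences and release reset.
    for d in devices:
        d.clocks_running = True
        d.drivers_enabled = True
        d.fenced = False
        d.in_reset = False
    return names
```

Keeping fences up and drivers disabled until every BIST has passed means a failing device cannot disturb its neighbors, which is what makes the fault easy to attribute to a single device. A different `should_continue` callable could model a service processor that tolerates certain faults.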
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.