1. Technical Field
The present invention relates generally to an improved data processing system. Still more particularly, the present invention provides a method and apparatus for testing fault tolerant processing within a symmetrical multiprocessing system.
2. Description of Related Art
With the need for faster data processing systems, symmetrical multiprocessing (SMP) systems are being used more often. SMP is a computer architecture in which multiple processors share the same memory containing one copy of the operating system, one copy of any applications that are in use, and one copy of the data. These systems reduce transaction time because the operating system divides the workload into tasks assigned to available processors.
Like other data processing systems, SMP systems may experience failures. Some of these failures are so-called hard or solid errors, from which no recovery is possible. A hard error generally causes a system failure, after which the device that caused the error is replaced. Other failures are recoverable, so-called soft errors, which occur intermittently and randomly. In contrast to a hard error, a soft error can be recovered from with proper recovery and retry design, so that the system does not fail. These soft errors are often localized to a particular processor within the SMP system. The SMP system usually has capabilities to detect and recover from certain hardware-related errors. However, given the increasing complexity of current data processing systems, especially multiprocessor systems, the number of permutations of possible errors in a failing system can be quite large. Thus, the design and test of system hardware, firmware, and software for detecting and recovering from these errors are similarly complex.
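The distinction between hard and soft errors drawn above can be illustrated with a minimal sketch. It assumes a software model in which a soft error is retried a bounded number of times while a hard error is never retried; the names `SoftError`, `HardError`, `run_with_retry`, and `max_retries` are hypothetical and are used only for illustration.

```python
class SoftError(Exception):
    """Hypothetical intermittent error that a retry may clear."""


class HardError(Exception):
    """Hypothetical solid error from which no recovery is possible."""


def run_with_retry(operation, max_retries=3):
    """Call operation(), retrying on SoftError up to max_retries times.

    A HardError is not caught here, so it propagates to the caller
    immediately, modeling a failure that retry cannot recover.
    """
    attempts = 0
    while True:
        try:
            return operation()
        except SoftError:
            attempts += 1
            if attempts > max_retries:
                # Retry budget exhausted: treat the error as unrecovered.
                raise


def make_flaky(failures_before_success):
    """Build an operation that raises SoftError a fixed number of times."""
    state = {"remaining": failures_before_success}

    def operation():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise SoftError("transient fault")
        return "ok"

    return operation
```

In this sketch, an operation that fails twice and then succeeds completes normally under the default retry budget, whereas one that keeps failing eventually surfaces the error to the caller.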
Another layer of complexity is added by the fact that multiple hardware and software vendors collaborate in designing system components, and the procedures for testing the fault tolerance among the various hardware, firmware, and software components could be more efficient with the proper testing utilities.
Consequently, it would be advantageous to have a method and apparatus for simulating errors in a processor within a multiprocessor system in order to test its system design and fault-tolerant recovery capabilities.
A method and apparatus for simulated error injection for processor deconfiguration design verification are provided. A simulated error condition request is received from a user through software, such as the operating system executing in the multiprocessor data processing system. In response to the requested simulated error condition, an error condition is injected into a processor of the multiprocessor data processing system via instruction execution. In response to the detection of the error condition and execution of error-path code, the processor is deconfigured. The error condition may be injected by executing instructions to set an error condition bit in an error condition register.
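The flow summarized above (request, injection by setting an error condition bit, detection, and deconfiguration) can be sketched as a small software model. This is an illustrative simulation only, not the hardware or firmware mechanism itself; the class names, the `SIMULATED_ERROR_BIT` value, and the register layout are all assumptions introduced for the example.

```python
from dataclasses import dataclass

# Hypothetical bit position for the simulated error; not taken from any
# real processor's error condition register layout.
SIMULATED_ERROR_BIT = 0x1


@dataclass
class Processor:
    cpu_id: int
    error_register: int = 0   # models the error condition register
    configured: bool = True   # False once the processor is deconfigured


@dataclass
class System:
    processors: list

    def inject_error(self, cpu_id, bit=SIMULATED_ERROR_BIT):
        """Set an error condition bit, as the injected instructions would."""
        self.processors[cpu_id].error_register |= bit

    def check_and_recover(self):
        """Model of error-path code: deconfigure processors with a pending error."""
        deconfigured = []
        for cpu in self.processors:
            if cpu.configured and cpu.error_register != 0:
                cpu.configured = False
                deconfigured.append(cpu.cpu_id)
        return deconfigured

    def available(self):
        """Return the ids of processors still configured for work."""
        return [cpu.cpu_id for cpu in self.processors if cpu.configured]


def make_system(count):
    """Build a model system with the given number of processors."""
    return System([Processor(i) for i in range(count)])
```

For example, calling `inject_error(2)` on a four-processor model and then `check_and_recover()` removes processor 2 from the list returned by `available()`, which is the kind of outcome a design verification test would check.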