1. Technical Field
The present invention relates generally to an improved data processing system and in particular to a method and apparatus in a symmetrical multiprocessing system. Still more particularly, the present invention provides a method and apparatus for deconfiguring a processor in a symmetrical multiprocessing system.
2. Description of Related Art
With the need for more and more processing power, symmetrical multiprocessing (SMP) systems are being used more often. SMP is a computer architecture in which multiple processors share the same memory, containing one copy of the operating system, one copy of any applications that are in use, and one copy of the data. SMP reduces transaction time because the operating system divides the workload into tasks and assigns those tasks to whatever processors are free.
SMP systems often experience failures. Some of these failures are so-called hard or solid errors, from which no recovery is possible. A hard error in an SMP system generally causes a system failure, after which the device that caused the hard error is replaced. On the other hand, a number of failures are repeatable or so-called soft errors, which occur intermittently and randomly. In contrast to a hard error, a soft error can be recovered from, given proper recovery and retry design, so that the SMP system is prevented from failing. Often these soft errors are localized to a particular processor within the SMP system.
An intermittent (soft) error is usually caused by a flaw in the semiconductor and computer hardware. Such a flaw can degrade over time and become a hard or solid error. Therefore, some recoverable soft errors, which are localized or internal to a particular processor within the SMP system, will degrade over time into hard errors and cause a system failure.
Consequently, it would be advantageous to have a method and apparatus for identifying or predicting degradation of a processor in an SMP system and for removing such a processor from the system configuration of the SMP system.
The present invention provides a method and apparatus in a multiprocessor data processing system for managing a plurality of processors. Monitoring for recoverable errors in a set of processors is performed. Responsive to detecting a recoverable error for a processor in the set of processors, a determination is made as to whether the recoverable error indicates a trend towards an unrecoverable error. Responsive to a determination that the recoverable error indicates a trend towards an unrecoverable error, actions are initiated to stop the processor.
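The monitoring, trend determination, and stop steps summarized above can be illustrated with a minimal sketch. The text does not specify how a trend towards an unrecoverable error is determined, so the sketch assumes one plausible criterion: a processor is considered to be degrading when it reports a threshold number of recoverable errors within a sliding time window. The class name `ProcessorMonitor`, its methods, and the `threshold` and `window_seconds` parameters are hypothetical and chosen for illustration only.

```python
from collections import defaultdict, deque


class ProcessorMonitor:
    """Illustrative monitor for recoverable errors in a set of processors.

    Assumption (not from the source text): a "trend towards an
    unrecoverable error" is modeled as `threshold` or more recoverable
    errors on one processor within `window_seconds`.
    """

    def __init__(self, processor_ids, threshold=3, window_seconds=60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.active = set(processor_ids)   # processors still configured
        self.errors = defaultdict(deque)   # per-processor error timestamps

    def record_recoverable_error(self, cpu_id, timestamp):
        """Record one recoverable error; return True if the processor was stopped."""
        if cpu_id not in self.active:
            return False  # already deconfigured, nothing to do

        history = self.errors[cpu_id]
        history.append(timestamp)

        # Discard errors that have aged out of the sliding window.
        while history and timestamp - history[0] > self.window_seconds:
            history.popleft()

        # Hypothetical trend test: too many recent recoverable errors.
        if len(history) >= self.threshold:
            self.stop_processor(cpu_id)
            return True
        return False

    def stop_processor(self, cpu_id):
        """Stand-in for the actions that stop and deconfigure a processor."""
        self.active.discard(cpu_id)
```

In this sketch, isolated recoverable errors that are spread far apart in time never trigger a stop, whereas a burst of errors on the same processor removes it from the active set while the remaining processors continue to run.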