1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, apparatus, and products for replacing a failing physical processor in a computer supporting multiple logical partitions.
2. Description of Related Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely complicated devices. Today's computers are much more sophisticated than early systems such as the EDVAC. Computer systems typically include a combination of hardware and software components, application programs, operating systems, processors, buses, memory, input/output devices, and so on. As advances in semiconductor processing and computer architecture push the performance of the computer higher and higher, more sophisticated computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
One area in which advances have been made is the parallel processing of many threads of execution in partitions that are assigned their own resources and run separate operating systems. The shift in computer hardware and software to a highly parallel, logically partitioned model has created opportunities for high system availability that were practically nonexistent just a few years ago. One high-availability mechanism permits dynamic runtime replacement of a processor predicted to fail with an unused processor, provided the failing physical processor can continue to function long enough to complete the replacement process. Another high-availability mechanism maintains complete processor state information so that, even in the event of a catastrophic processor failure (e.g., a checkstop), the work the processor was performing can continue on a replacement physical processor. The importance of a replacement physical processor in these recovery mechanisms is readily apparent. If unused processors are available, the source of a replacement is obvious. Unused processors, however, are wasteful and expensive and, as a consequence, are rare on most systems. When a processor checkstops and no unused processors are available, a system has two choices: it can terminate the partition or pool to which the failing processor is assigned, or the underlying hypervisor can continue to run the partition or pool of virtual processors to which the failed processor is assigned as though the partition or pool had more processors than are physically available. Both outcomes are undesirable: in the former, the partition is dead; in the latter, the partition does not run at the desired performance level. Some partition on the system must suffer when a utilized processor checkstops, but letting the happenstance of which physical processor fails determine which partition suffers is not an optimal procedure.