1. Technical Field
The present invention relates generally to an improved data processing system, and in particular to a method and apparatus for handling errors. Still more particularly, the present invention provides a method and apparatus for recovering partitions that have been terminated in a logical partitioned system in which an error has occurred.
2. Description of Related Art
Logical partitioned (LPAR) functionality within a data processing system (platform) allows multiple copies of a single operating system (OS) or multiple heterogeneous operating systems to run simultaneously on a single data processing system platform. A partition, within which an operating system image runs, is assigned a non-overlapping subset of the platform's resources. These platform-allocable resources include one or more architecturally distinct processors with their interrupt management areas, regions of system memory, and input/output (I/O) adapter bus slots. The partition's resources are represented to the operating system image by the platform's firmware.
Each distinct operating system or image of an operating system running within the platform is protected from the others, such that software errors in one logical partition cannot affect the correct operation of any other partition. This protection is provided by allocating a disjoint set of platform resources to be directly managed by each operating system image and by providing mechanisms that ensure that no image can control any resource that has not been allocated to it. Furthermore, software errors in the control of an operating system's allocated resources are prevented from affecting the resources of any other image. Thus, each image of the operating system (or each different operating system) directly controls a distinct set of allocable resources within the platform.
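The disjoint allocation described above can be illustrated with a minimal sketch. The following Python fragment is an assumption made for illustration only and does not represent platform firmware: the Partition class, its field names, and the find_overlaps helper are hypothetical, and memory regions are reduced to simple labels.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical model of the resources allocated to one logical partition."""
    name: str
    processors: set = field(default_factory=set)
    memory_regions: set = field(default_factory=set)  # simplified to labels
    io_slots: set = field(default_factory=set)


def find_overlaps(partitions):
    """Return (name_a, name_b, kind, shared) for each resource held by two partitions."""
    overlaps = []
    for i, a in enumerate(partitions):
        for b in partitions[i + 1:]:
            for kind in ("processors", "memory_regions", "io_slots"):
                shared = getattr(a, kind) & getattr(b, kind)
                if shared:
                    overlaps.append((a.name, b.name, kind, shared))
    return overlaps


p1 = Partition("P1", {0, 1}, {"mem0"}, {"slot0"})
p2 = Partition("P2", {2}, {"mem1"}, {"slot1", "slot2"})
p3 = Partition("P3", {2, 3}, {"mem2"}, {"slot3"})

ok = find_overlaps([p1, p2])   # disjoint allocation: nothing is shared
bad = find_overlaps([p2, p3])  # processor 2 is assigned to both P2 and P3
```

In this sketch, an allocation is acceptable only when find_overlaps returns an empty list, which corresponds to each image controlling a distinct set of allocable resources.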
Currently, in LPAR data processing systems, when an unrecoverable host bridge error occurs, up to four partitions are terminated if the four input/output (I/O) slots under this host bridge are allocated to more than one partition. These partitions remain in an error state and cannot be rebooted until AC power to the LPAR data processing system is cycled. LPAR data processing systems are often used as servers, such as web servers that provide services on the Internet or application servers that provide services within an organization. Such a situation is therefore undesirable because it interrupts the services being provided by the LPAR data processing system.
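The scope of such a failure can be sketched as follows. This Python fragment is illustrative only: the slot and partition names, the slot_owner mapping, and the partitions_terminated helper are hypothetical, and the sketch assumes that every partition owning at least one slot under the failed host bridge is terminated.

```python
def partitions_terminated(slot_owner, bridge_slots):
    """Return the sorted names of partitions owning a slot under the failed bridge."""
    return sorted({slot_owner[s] for s in bridge_slots if s in slot_owner})


# Hypothetical allocation: four slots under one host bridge, shared by three partitions.
slot_owner = {
    "slot0": "P1",
    "slot1": "P2",
    "slot2": "P2",
    "slot3": "P3",
    "slot4": "P4",  # attached to a different host bridge
}
failed_bridge_slots = ["slot0", "slot1", "slot2", "slot3"]

affected = partitions_terminated(slot_owner, failed_bridge_slots)
```

With four slots under the host bridge, at most four distinct partitions can appear in the result, which is consistent with the limit of four terminated partitions stated above.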
Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions for recovering from errors, such as those in a host bridge.