Hypervisor-based systems may execute multiple operating systems in multiple guest partitions. The guest partitions share access to hardware in the hypervisor-based system. The hardware may be designed to provide robust error reporting. Conventionally, errors reported by the hardware are either correctable or uncorrectable. Correctable errors are treated as warnings. Uncorrectable errors are handled differently, based on whether they are non-fatal or fatal. Non-fatal errors are serious, but the system may handle these errors through other means, such as redundant paths. Fatal errors affect the integrity of operations in the system and may cause serious system reliability issues if the system continues to operate without taking corrective action.
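The conventional error taxonomy described above can be illustrated with a minimal sketch. The names `Severity`, `classify`, and `conventional_response`, and the two boolean inputs, are hypothetical and chosen only for illustration; actual hardware reports severity through platform-specific registers and records that are not modeled here.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical labels for the three conventional error classes."""
    CORRECTABLE = "correctable"
    UNCORRECTABLE_NON_FATAL = "uncorrectable, non-fatal"
    UNCORRECTABLE_FATAL = "uncorrectable, fatal"


def classify(corrected_by_hardware: bool, integrity_at_risk: bool) -> Severity:
    """Map two assumed properties of a reported error to a severity class."""
    if corrected_by_hardware:
        return Severity.CORRECTABLE
    if integrity_at_risk:
        return Severity.UNCORRECTABLE_FATAL
    return Severity.UNCORRECTABLE_NON_FATAL


def conventional_response(severity: Severity) -> str:
    """Return the conventional handling for each class, as described in the text."""
    if severity is Severity.CORRECTABLE:
        # Correctable errors are treated as warnings.
        return "log warning"
    if severity is Severity.UNCORRECTABLE_NON_FATAL:
        # Serious, but may be handled through other means such as redundant paths.
        return "handle through other means"
    # Fatal: continuing without corrective action risks system integrity.
    return "take corrective action"
```

For example, under these assumptions an error that the hardware could not correct and that threatens the integrity of operations is classified as fatal, and `conventional_response` returns the corrective-action path for it.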
When error reporting is enabled in a single-partition system, a fatal uncorrectable error may deliberately cause the system to reboot or shut down to prevent further unintended damage, such as data corruption. However, a system shutdown is not a desirable approach in a hypervisor-based system in which multiple partitions reside on a single system. When a system shutdown occurs, all guest partitions on the system become unavailable. Thus, an error in one guest partition results in unavailability of all guest partitions. Maintaining reliability of the hypervisor-based system may be difficult in these circumstances. For example, additional planning may be required to ensure critical software does not execute on guest partitions sharing hardware with unreliable software. Such planning may be difficult, because guest partitions generally cannot access other guest partitions in the hypervisor-based system.