Embodiments of the inventive subject matter generally relate to the field of computer systems, and, more particularly, to selectable event reporting for virtualized partitioned systems.
As the number of partitions on highly virtualized large computing systems rises into the thousands, the task of notifying the operating systems running on those partitions of errors that occur on the underlying platform hardware becomes more time-consuming and resource-intensive. The number of events that may occur, combined with the number of partitions that may be running in a highly virtualized system, places a heavy load on the service infrastructure, which must pass service events to each active partition and to each in-process partition that will be activated within a given time window of the error occurrence. The hypervisor typically must save the error log and distribute it to any partition that is activated within a specified time window of the occurrence of the event. Further, most of the underlying hardware is virtualized to the partitions, so reporting underlying platform events only to the affected partitions is generally not possible: the hardware resources are not owned by any specific partition, but are virtualized to all partitions. As a result, hardware events are typically reported to all active partitions.
The stress on system resources caused by having to report events to many partitions can be exacerbated when the partitions are configured to report the events to a management console. In such cases, each partition that receives the event also reports it to the management console. Thus, thousands of partitions may all report the same event to the management console, greatly increasing the overhead of processing the event. This overhead matters because, while the service infrastructure is busy performing event notification, there can be delays in processing other tasks normally performed by the partition or delays in processing exception system operation hypervisor requests.
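By way of illustration only, the conventional reporting behavior described above can be sketched as follows. The sketch is a simplified model, not an actual hypervisor interface: the names `Partition`, `conventional_report`, `window`, and `console_log` are hypothetical, times are arbitrary seconds, and partition deactivation is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical model of a logical partition."""
    name: str
    activated_at: float                      # activation time, in seconds
    received: list = field(default_factory=list)


def conventional_report(event_id, event_time, partitions, window, console_log):
    """Report one platform event the conventional way.

    The event is delivered to every partition that is already active at
    event_time or that is activated within `window` seconds after it.
    Each receiving partition then forwards the same event to the
    management console (modeled here as a plain list).
    Returns the number of copies the console has received.
    """
    for p in partitions:
        if p.activated_at <= event_time + window:
            p.received.append(event_id)              # hypervisor -> partition
            console_log.append((p.name, event_id))   # partition -> console
    return len(console_log)


# 1000 partitions already active, one activated inside the window,
# and one activated long after the window has closed.
partitions = [Partition(f"lpar{i}", activated_at=0.0) for i in range(1000)]
partitions.append(Partition("late_in_window", activated_at=105.0))
partitions.append(Partition("late_outside_window", activated_at=500.0))

console_log = []
copies = conventional_report("E1", event_time=100.0, partitions=partitions,
                             window=10.0, console_log=console_log)
print(copies)  # 1001 copies of a single event reach the console
```

In this model a single hardware event produces one delivery per qualifying partition and one duplicate console report per delivery, so the notification cost grows linearly with the number of partitions even though only one underlying event occurred.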