The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely sophisticated devices that may be found in many different settings. Computer systems typically include a combination of hardware, such as semiconductors and circuit boards, and software, also known as computer programs. As advances in semiconductor processing and computer architecture push the performance of computer hardware higher, more sophisticated and complex computer software has evolved to take advantage of that performance, resulting in computer systems today that are much more powerful than those of just a few years ago. One significant advance in computer technology is the development of parallel processing, i.e., the performance of multiple tasks in parallel.
A number of computer software and hardware technologies have been developed to facilitate increased parallel processing. From a hardware standpoint, computers increasingly rely on multiple microprocessors to provide increased workload capacity. Furthermore, some microprocessors have been developed that can execute multiple threads in parallel, effectively providing many of the same performance gains attainable through the use of multiple microprocessors. From a software standpoint, multithreaded operating systems and kernels have been developed, which permit computer programs to execute concurrently in multiple threads, so that multiple tasks can essentially be performed at the same time.
In addition, some computers implement the concept of logical partitioning, where a single physical computer is permitted to operate essentially like multiple, independent virtual computers, referred to as logical partitions, with the various resources in the physical computer (e.g., processors, memory, and input/output devices) allocated among the various logical partitions. Each logical partition executes a separate operating system and, from the perspective of users and of the software applications executing on the logical partition, operates as a fully independent computer. The separate logical partitions typically operate under the control of a partition manager or hypervisor.
As the logical partitions execute, they encounter various events, e.g., errors due to software, firmware, hardware, or network problems. These events may range from expected and benign to unexpected and serious. An event that requires some sort of intervention, e.g., by a system administrator or technician, is often referred to as a “serviceable event.” Further, some of these events, called local events, may be local to one particular partition and not encountered by any other partition. Other events, called platform serviceable events, may be global and capable of being encountered by all partitions.
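The distinction between local events and platform serviceable events can be illustrated with a minimal sketch. The names `Scope`, `Event`, and `observers`, and the partition and event identifiers, are hypothetical and chosen only for illustration; they do not correspond to any particular partition manager or firmware interface.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Scope(Enum):
    LOCAL = "local"        # encountered by one partition only
    PLATFORM = "platform"  # global; may be encountered by every partition


@dataclass(frozen=True)
class Event:
    event_id: str
    scope: Scope
    serviceable: bool                # True if some intervention is required
    partition: Optional[str] = None  # owning partition, set only for local events


def observers(event: Event, partitions: List[str]) -> List[str]:
    """Return the partitions that may encounter the given event."""
    if event.scope is Scope.LOCAL:
        # A local event is seen only by the partition in which it occurred.
        return [event.partition]
    # A platform event is capable of being encountered by all partitions.
    return list(partitions)
```

For example, with partitions `["lpar1", "lpar2", "lpar3"]`, a local event owned by `lpar2` is observed only by `lpar2`, whereas a platform event is observed by all three partitions.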
One current technique for handling events in a logically partitioned system is for all partitions to receive platform serviceable events from the platform firmware or hardware and forward them to a central aggregation component, called a hardware management console. These events may also be reported directly from the platform firmware or hardware to the hardware management console. This dual reporting provides a redundant path that guarantees delivery of the events in case the direct path from the platform firmware or hardware to the hardware management console is lost or temporarily unavailable. The partitions also forward to the hardware management console the serviceable events that are local to the partitions. Thus, the hardware management console becomes the aggregation point for all serviceable events in the computer system. A drawback of this technique is that each partition adds another redundant path for reporting the same platform events, so the number of duplicate reports reaching the hardware management console grows with the number of partitions. The impact on the performance of the hardware management console, the computer system, or both may become significant as the number of partitions and the number of events recorded in the hardware management console escalate.
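The scaling drawback described above can be made concrete with a small counting model. This is a sketch under simplifying assumptions, not a description of any actual hardware management console: the function names, the `direct_path_up` flag, and the event identifiers are hypothetical, and each partition is assumed to forward every platform event exactly once.

```python
from collections import Counter
from typing import Iterable


def reports_received(num_partitions: int,
                     platform_events: Iterable[str],
                     local_events_per_partition: int,
                     direct_path_up: bool = True) -> Counter:
    """Count the reports arriving at the console for each distinct event."""
    received = Counter()
    for event_id in platform_events:
        # Every partition forwards its own copy of each platform event.
        received[event_id] += num_partitions
        # The firmware/hardware may also report the event directly.
        if direct_path_up:
            received[event_id] += 1
    for p in range(num_partitions):
        for i in range(local_events_per_partition):
            # A local event is reported only by the partition that saw it.
            received[f"lpar{p}-local{i}"] += 1
    return received


def redundant_reports(received: Counter) -> int:
    """Reports beyond the single copy needed for each distinct event."""
    return sum(count - 1 for count in received.values())
```

Under these assumptions, one platform event in a system with 4 partitions produces 5 reports at the console (4 forwarded plus 1 direct), 4 of which are redundant; with 16 partitions the same event produces 17 reports, 16 of them redundant. If the direct path is unavailable (`direct_path_up=False`), the event is still delivered through the partitions, which is the redundancy the technique is meant to provide.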
What is needed is a better technique for handling serviceable events while still allowing for some redundancy.