As computer systems have become more complex, with large numbers of processing devices and other hardware resources, it has become possible for one such computer system to operate simultaneously as multiple computers, where each computer has its own operating system. Such is the case in many server computer systems in particular. In such systems, although a customer (or operating system) may perceive a single computer, the portion of the system running as this single computer (a “partition”) may be distributed across many different hardware resources that are unaffiliated with one another and/or are separately replaceable “Field Replaceable Units” (FRUs).
Today's customers are asking for computer systems that will allow them to increase their return on investment by improving the utilization of their compute infrastructure. In addition, they are asking for solutions with higher availability, serviceability and manageability. In particular, they are asking for solutions that allow them to replace failing components of a computer system without bringing down or rebooting the computer system. Yet with respect to conventional computer systems such as those discussed above, it often is difficult or impossible to shift the utilization of hardware resources, or to replace hardware resources, without bringing down or rebooting the computer systems or at least individual partitions of the computer systems.
One reason why it is difficult to shift the utilization of hardware resources, or to replace hardware resources, without bringing down or rebooting a computer system is that such hardware resources tend to be in close communication with the operating system from the standpoint of interrupt handling. For example, in a conventional cellular server architecture, interrupts are sent to processing units via an addressing mechanism in which unique system-wide addresses are ascribed to unique processing devices. According to this mechanism, the addresses of the processing devices are exposed directly to and used by the operating system. Because the addresses are unique and cannot be changed, and because the operating system is directly informed of all of the addresses, it is difficult if not impossible to modify or replace the processing devices without entirely stopping operation of the operating system. Further complicating the matter is that some operating system versions additionally have internal constraints that prevent the on-line deletion of some particular processing devices.
Given these difficulties, full software-level machine virtualization is used to enable shifting (migration) or replacement of hardware resources without having to take down and/or reboot the computer system. However, while such virtualization is possible, the use of full machine virtualization tends to result in lower performance (e.g., in terms of processing speed), fails to provide electrical isolation, and is tied to specific operating systems, which in turn raises consistency and support issues as the operating systems are updated.
For at least the above reasons, it would be advantageous if an improved method and system for handling interrupts could be developed that, in at least some embodiments, were consistent with the shifting and/or replacement of hardware resources such as processing devices within a computer system. Further, it would be advantageous if, in at least some embodiments, such an improved method and system for handling interrupts were consistent with the shifting or replacement of hardware resources in a manner that did not require bringing down or rebooting the overall system (or system partition).