As computer systems have become more complex, with large numbers of processing devices and other hardware resources, it has become possible for a single computer system to operate simultaneously as multiple computers, each with its own operating system. This is particularly common in server computer systems. In such systems, although a customer (or operating system) may perceive a single computer, the portion of the system running as this single computer (a "partition") may be distributed across many different hardware resources that are unaffiliated with one another and/or are, in any case, separately replaceable "Field Replaceable Units" (FRUs).
Today's customers are asking for computer systems that allow them to increase the return on their investment by improving the utilization of their compute infrastructure. They are also asking for solutions with higher availability, serviceability and manageability. In particular, they want to be able to replace failing components of a computer system without bringing down or rebooting the computer system. Yet with conventional computer systems such as those discussed above, it is often difficult or impossible to shift the utilization of hardware resources, or to replace hardware resources, without bringing down or rebooting the computer system or at least individual partitions of it.
One reason it is difficult to shift the utilization of hardware resources, or to replace hardware resources, without bringing down or rebooting a computer system is that such hardware resources provide certain functional resources (for example, real-time counters) that the operating system(s) and/or partition(s) of the computer system rely upon in order to work properly. These can be referred to as "critical" resources. Because some or all of these critical resources are necessary, or at least desirable, for proper operation, successfully shifting hardware resources generally requires that these critical resources be shifted as well. Yet conventional computer systems, including many of today's cellular mid-range and high-end servers, face several limitations relating to the shifting of such critical resources.
More particularly, many OS-critical resources reside at architected addresses (such as the boot vector) and are "root resources," which are described to the OS, or abstracted from the OS, by firmware interfaces. Because many conventional cell-based servers map these root resources to fixed physical paths leading to specific, fixed "root" cells, conventional operating systems running on such servers cannot handle the removal, loss or modification of the root cells, at least not without bringing down the partition(s) supporting those operating systems.
Further, for root resources at root cells to be shifted without bringing down a partition, the shifting would need to occur without involving the operating system, so that the operating system was neither aware of nor affected by it. Yet conventional approaches for shielding an operating system from changes to critical resources typically require full machine virtualization at the software level. Such virtualization often results in lower performance (e.g., some processor cycles that could otherwise be given to the application are instead consumed by the process virtualizing the machine), and may also be inconsistent with providing electrical isolation and/or be tied to specific operating systems or versions thereof.
For at least the above reasons, it would be advantageous if an improved method and system for shifting critical (or other significant or desirable) resources within a computer system could be developed that, in at least some embodiments, was consistent with the shifting and/or replacement of hardware resources such as processing devices within the computer system. Further, it would be advantageous if, in at least some embodiments, such an improved method and system was consistent with shifting or replacing hardware resources in a manner that did not require bringing down or rebooting the overall system or modifying the operating system (or system partition).