1. Technical Field
The present invention is generally related to logical partitions and access to devices, and more particularly to optimizing performance and availability of devices in partitioned systems.
2. Description of Related Art
A desirable attribute of a server is the ability to run multiple operating system images simultaneously on the hardware in autonomous partitions. In such an environment, each partition defined for the system owns some subset of the available system resources, such as processors, memory and I/O cards.
Hard Partitioning
One way to accomplish this partitioning is to design a system as a collection of electrically isolated SMP modules. Each individual SMP module would have the ability to run a separate instance of an operating system. Each SMP module running an operating system would be known as a partition. In addition, SMP modules could be aggregated together to create larger partitions. This kind of partitioning is usually known as hard partitioning.
In a hard-partitioned server, certain resources of the system might be shared among all the partitions (e.g. power and cooling) while the rest would operate independently. A key feature of this environment is that a failure of a non-shared resource would generally only impact one partition.
On the other hand, if hard partitioning is accomplished by aggregating one or more SMP modules in a system, and each module has, for example, 4 processors, then a hard partition can contain only 4, 8, 12 or some other multiple of 4 processors. The granularity does not exist to create a 5 or 10 processor partition. Other elements such as memory and possibly I/O are likewise limited.
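The granularity limit can be made concrete with a short sketch. It assumes the 4-processor module from the example above; the function name `hard_partition_size` is hypothetical and used only for illustration.

```python
import math

# Processors per SMP module, taken from the example in the text.
MODULE_PROCESSORS = 4


def hard_partition_size(requested: int, module_size: int = MODULE_PROCESSORS) -> int:
    """Return the smallest hard-partition size (in processors) covering the request.

    A hard partition is built from whole SMP modules, so the request is
    rounded up to the next multiple of the module size.
    """
    if requested <= 0:
        raise ValueError("requested must be a positive processor count")
    return math.ceil(requested / module_size) * module_size


if __name__ == "__main__":
    for wanted in (4, 5, 10):
        granted = hard_partition_size(wanted)
        print(f"requested {wanted}: partition of {granted}, "
              f"{granted - wanted} processor(s) beyond the request")
```

A request for 5 processors therefore consumes two modules (8 processors), and a request for 10 consumes three (12 processors).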
FIG. 1 illustrates some hard partitioning possibilities of a hypothetical system where the minimum partition granularity is four processors with 4 memory cards and one I/O drawer. In the illustration, “p” stands for a processor, “m” stands for a unit of memory and “i” stands for an I/O card.
Logical Partitioning
Another approach to partitioning, such as that found on pSeries servers from IBM, is known as “logical partitioning.” In this method, a firmware layer called a hypervisor runs beneath the operating systems. The hypervisor controls all of the system resources, and can allocate them to individual partitions without the hardware limitations of physical partitioning. FIG. 2 illustrates this approach.
This approach allows the assignment of single processors, memory elements and I/O cards to individual partitions. By design it is much more flexible than the physical partitioning approach.
It should be noted that this approach, while not limited to doing so, still largely permits logical partitions to be defined along physical boundaries to take advantage of the availability characteristics that electrical isolation of the physical boundaries affords.
Software Partitioning
An additional approach to partitioning is a concept known as “software partitioning” or “virtual partitioning.” In this approach, software running in an existing hard partition can be used to host separate virtual partitions, each running its own separate image of the operating system. This approach, with some additional overhead, allows for greater granularity of partitions within a hard partition. However, the availability characteristics of the soft partition from a hardware standpoint are generally limited to the availability characteristics of the hard partition. In other words, a fatal hardware error anywhere within the hard partition will cause all of the software partitions under it to fail.
Hardware Availability Considerations with Logical Partitioning
As in the physical partitioning case, it is still generally true that resources shared by more than one partition can cause all the partitions sharing the resource to be compromised if a hardware failure occurs.
For a large number of components in a system, logical partitioning allows for a choice of how much hardware is shared. Understanding the implications of the options available is important in deciding how partitioning should be accomplished.
The illustration of FIG. 3 shows a CEC containing a number of processors attached to an I/O drawer which contains two I/O planars. Each I/O planar effectively has 12 slots.
Each group of 4 slots in a planar connects, through a PCI-PCI bridge, to a function called a PHB. All the PHBs in turn are connected to the CEC by what is labeled a “Hub Port.”
Because each I/O card has a separate path to the PCI-PCI bridge, many failures that could occur with an I/O card will be isolated to that card only. In addition, pSeries servers from IBM have the concept of a PCI Enhanced Error Handling function (EEH). This function, available for many I/O cards, avoids a hardware machine check when parity errors occur on the PCI bus between an I/O card and the PCI-PCI bridge. Even with EEH, however, there are still some failures that could occur between the I/O card and the PHB which would require termination of all partitions with I/O cards under the PHB. In addition, there are some failures that would require termination of all partitions under the entire I/O planar.
In short, for higher availability in a logical partition, the logical partition should have exclusive ownership of all the PHBs containing any I/O card of the partition. Although of somewhat less importance, the logical partition should also have exclusive ownership of all the I/O planars containing any I/O card of the partition. In other words, if any I/O card under a PHB is assigned to a partition, then all of the I/O cards under the PHB should be assigned to the partition, and, for further availability, if any I/O card under an I/O planar is assigned to a partition, then all the I/O cards under the planar should be assigned to the partition.
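The two ownership rules can be checked mechanically. The following is a minimal sketch, assuming the FIG. 3 layout described above (12 slots per planar, 4 slots per PHB) and a hypothetical representation in which each occupied slot is keyed by (planar, slot) and mapped to a partition name. The function names and data layout are illustrative assumptions, not part of any described system.

```python
from collections import defaultdict

SLOTS_PER_PHB = 4      # each group of 4 slots shares one PHB (FIG. 3 description)
SLOTS_PER_PLANAR = 12  # each I/O planar effectively has 12 slots


def phb_of(slot: int) -> int:
    """PHB index within a planar for a 0-based slot number."""
    return slot // SLOTS_PER_PHB


def shared_components(assignment):
    """Find PHBs and planars that hold cards of more than one partition.

    assignment maps (planar, slot) -> partition name for occupied slots.
    Returns (shared_phbs, shared_planars); both empty means every partition
    has exclusive ownership of the PHBs and planars it uses.
    """
    phb_owners = defaultdict(set)
    planar_owners = defaultdict(set)
    for (planar, slot), partition in assignment.items():
        if not 0 <= slot < SLOTS_PER_PLANAR:
            raise ValueError(f"slot {slot} is outside the planar")
        phb_owners[(planar, phb_of(slot))].add(partition)
        planar_owners[planar].add(partition)
    shared_phbs = sorted(k for k, owners in phb_owners.items() if len(owners) > 1)
    shared_planars = sorted(k for k, owners in planar_owners.items() if len(owners) > 1)
    return shared_phbs, shared_planars


# Hypothetical assignment: partitions A and B both have cards in slots 0-3 of planar 0.
example = {(0, 0): "A", (0, 1): "A", (0, 2): "B", (0, 4): "B", (1, 0): "C"}

if __name__ == "__main__":
    phbs, planars = shared_components(example)
    print("PHBs shared by several partitions:", phbs)
    print("planars shared by several partitions:", planars)
```

In the example, PHB 0 of planar 0 is shared by partitions A and B, so a failure requiring termination at the PHB level would affect both; planar 0 is likewise shared, while planar 1 belongs to partition C alone.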
Performance Vs. Availability in I/O Assignment
The last section defined rules for assignment of I/O cards in logical partitions to achieve the best availability. It would seem simple to follow the stated rules when the best availability for a partition is required.
However, from an efficiency standpoint, the rules would tend to bunch all of the I/O cards of a partition under the fewest PHBs and I/O planars possible. There are two considerations that argue against that.
First, configurations could be designed with redundant cards in a partition to allow for failover from one card to another on an unrecoverable error. To take full advantage of the redundancy, the I/O cards in question should really be in separate I/O planars. This would allow failover for any type of I/O error on the entire planar (although a reboot may be required). This presumes, of course, that all necessary cards are redundant for the partition.
Second, from a performance standpoint, it is a good idea to spread high-performance cards out among the available PHBs and Hub Ports and to limit the number allowed under a PHB, planar or drawer. These can be described as performance rules that should be met.
In addition, there are slot restrictions limiting where I/O cards can be plugged into a system, having to do with matching the voltage and speed of the PCI cards. These are functional rules that have to be met.
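A placement checker for the two kinds of rules might be sketched as follows. Everything specific here is an assumption made for illustration: the limit of two high-performance cards per PHB, the treatment of "matching" as simple equality of voltage and speed, and the card and slot names are all hypothetical and are not taken from any published placement rules.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    phb: str       # label of the PHB the slot sits under
    volts: float   # slot signaling voltage
    mhz: int       # slot bus speed


@dataclass(frozen=True)
class Card:
    name: str
    volts: float
    mhz: int
    high_performance: bool = False


MAX_HIGH_PERF_PER_PHB = 2  # hypothetical limit, for illustration only


def violations(placement, max_high_perf=MAX_HIGH_PERF_PER_PHB):
    """Return a list of rule violations for a list of (Card, Slot) pairs."""
    problems = []
    high_perf_per_phb = Counter()
    for card, slot in placement:
        # Functional rule (assumed form): voltage and speed must match exactly.
        if (card.volts, card.mhz) != (slot.volts, slot.mhz):
            problems.append(f"functional: {card.name} does not match its slot under {slot.phb}")
        if card.high_performance:
            high_perf_per_phb[slot.phb] += 1
    # Performance rule (assumed form): cap on high-performance cards per PHB.
    for phb, count in sorted(high_perf_per_phb.items()):
        if count > max_high_perf:
            problems.append(f"performance: {phb} holds {count} high-performance cards")
    return problems


# Hypothetical placement: three fast cards bunched under one PHB, one mismatched card.
example_placement = [
    (Card("fast-1", 3.3, 66, True), Slot("PHB0", 3.3, 66)),
    (Card("fast-2", 3.3, 66, True), Slot("PHB0", 3.3, 66)),
    (Card("fast-3", 3.3, 66, True), Slot("PHB0", 3.3, 66)),
    (Card("slow-1", 5.0, 33), Slot("PHB1", 3.3, 66)),
]

if __name__ == "__main__":
    for problem in violations(example_placement):
        print(problem)
```

The distinction in the text carries over to how the results would be used: a functional violation makes a placement invalid, whereas a performance violation marks a placement that should be avoided when an alternative exists.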
Given an unlimited number of I/O drawers to work with, all of these performance, function and availability rules could be satisfied in a system.
However, it is also desirable to limit the number of I/O drawers in a system, both because there is a limit to the number a system can support and, practically, because adding I/O drawers adds cost to the system.
Therefore, satisfying all of the performance, function and availability rules for a system while minimizing the number of I/O drawers becomes an optimization problem.
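The tension between availability and drawer count can be illustrated with a rough lower-bound calculation. This is a sketch under stated assumptions: it uses the FIG. 3 layout (two planars per drawer, 12 slots per planar, 4 slots per PHB), considers only the availability rules, and ignores the performance and functional rules entirely, so a real configuration may need more drawers than these bounds suggest. The function name and the partition card counts are hypothetical.

```python
import math

SLOTS_PER_PHB = 4        # from the FIG. 3 description
PHBS_PER_PLANAR = 3      # 12 slots per planar / 4 slots per PHB
PLANARS_PER_DRAWER = 2   # from the FIG. 3 description


def drawer_lower_bounds(cards_per_partition):
    """Lower bounds on I/O drawers under three levels of isolation.

    cards_per_partition maps partition name -> number of I/O cards.
    """
    slots_per_planar = SLOTS_PER_PHB * PHBS_PER_PLANAR
    slots_per_drawer = slots_per_planar * PLANARS_PER_DRAWER
    phbs_per_drawer = PHBS_PER_PLANAR * PLANARS_PER_DRAWER
    counts = list(cards_per_partition.values())

    # No isolation: cards are simply packed into slots.
    no_isolation = math.ceil(sum(counts) / slots_per_drawer)
    # No two partitions share a PHB: each partition uses whole PHBs.
    phbs_needed = sum(math.ceil(c / SLOTS_PER_PHB) for c in counts)
    phb_isolation = math.ceil(phbs_needed / phbs_per_drawer)
    # No two partitions share a planar: each partition uses whole planars.
    planars_needed = sum(math.ceil(c / slots_per_planar) for c in counts)
    planar_isolation = math.ceil(planars_needed / PLANARS_PER_DRAWER)

    return {
        "no_isolation": no_isolation,
        "phb_isolation": phb_isolation,
        "planar_isolation": planar_isolation,
    }


if __name__ == "__main__":
    # Hypothetical system with five partitions and 17 cards in total.
    print(drawer_lower_bounds({"A": 5, "B": 2, "C": 3, "D": 1, "E": 6}))
```

For this hypothetical system the 17 cards fit in one drawer if availability is ignored, need at least two drawers if no PHB is shared, and need at least three if no planar is shared. Layering the performance and functional rules on top of these bounds is what turns card placement into the optimization problem described above.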
When the first logically partitionable servers were shipped for pSeries, rules were published that, when followed, would ensure a high level of card performance and maintain the functional requirements.
If a system were ordered with a given set of I/O cards and a number of I/O drawers, IBM would manufacture those systems using those guidelines without giving any consideration to availability in a partitioned environment.
IBM did publish high availability guidelines stating that for best availability, no two partitions should share a PHB and no two partitions should share an I/O planar.
These guidelines did not in any way tell the customer how to configure accordingly while still honoring the other placement rules.
Actually evaluating configurations was, perhaps, believed to be simple enough that customers could manage it manually. In practice, evaluation turned out to be far more tedious than imagined, and even if customers were aware of the tradeoffs, they might settle for buying more I/O drawers than necessary or configuring with less availability than would theoretically be possible.
Competitors still limited by hard partitioning (or hard partitioning with virtual partitioning) are not yet faced with this optimization problem so long as they do not attempt a logical partitioning approach.
Therefore, there is a need in the art for a means of determining how to place I/O cards physically into a server so that the logical partitions assigned to the server enjoy a high level of card performance and each partition maintains the highest level of availability, while minimizing the number of I/O drawers required to accomplish this.