Logical partitioning is a means for logically organizing the resources of a data processing system's processor complex (i.e., a plurality of central processors) such that a plurality of logical (i.e., virtual) processors or partitions results. Logical partitioning allows a user to get the maximum use of the physical resources of a machine by allowing a plurality of partitions to run on the same processor complex simultaneously.
A logical partition is a user-defined set of hardware resources (e.g., processors, memory, channels, etc.) that when combined are sufficient to allow a system control program (i.e., an operating system) to execute. A logical partition can be defined to include:
(1) one or more central processors;
(2) central storage;
(3) one or more channel paths;
(4) optional expanded storage;
(5) one or more optional vector facilities;
(6) sub-channels; and
(7) logical control units.
Logical partitions operate independently, but may share processors and input/output (I/O) devices. The storage for each logical partition is isolated and cannot be shared with other logical partitions. Similarly, channel paths are assigned to logical partitions and may not be shared by two logical partitions at the same time. Each partition is isolated from all other partitions by the partitioner's hardware and micro-code. The only communication available between partitions is via I/O connectivity. Additional background on partitioning is available in T. L. Borden, et al., "Multiple Operating Systems on One Processor Complex," IBM Systems Journal, vol. 28, no. 1, 1989, pages 104-123.
Partitions which share processors are known as shared partitions. Partitions which have only dedicated processors are known as dedicated partitions. Since the resources of a dedicated partition are not shared, these partitions do not require workload management. The following discussion relates to workload management between shared partitions; therefore, when "partition" is used, a shared partition is meant.
The Processor Resource Systems Manager (PR/SM) feature available on the IBM (International Business Machines, Inc., Armonk, N.Y.) 3090E and ES/3090S processor families allows logical partitioning (LPAR). PR/SM consists of hardware and micro-code which allow direct partitioning through the system console or indirect partitioning via software control. Additional information on PR/SM is available in IBM ES/3090 Processor Complex: Processor Resource Systems Manager (IBM Publication No. GA22-7123).
The partitioner (also known as a hypervisor) is the entity which provides partition access to shared processors. The portion of the partitioner which performs the switching between partitions is known as a partitioner dispatcher. The partitioner dispatcher is similar to the dispatcher of an operating system which time shares a processor between several applications. As used herein, "dispatcher" refers to the partitioner dispatcher rather than an operating system's dispatcher.
The partitioner performs dynamic workload management by allocating the logical processors of each partition to the available physical processors in a way which provides good I/O responsiveness while maximizing the use of physical processors. Dispatching may be done strictly on a time basis (time-driven dispatching), on an event-driven basis (event-driven dispatching), or on a combined time and event basis (time/event-driven dispatching).
In time-driven dispatching, the dispatcher will switch the control of a shared processor after a fixed slice or period of time (e.g., 15 milliseconds). The time sharing basis is maintained regardless of the state of the current partition. For example, the current partition will remain active even though it has entered a WAIT state.
In event-driven dispatching, the dispatcher is allowed to switch from one partition to the next whenever the current partition no longer needs the shared processor (e.g., when the partition goes into a WAIT state). In time/event-driven dispatching, the dispatcher will switch the control of a shared processor after a fixed slice or period of time; however, an event (e.g., a WAIT state) occurring prior to expiration of the time slice will also trigger the dispatcher to switch control to the next partition. IBM's PR/SM LPAR uses this latter mode of dispatching.
In addition to considering time [(1) below] and events [(2) below], LPAR uses several additional conventional dispatching (scheduling) techniques:
(1) DISPATCH INTERVAL: The dispatcher maintains a maximum time interval for which a logical processor may run for any single dispatch.
(2) EVENT DETECTION: When a partition enters a WAIT state, the dispatcher detects this event and dispatches another partition to run.
(3) PRIORITIES: Each partition has a user-defined priority which is taken into account for dispatching.
(4) I/O PREEMPTION: When an I/O interrupt is pending for a logical processor of a partition having a higher priority than the active partition, the dispatcher will preempt the lower priority partition.
(5) OPERATING SYSTEM INDICATED END: An operating system may determine that its processes are unproductive in the LPAR environment and may voluntarily give up its dispatch interval.
As discussed above, the partitioner seeks to maximize the use of the physical processors of the processor complex while still maintaining good I/O responsiveness. For example, a processor complex having a single partition would operate at 50% efficiency running an application with a ten millisecond run time followed by a ten millisecond WAIT state. In contrast, the same processor complex configured with two partitions could theoretically have 100% efficiency if two such applications were interleaved.
Note, however, that if the two-partition system in this example is run with a time-driven dispatch interval of twenty milliseconds, the partitioned system would still have only 50% efficiency. (Event-driven switching would yield the desired 100% efficiency.) Similarly, dispatching with a time interval of one-half millisecond would avoid the inefficiency caused by the WAIT states; however, the dispatcher overhead would create serious inefficiencies of its own.
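The efficiency figures in this example can be checked with a small simulation. The following is only a sketch under stated assumptions, not a description of how PR/SM is implemented: it models one physical processor in one-millisecond steps, assumes round-robin selection among partitions, assumes a WAIT state elapses in wall-clock time whether or not the partition is dispatched, and ignores dispatcher overhead entirely. The names `Partition` and `utilization` are hypothetical.

```python
from dataclasses import dataclass

RUN_MS = 10   # each application computes for 10 ms ...
WAIT_MS = 10  # ... and then sits in a WAIT state for 10 ms


@dataclass
class Partition:
    run_left: int = RUN_MS   # ms of computation left before the next WAIT
    wait_left: int = 0       # ms of WAIT left (meaningful only while waiting)
    waiting: bool = False


def utilization(n_partitions, total_ms, interval_ms=None, event_detection=False):
    """Fraction of time the single physical processor does useful work.

    interval_ms     -- dispatch interval for time-driven switching (None = off)
    event_detection -- switch as soon as the active partition enters WAIT
    Dispatcher overhead is assumed to be zero.
    """
    parts = [Partition() for _ in range(n_partitions)]
    active, slice_used, busy = 0, 0, 0
    for _ in range(total_ms):
        expired = interval_ms is not None and slice_used >= interval_ms
        gave_up = event_detection and parts[active].waiting
        if expired or gave_up:
            # Round-robin; with event detection, skip partitions still in WAIT.
            for step in range(1, n_partitions + 1):
                cand = (active + step) % n_partitions
                if not event_detection or not parts[cand].waiting:
                    active, slice_used = cand, 0
                    break
        for i, p in enumerate(parts):
            if p.waiting:
                p.wait_left -= 1          # WAIT elapses in wall-clock time
                if p.wait_left == 0:
                    p.waiting, p.run_left = False, RUN_MS
            elif i == active:
                busy += 1                 # useful work on the processor
                p.run_left -= 1
                if p.run_left == 0:
                    p.waiting, p.wait_left = True, WAIT_MS
        slice_used += 1
    return busy / total_ms


print(utilization(1, 1000))                        # single partition
print(utilization(2, 1000, event_detection=True))  # event-driven, two partitions
print(utilization(2, 1000, interval_ms=20))        # time-driven, 20 ms interval
```

Under these assumptions the three calls give 0.5, 1.0, and 0.5, matching the figures in the text. The one-half millisecond case is not modeled, since its cost comes from the dispatcher overhead that this sketch omits.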
Thus, while a partitioner seeks to achieve optimal performance from the processor complex, the dynamic nature of the processor's load will affect the efficiency attained. As can be seen from the above examples, the dispatching mode must be carefully tailored to the specific application. Nonetheless, even with a carefully tailored system, the dispatcher may cause gross inefficiencies for certain applications.
One such application is provided by way of example. A program such as IEBCOPY (an IBM utility program for defragmenting magnetic storage disks) may have I/O operations which are run in parallel with the main program. These I/O operations are handled by a data channel while the main program continues to run on the physical processor. IEBCOPY will initiate the I/O operation by constructing a channel program to run on the data channel. The I/O operation may then execute simultaneously with IEBCOPY.
When the channel program requires additional instructions, it signals the logical processor with a PCI (Program-Controlled Interrupt). If the PCI is handled in a timely fashion, then IEBCOPY can add additional instructions to the channel program, and the channel program will continue to process the I/O. If, however, the PCI is not handled in a timely fashion, then the channel program will run out of instructions and terminate. If there are additional I/O operations to be serviced, IEBCOPY must then construct a new channel program and initiate it with a start-channel operation. This start-channel operation involves a large amount of processor overhead as compared with a timely serviced PCI. Accordingly, it is desirable to service the PCI in a timely fashion, especially for an application which has a large number of PCI's.
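The cost of missed PCI's can be illustrated with a rough model. This is a sketch only: the function name `io_overhead`, the servicing window, and the cost units are hypothetical and are not taken from any IBM documentation. A PCI serviced within the window is charged `pci_cost`; a PCI serviced later is treated as missed and is charged `restart_cost`, standing in for rebuilding the channel program and issuing a start-channel operation.

```python
def io_overhead(pci_latencies_ms, window_ms, pci_cost, restart_cost):
    """Return (missed, total_cost) for a sequence of PCI servicing latencies.

    pci_latencies_ms -- delay between each PCI being signaled and serviced
    window_ms        -- assumed time before the channel program runs out
                        of instructions and terminates
    pci_cost         -- assumed processor cost of a timely serviced PCI
    restart_cost     -- assumed processor cost of a new channel program
                        plus a start-channel operation
    """
    missed = sum(1 for latency in pci_latencies_ms if latency > window_ms)
    timely = len(pci_latencies_ms) - missed
    return missed, timely * pci_cost + missed * restart_cost


# Four PCI's, one of which is serviced too late (hypothetical numbers).
print(io_overhead([0.2, 0.3, 5.0, 0.1], window_ms=1.0,
                  pci_cost=1, restart_cost=20))   # (1, 23)
```

With these made-up costs, a single late PCI accounts for most of the total overhead, which is why an application with many PCI's is sensitive to servicing delay.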
With known dispatchers, such as LPAR, however, if an application running on the physical processor (e.g., IEBCOPY) enters a WAIT state, then the dispatcher will dispatch the next partition. When a PCI is subsequently signaled to the logical processor, the dispatcher is involved in deciding whether to service the interrupt by returning control of the physical processor to the logical processor (partition) which requested it. This intervention by the dispatcher creates a delay which often exceeds the critical window available for servicing the PCI. Each missed PCI will then have to be serviced with a start-channel operation. This results in a substantial performance degradation when IEBCOPY is run on a partitioned machine.
Another potential performance problem confronted by known dispatchers involves the TPI (Test Pending Interrupt) instruction. TPI is a mechanism for handling a variable-sized batch of potentially unrelated I/O interrupts with a single invocation of the "I/O Interrupt Handler" code. When the interrupt handler finishes processing the current I/O interrupt, it issues the TPI instruction, which presents the highest priority queued I/O interrupt to be processed next. In this way the batch of I/O interrupts is processed in a very efficient manner. However, if a high percentage of I/O interrupts are being processed via TPI, it means that I/O's are being delayed, which can disrupt the overall processing of the partition's workload. In this case the partitioner should consider improving the partition's ability to use the physical processor.
PR/SM LPAR has an option called WAIT COMPLETION which provides a partial solution to these problems. The WAIT COMPLETION option allows a user to change the dispatching technique. That is, if WAIT COMPLETION=YES is specified, then the dispatcher will not perform event detection. The result is that the dispatcher effectively becomes a strictly time-driven dispatcher.
The WAIT COMPLETION=YES option can be used to provide the time-critical processing required by applications such as IEBCOPY. When this option is specified, the dispatcher will be strictly time-driven, such that when IEBCOPY enters a WAIT state, the WAIT state is ignored and the physical processor remains available for the entire dispatch interval to service PCI's.
The WAIT COMPLETION=YES option, however, has several disadvantages. First, it must be selected by an operator. Second, it applies to all partitions running on the processor complex, so that the performance of other partitions which do not require time-critical processing is reduced.
What is needed is a means to provide time-critical processing to a particular partition on a dynamic demand basis.