Modern requirements for a computer system may require that a computer be utilized to run several operating environments, or operating systems, at once. In a typical embodiment, a single logically partitioned computer or data processing system can run a plurality of operating systems in a corresponding plurality of logical partitions (LPARs), also referred to as virtual machines (VMs). Each operating system resides in its own LPAR, with each LPAR allocated a part of a physical processor, an entire physical processor, or multiple physical processors from the computer. Additionally, a portion of the computer's memory is allocated to each LPAR. An underlying partition manager, often referred to as a hypervisor or virtual machine monitor (VMM), manages and controls the LPARs. The hypervisor is typically a part of the system firmware and manages the allocation of resources to the operating systems and LPARs. As such, one logically partitioned computer may run one or more LPARs and thus virtualize the operations of the applications, operating systems, and other program code configured to operate in those logical partitions.
In addition to sharing the physical processors and memory in a logically partitioned computer, LPARs also typically share other types of physical hardware resources, which are collectively referred to herein as input/output (IO) resources. For example, in order to provide LPARs with access to external networks, logically partitioned computers typically include multiple physical network adapters, e.g., network interface cards (NICs), that are shared by the LPARs, such that each LPAR is allocated at least a part of one or more physical network adapters to enable that LPAR to access various networks, e.g., local area networks, wide area networks, storage networks, the Internet, etc. Many IO resources, including many network adapters, are compliant with various Peripheral Component Interconnect (PCI) standards. PCI-compliant IO resources typically implement one or more PCI functions, e.g., to support different protocols such as Ethernet, Fibre Channel over Ethernet (FCoE), etc.
Access to IO resources in both logically partitioned and non-partitioned computers is typically handled at the operating system level through the use of device drivers. Device drivers typically provide a common interface to the operating system and the applications executing thereon to effectively hide the implementation details of a particular hardware device from these higher software layers. High level commands from these higher software layers are typically translated to device-specific commands that are appropriate for the particular make and model of the underlying IO resource. Therefore, so long as different device drivers from different vendors of a particular type of IO resource provide the same common interface to the operating system and applications, the operating system and applications can access the IO resource using the same commands and without concern for the particular make and model of the IO resource.
In many conventional logically partitioned computers, IO resources are virtualized within the hypervisor, so that conventional device drivers, appropriate for use in both logically partitioned and non-partitioned computers, may be used. Virtualization of an IO resource in a hypervisor typically requires that the hypervisor trap device accesses by the device drivers in the LPARs and effectively route the operations to the appropriate physical IO resources. Thus, where multiple LPARs share a common physical IO resource, the hypervisor itself handles the multiplexing of operations performed by the physical IO resource on behalf of each LPAR. Allocating such higher-level functionality to a hypervisor, however, has been found to introduce excessive complexity and processing overhead to the hypervisor. It is desirable in many implementations for a hypervisor to be as small, compact, fast and secure as possible so that the processing overhead of the hypervisor is minimized. As such, other technologies have been introduced in an attempt to off-load the responsibility of virtualizing IO resources from the hypervisor.
For example, in some designs, a dedicated LPAR, referred to as a virtual input/output server (VIOS), may be used to manage the virtualization of IO resources. While the use of a VIOS offloads higher-level functions from the hypervisor and reduces the overall complexity of the hypervisor, it has been found that using LPARs to provide such services to other LPARs requires relatively high overhead to instantiate and run the LPAR, and thus, a full operating system, in order to provide such services.
More recently, some designs have relied upon adjunct partitions (APs), which have also been referred to as partition adjuncts, to assist with the virtualization of IO resources. An AP is a type of partition that is more limited than a full, logical partition. An AP typically runs in a flat, static effective address space and problem state, which permits the hypervisor to apply a range of hypervisor and processor optimizations that result in a substantial decrease in system overhead associated with a context switch of the state machine from an LPAR to state data of an AP, that is, compared to a context switch of the state machine between two LPARs. In other respects, an AP is similar to a full LPAR. For example, an AP typically can be assigned resources, either physical or virtual, similar to a full LPAR. Further, an AP can be an end-point of a virtual input output (VIO) communications mechanism, similar to a full LPAR, such as VIOS.
In addition, some designs have incorporated the concept of self-virtualization of IO resources, where at least a portion of the virtualization of a physical IO resource is handled within the resource itself. The PCI single root input/output virtualization (SRIOV) specification, for example, enables a physical IO resource such as a NIC to incorporate replicated on-board functionality such as memory spaces, work queues, interrupts, and command processing so that a single function such as a single Ethernet connection can be presented to a logically partitioned computer as multiple and separate physical functions. The SRIOV specification introduces the concepts of physical functions (PFs) and virtual functions (VFs), with the former representing full PCI functions and having the ability to instantiate, configure and manage VFs, and the latter representing lightweight PCI functions with reduced configuration resources and usable by LPARs to access a self-virtualizing device.
It has been found that the use of APs in conjunction with self-virtualizing IO resources provides a flexible, efficient framework with which to virtualize IO resources in a logically partitioned computer, and does so without requiring a separate full LPAR to provide the virtualization, and without requiring such functionality to be embedded within client LPARs or in the hypervisor.
Some inefficiencies nonetheless exist in logically-partitioned computers that utilize APs to manage self-virtualizing IO resources. For example, in some environments, APs execute within the context of their associated LPARs, with work scheduling of such APs primarily under the direct control of the operating systems installed in the associated LPARs. In addition, in some environments, device drivers for the VFs are resident in the APs, and certain reliability, availability and serviceability (RAS) capabilities and management functions may require these VF device drivers for all of the VFs of a self-virtualizing IO resource to perform work in concert with each other. This type of work is referred to herein as platform work, as it is associated with work that is generally related to the underlying platform and/or to configuration or management of the APs and other components of the underlying platform, as opposed to the primary workloads of the APs, which typically focus on communicating data between the LPARs and the self-virtualizing IO resources.
There is a concern, however, that allocating control of the work scheduling of APs to the operating systems in the associated LPARs presents the risk of a possible denial of service scenario, where due to either an unwillingness or inability of an operating system to allow one AP to handle pending platform work for a VF may result in other APs that are also performing related platform work stalling or hanging while waiting for that AP to complete its related platform work.
A need therefore exists in the art for a manner of ensuring quality of service for platform work in APs of a logically partitioned computer.