1. Field of the Invention
This invention relates to a method and apparatus for controlling the execution of a broadcast instruction on a guest processor of a guest multiprocessing configuration. More particularly, it relates to a method and apparatus for controlling the execution, on a guest processor operating in interpretive execution mode, of an instruction for invalidating table entries of a dynamic address translation (DAT) system or for purging contents of buffers used for address translation. The invention also relates more generally to a method and apparatus for managing a lock.
2. Description of the Related Art
This invention relates to the execution on guest processors of instructions relating to dynamic address translation. A brief discussion of the concepts of dynamic address translation and virtual or guest machines as they relate to the present invention follows.
Dynamic address translation is a well-known mechanism for converting a virtual memory address used by a program to reference data to a real memory address at which the data is actually stored. Dynamic address translation allows a program to have a virtual address space larger than the real address space, since much of the virtual address space can be mapped to virtual storage (i.e., a peripheral device such as a disk) rather than real addressable memory. Pages can be moved from virtual storage to real storage on an as-needed basis. Dynamic address translation is performed by a central processing unit (CPU) (the terms “CPU” and “processor” are generally used interchangeably herein) in a manner that is generally transparent to the program, which does not have to concern itself with the details of the translation. Dynamic address translation as performed in computers conforming to Enterprise Systems Architecture/390 (ESA/390), a 31-bit architecture, is described at pages 3-26 to 3-40 of the IBM publication IBM Enterprise Systems Architecture/390 Principles of Operation, SA22-7201-07 (July 2001), while dynamic address translation as performed in computers conforming to the z/Architecture, a 64-bit architecture, is described at pages 3-26 to 3-47 of the IBM publication IBM z/Architecture Principles of Operation, SA22-7832-01 (October 2001), both of which publications are incorporated herein by reference.
A processor performing dynamic address translation uses various tables to keep track of the mapping between virtual and real memory addresses. Of particular interest here are the page tables, which contain entries corresponding to “pages” (4,096-byte blocks in the referenced architectures) of virtual memory. In the architectures referenced above, each page table entry contains either a pointer to a real memory location at which the page is located or an indicator that the data is “paged out” to virtual storage and that the entry is therefore invalid.
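By way of illustration only, the page table lookup just described can be sketched as follows. The sketch assumes a single-level table held in a dictionary, whereas the referenced architectures use multiple table levels; the names `translate`, `PageFault`, `invalid`, and `frame_address` are hypothetical and do not come from the referenced publications.

```python
PAGE_SIZE = 4096  # page size in the referenced architectures


class PageFault(Exception):
    """Raised when the page table entry is missing or marked invalid."""


def translate(page_table, virtual_address):
    """Convert a virtual address to a real address using one page table."""
    # Split the address into a page number and an offset within the page.
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    entry = page_table.get(page_number)
    if entry is None or entry["invalid"]:
        # The page is paged out, so the entry cannot be used for translation.
        raise PageFault(page_number)
    return entry["frame_address"] + offset


# Hypothetical table: page 0 is resident, page 1 is paged out.
page_table = {
    0: {"invalid": False, "frame_address": 0x5000},
    1: {"invalid": True, "frame_address": None},
}
real_address = translate(page_table, 0x0123)  # page 0, offset 0x123
```

A reference to any address in page 1 of this hypothetical table raises `PageFault`, which stands in for the condition in which the page must first be brought back into real memory.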
In addition to DAT tables such as the page tables just described, processors use so-called translation lookaside buffers (TLBs) to store the corresponding real storage locations of the most recently accessed virtual pages. Since most consecutive memory references are to the same or a recently used page of virtual memory, the use of TLBs further speeds up the translation process by avoiding the overhead of retranslation.
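The role of a TLB can be sketched, again by way of illustration only, as a small cache consulted before the page table. The class name `Cpu`, the `table_walks` counter, and the dictionary-based table are assumptions made for this sketch; an actual TLB is a hardware structure of limited size with its own replacement policy.

```python
PAGE_SIZE = 4096  # page size in the referenced architectures


class Cpu:
    """Hypothetical CPU model with a TLB in front of a single page table."""

    def __init__(self, page_table):
        # page number -> real frame address, or None if the entry is invalid
        self.page_table = page_table
        self.tlb = {}          # recently used translations
        self.table_walks = 0   # number of times the page table was consulted

    def translate(self, virtual_address):
        page_number, offset = divmod(virtual_address, PAGE_SIZE)
        frame = self.tlb.get(page_number)
        if frame is None:
            # TLB miss: fall back to the page table and remember the result.
            self.table_walks += 1
            frame = self.page_table.get(page_number)
            if frame is None:
                raise LookupError(f"page {page_number} is not in real memory")
            self.tlb[page_number] = frame
        return frame + offset


cpu = Cpu({2: 0x9000})
first = cpu.translate(0x2004)   # TLB miss, page table consulted
second = cpu.translate(0x2008)  # same page, served from the TLB
```

In this sketch the second reference to the same page does not consult the page table, which is the saving that a TLB is intended to provide.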
When a page of virtual memory is paged out from real memory to virtual storage, the corresponding page table and TLB entries must be invalidated so that incorrect memory accesses will not occur. In both the ESA/390 architecture and the z/Architecture referenced above, an Invalidate Page Table Entry (IPTE) machine instruction is used to invalidate such page table and TLB entries. More particularly, as described in the ESA/390 publication at pages 10-26 to 10-27 and in the z/Architecture publication at pages 10-29 to 10-30, execution of an IPTE instruction entails invalidating the designated page table entry and purging the associated entries from the translation lookaside buffers (TLBs) of all CPUs in the configuration. Although most of the discussion herein will center on the IPTE instruction, additional instructions that require the purging of buffer entries in other CPUs of the configuration include the Compare and Swap and Purge (CSP) instruction of both referenced architectures. Collectively, these instructions will be referred to herein as broadcast instructions, since they are typically broadcast to other CPUs.
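The two effects of an IPTE instruction described above (invalidating the designated page table entry and purging the associated TLB entries of all CPUs in the configuration) can be sketched as follows. This is an illustrative model only: the function name `invalidate_page_table_entry` and the `Cpu` class are hypothetical, and the sketch represents an invalid entry by `None` rather than by an invalid bit in the entry.

```python
class Cpu:
    """Hypothetical CPU holding only a TLB (page number -> frame address)."""

    def __init__(self):
        self.tlb = {}


def invalidate_page_table_entry(page_table, cpus, page_number):
    """Model of the broadcast effect of IPTE on a configuration of CPUs."""
    # Step 1: mark the designated page table entry invalid.
    page_table[page_number] = None
    # Step 2: purge the associated TLB entry on every CPU in the configuration.
    for cpu in cpus:
        cpu.tlb.pop(page_number, None)


page_table = {7: 0x3000, 8: 0x4000}
cpus = [Cpu(), Cpu()]
cpus[0].tlb[7] = 0x3000
cpus[1].tlb[7] = 0x3000
cpus[1].tlb[8] = 0x4000

invalidate_page_table_entry(page_table, cpus, 7)
```

After the call, no CPU in the sketch retains a translation for page 7, while the unrelated translation for page 8 is left in place.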
We turn now to the other background principle underlying this invention. As described in the IBM publication IBM System/370 Extended Architecture Interpretive Execution, SA22-7095-1 (September 1985), incorporated herein by reference, a host machine and host program executing on the host machine may be operated in such a manner as to create one or more virtual or guest machines on which guest programs execute. From the standpoint of a guest program executing on a guest machine, the virtual machine appears to be a real machine. Guest machines of this type, supported directly by the host machine and host program, are referred to as level 1 guests, with the host being level 0. In a similar manner, each such level 1 guest machine, in conjunction with a suitable level 1 guest program running on it, may support one or more level 2 guest machines with corresponding level 2 guest programs. Host programs capable of creating virtual machines of this type include the IBM VM/ESA and z/VM operating systems, as well as the Processor Resource/Systems Manager (PR/SM) feature of the IBM eServer S/390 G5 and G6 and zSeries 800 (z800) and zSeries 900 (z900) servers. While VM/ESA and z/VM are packaged as separate programs whereas PR/SM is packaged as a machine feature, they function similarly insofar as the present invention is concerned. In the case of PR/SM, the virtual machines are generally referred to as logical partitions (LPs), while the host program is referred to as the logical partition manager.
A host machine may be either a uniprocessor machine containing a single real CPU or a multiprocessor (MP) machine containing multiple real CPUs. To support a virtual machine of the type described above, each host processor of a host machine is operable in a so-called interpretive execution mode in which it executes instructions of a guest program running on a guest machine. Each processor operating in such a fashion constitutes a virtual or guest processor of the guest machine. Each guest machine, like the host machine, may be either a uniprocessing (UP) configuration containing only a single guest CPU or a multiprocessing (MP) configuration containing plural guest CPUs.
Among the instructions of the guest program executed by a guest processor are broadcast instructions of the type described above. This is described, for example, in U.S. Pat. No. 4,779,188 (Gum et al.), entitled “Selective Guest System Purge Control”; U.S. Pat. No. 4,456,954 (Bullions et al.), entitled “Virtual Machine System with Guest Architecture Emulation Using Hardware TLB's for Plural Level Address Translations”; and U.S. Pat. No. 5,317,705 (Gannon et al.), entitled “Apparatus and Method for TLB Purge Reduction in a Multi-Level Machine System”, all of which are incorporated herein by reference.
Certain performance problems are encountered, however, when broadcast instructions are executed by a guest processor of a guest MP configuration. In systems of the type described above, what is known as an IPTE lock is conventionally used to serialize the actions of guest processors of a particular guest machine when executing broadcast instructions. The nature of the IPTE lock is such that it can be held either by a single guest processor of the guest machine on an exclusive basis or by one or more host processors (on a basis defined by the host program). The IPTE lock is implemented by a lock bit, which is set to one to indicate that the lock has been acquired, either by a guest processor or by one or more host requesters, and a count of the number of host requesters holding the lock on a shared basis. The problem arises when multiple guest processors of a particular guest machine encounter IPTE or other broadcast instructions in close time sequence, as they often do. The first guest processor to encounter such an instruction may acquire the lock, but the others will fail in their lock attempts, resulting in what is known as an instruction interception, which takes them out of interpretive execution mode. As a consequence, the broadcast instructions that would have been executed by the guest processors must now be executed by the host processors operating in instruction simulation mode. This is very expensive in terms of use of computer resources.
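The conventional IPTE lock and the resulting interception can be sketched as follows, by way of illustration only. The sketch assumes that the lock bit and the host count are updated together atomically, which is modeled here with a Python `threading.Lock`; the class `IpteLock`, its method names, and the function `run_guest_broadcast` are hypothetical and are not taken from the referenced publications or patents.

```python
import threading


class IpteLock:
    """Hypothetical model of a lock bit plus a count of host requesters."""

    def __init__(self):
        self._guard = threading.Lock()  # stands in for an atomic update
        self.lock_bit = 0     # 1 while held by a guest or by host requesters
        self.host_count = 0   # number of host requesters sharing the lock

    def guest_try_acquire(self):
        """A guest processor obtains the lock exclusively, or fails at once."""
        with self._guard:
            if self.lock_bit:
                return False
            self.lock_bit = 1
            return True

    def guest_release(self):
        with self._guard:
            self.lock_bit = 0

    def host_acquire(self):
        """A host requester shares the lock unless a guest holds it."""
        with self._guard:
            if self.lock_bit and self.host_count == 0:
                return False  # held exclusively by a guest processor
            self.lock_bit = 1
            self.host_count += 1
            return True

    def host_release(self):
        with self._guard:
            self.host_count -= 1
            if self.host_count == 0:
                self.lock_bit = 0


def run_guest_broadcast(lock, perform):
    """Execute a broadcast instruction, or report an interception."""
    if not lock.guest_try_acquire():
        # The guest leaves interpretive execution; the host must simulate.
        return "intercepted"
    try:
        perform()
        return "executed"
    finally:
        lock.guest_release()


lock = IpteLock()
lock.guest_try_acquire()                                # guest A holds the lock
contended = run_guest_broadcast(lock, lambda: None)     # guest B fails
lock.guest_release()                                    # guest A finishes
uncontended = run_guest_broadcast(lock, lambda: None)   # guest B succeeds
```

In this sketch a second guest processor that arrives while the lock is held does not wait; its attempt fails immediately, which corresponds to the interception, and the consequent host simulation, described above.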