1. Field of the Invention
This invention relates to computer programs in general, and in particular, to a method and related apparatus for translating virtual addresses to physical addresses in a virtual machine monitor, or other software-based instruction processor.
2. Description of the Related Art
Designers and manufacturers of computer systems, central processing units (CPUs) and other hardware and software components of computer systems are continually developing new techniques for better utilizing the resources of a computer system to obtain better overall processing performance. Many of these techniques are well known in the art, such as multiprocessing operating systems, cache memory, virtual memory, direct memory access, multiprocessor systems and hyperthreaded CPUs. There are also many different variations of each of these techniques. Several of these techniques are relevant to this invention, including virtual memory systems, multiprocessor systems and hyperthreaded CPUs.
Virtual Memory Systems
The design and use of virtual memory systems are well known in the art, and there are numerous books and other technical references available on the subject. This invention may be implemented in various different computer systems, using various different virtual memory techniques. For purposes of an example only, the invention will be described in relation to a virtual memory system based on the x86 architecture from Intel Corporation. This architecture is described in the IA-32 Intel Architecture Developer's Manual, a three-volume set, which is currently available on the Internet website of Intel Corporation, and which is hereby incorporated by reference. Volume 3 of that set, the Software Developer's Manual, is particularly informative regarding the virtual memory functions of the architecture.
FIG. 1 is a block diagram of the major functional components of a general virtual memory system in a computer. The system comprises a CPU 10, a memory management unit (MMU) 12, a translation lookaside buffer (TLB) 14, a random access memory (RAM) 16, a plurality of page tables 18, an operating system (OS) 22, a software memory manager (SMM) 24, a hard disk drive 20 and a direct memory access (DMA) controller 21. Each of the functional units illustrated in FIG. 1 may be implemented by conventional components of the well known personal computer (PC) standard architecture. The CPU 10 may also be called a processor. The RAM 16 may also be called a primary memory, while the hard drive 20 may be called a secondary memory. Also, the MMU 12 and the SMM 24 may be considered parts of a more general memory management unit, in which the SMM 24 is the software that controls the hardware MMU 12. The CPU 10 and the MMU 12 may be combined within a single integrated circuit (IC) component, or they may be separate components. Also, the TLB 14 may be contained within the same IC component as the MMU 12, or it may be a separate device.
The most basic function of the CPU 10 is to execute computer programs, including the OS 22. The computer programs are generally stored on the hard drive 20 and loaded into the RAM 16 for execution. The CPU 10 issues memory read commands to retrieve instructions of the computer programs from the RAM 16 and then executes the retrieved instructions. The execution of instructions requires a myriad of other functions too, including reading data from and writing data to the RAM 16. For example, an instruction executed by the CPU 10 may require an operation to be performed on an operand, which may be located in the RAM 16, or the instruction may require that a value be written to a stack, which may also be located within the RAM 16. All information stored in the RAM 16 may be called data, whether the data consists of instructions, operands, stack data or other types of data. At times, however, a distinction may be drawn between different types of data. In addition, the term “computer program” will generally include instructions, operands and the associated stack.
A computer program is loaded from the hard drive 20 into the RAM 16 for execution because fetching information from the RAM 16 is much quicker than from the hard drive 20, which enables the CPU 10 to execute the program much more quickly. Earlier computer systems would load an entire computer program into the RAM 16 for execution, including providing additional RAM required by the program during execution, such as for a data stack. However, RAM is relatively expensive in comparison to the cost of other data storage devices, such as disk drives. As a result, computer systems are often designed with a limited amount of RAM, in comparison to the address space of the system, especially in systems that use 64-bit addressing. This gives rise to various situations in which a computer program requires more memory space than is available in the RAM 16. A simple example of such a situation is when a computer program is simply larger than the RAM 16 of the system on which the program is to run. Another example is in a multiprocessing system, when the sum of the memory required by all of the executing processes and the OS 22 exceeds the amount of RAM 16 in the computer system. Virtual memory techniques may be used to enable the execution of a computer program in such a situation where the RAM 16 that is available for use is less than the total amount of memory required by a computer program.
Virtual memory techniques may be implemented, for example, using a combination of hardware and software. The software portion of such an implementation may be provided by the SMM 24 of the OS 22 of FIG. 1, while much of the hardware functionality may be provided by the MMU 12. The MMU 12 may be included, along with the CPU 10, within a single microprocessor device, such as an Intel Pentium microprocessor, or the MMU 12 may be a separate device. Virtual memory techniques give the appearance, to a computer program, that there is more RAM available than is really the case. The computer program is provided with a virtual address space, which contains all of its instructions, data and stack. The virtual address space is generally larger than the available RAM 16, but the computer program may use the entire virtual address space as if it were all contained in the RAM 16. The virtual address space may have various different types of organization, such as linear or segmented. At any given time, one or more parts of the computer program will be in the RAM 16 while one or more other parts of the computer program will not be in the RAM 16, but will be stored on the hard drive 20. If the computer program attempts to use a part of its address space that is currently not contained in the RAM 16, the SMM 24 will typically transfer the required part of the computer program from the hard drive 20 to the RAM 16.
To implement a virtual memory system, a computer program may be divided into a number of units called pages. For this discussion, assume a 4 kilobyte (Kbyte) page, which is one possible page size in the x86 architecture. Some of the pages of the computer program are loaded into the RAM 16, while others are not, depending on the amount of the RAM 16 that is available to the computer program. Also, the pages that are loaded into the RAM 16 may not be loaded contiguously. Typically, a particular page of the computer program on the hard drive 20 could be loaded into any available page within the RAM 16.
During execution of a computer program, the CPU 10 generates addresses within the virtual address space of the computer program, for reading data from and writing data to the RAM 16. The addresses generated by the CPU 10 may be called virtual addresses or linear addresses. However, the virtual addresses cannot be directly applied to the RAM 16 in a virtual memory system to access the desired memory locations. Instead, the virtual addresses must first be translated into corresponding physical addresses within a physical address space. The physical address space comprises the addresses that are used to access specific memory locations within the RAM 16. The MMU 12 and the SMM 24 have primary responsibility for translating or mapping addresses from the virtual address space to the physical address space. When the CPU 10 attempts to access data from the computer program that resides on a page of the program that is not currently loaded into the RAM 16, the MMU 12 determines that the page is not resident in the RAM 16, a page fault occurs and a trap to the OS 22 ensues. The SMM 24 subsequently transfers the required page from the hard drive 20 into the RAM 16. After the page transfer is complete, execution of the computer program resumes at the same instruction that resulted in the page fault. This time, however, the MMU 12 will determine that the page is loaded into the RAM 16 and the memory access will be completed successfully. If there is not enough available space in the RAM 16 for loading the required page during a page fault, the SMM 24 typically ejects another page from the RAM 16, and the space that the ejected page was occupying is freed up for loading the new page. If the page that is being ejected has been modified in the RAM 16 since it was loaded from the hard drive 20, then it is written back to the hard drive 20 before its memory space is used for the new page.
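By way of illustration only, the page fault handling described above may be sketched as follows. The class name DemandPager, the dictionary standing in for the hard drive 20, and the first-in, first-out choice of which page to eject are all assumptions made for this sketch; the description above does not prescribe a replacement policy.

```python
class DemandPager:
    """Toy model of demand paging; names and FIFO policy are assumptions."""

    def __init__(self, num_frames, disk):
        self.num_frames = num_frames  # number of physical pages available in RAM
        self.disk = disk              # hypothetical stand-in for the hard drive: page -> contents
        self.resident = {}            # page -> [contents, modified flag], in load order
        self.faults = 0

    def _page_fault(self, page):
        self.faults += 1
        if len(self.resident) >= self.num_frames:
            # No free space: eject the page that was loaded earliest (FIFO, an assumption).
            victim = next(iter(self.resident))
            contents, modified = self.resident.pop(victim)
            if modified:
                # A modified page is written back before its space is reused.
                self.disk[victim] = contents
        self.resident[page] = [self.disk[page], False]

    def read(self, page):
        if page not in self.resident:
            self._page_fault(page)
        return self.resident[page][0]

    def write(self, page, value):
        if page not in self.resident:
            self._page_fault(page)
        self.resident[page] = [value, True]
```

In this sketch, an access to a page that is not resident first triggers the fault handler and then completes normally, which mirrors the re-execution of the faulting instruction.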
As described in greater detail below, the MMU 12 initially uses the page tables 18, located within the RAM 16, to translate virtual addresses into physical addresses. In this process, when the MMU 12 receives a virtual address from the CPU 10 for a memory read or write, the MMU 12 must first perform at least one memory read within the page tables 18 just to determine the corresponding physical address. The MMU 12 must then perform another memory access to complete the read or write required by the CPU 10. If the MMU 12 had to access the page tables 18 for every memory access from the CPU 10, using the virtual memory system would add at least one extra memory cycle to each memory access. In some virtual memory systems, multiple memory accesses are required to map a virtual address to a physical address, using the page tables 18. The added memory cycles would slow down the execution of instructions, which would reduce the overall processing power of the computer system. The primary purpose of the TLB 14 is to reduce the number of additional memory accesses that are required to implement the virtual memory system. The TLB 14 is basically a cache for page table entries and typically is located within the MMU 12. Fortunately, when a CPU 10 is executing a computer program, most of its memory accesses will be to a limited number of pages within the RAM 16. At any given time, for a particular program, the CPU 10 will typically access one or a few pages of code, one or a few pages of data and one or a few pages for the stack, depending on the page size used.
At this point, it is useful to discuss page numbers. As described above, the virtual address space of a computer program or a process is divided into a number of pages. As used herein, a process is generally an instance of a computer program. Each of these pages can be numbered consecutively, resulting in virtual page numbers. In the same way, the physical address space of the RAM 16 can be divided into pages as well. These pages can also be numbered consecutively, resulting in physical page numbers. Now, a virtual address can be viewed as specifying a virtual page number in the upper bits and an offset within that page in the lower bits. In the same way, a physical address can be viewed as a physical page number combined with an offset into that physical page. For example, in a system having 32-bit addresses and a 4 Kbyte page size, such as an x86 system, the upper 20 bits of an address can be viewed as a page number and the lower 12 bits can be viewed as an offset within a given page. Then, so long as both virtual pages and physical pages begin at an address that is a multiple of the 4 Kbyte page size, the address translation process can be viewed as converting the upper address bits from a virtual page number to a physical page number, with the lower address bits remaining unchanged as the offset into the respective pages.
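For example, the division of a 32-bit address into a 20-bit page number and a 12-bit offset may be sketched as follows. The function names and the dictionary used as a mapping are hypothetical and are introduced only for this illustration.

```python
PAGE_SHIFT = 12                 # 4 Kbyte pages: 2**12 bytes per page
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1     # selects the lower 12 bits

def split_address(addr):
    """Return (page number, offset) for a 32-bit address."""
    return addr >> PAGE_SHIFT, addr & OFFSET_MASK

def translate(vaddr, vpn_to_ppn):
    """Swap the virtual page number for a physical page number; the offset is unchanged."""
    vpn, offset = split_address(vaddr)
    ppn = vpn_to_ppn[vpn]       # hypothetical mapping: virtual page -> physical page
    return (ppn << PAGE_SHIFT) | offset
```

With a mapping of virtual page 0x12345 to physical page 0x00ABC, the virtual address 0x12345678 would translate to the physical address 0x00ABC678.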
The MMU 12 uses the page tables 18 to perform this translation from virtual page numbers to physical page numbers. When the MMU 12 receives a virtual address from the CPU 10, the MMU 12 reads the virtual page number from the upper address bits of the address. The MMU 12 then reads information from the page tables 18 relating to the desired virtual page number. First, the page tables 18 will indicate whether the virtual page number is currently loaded into the RAM 16. If the virtual page is not loaded into the RAM 16, a page fault is generated and the required virtual page is loaded into the RAM 16 as described above. If the virtual page is loaded into the RAM 16, the page tables 18 will also indicate the physical page number that corresponds to the virtual page number. The MMU 12 then uses the retrieved physical page number, along with the offset from the virtual address to access the desired location within the RAM 16. In addition, the MMU 12 writes the virtual page number and the physical page number into an entry in the TLB 14, indicating the mapping between the pages. Accessing the page tables 18 in this manner to determine a mapping from a virtual page number to a physical page number is called walking the page tables 18. Now that the mapping from the virtual page number to the physical page number has been written into the TLB 14, if a subsequent memory access is to the same virtual page number, the MMU 12 can find the appropriate mapping in the TLB 14 within the MMU 12, without having to access the page tables 18 in the RAM 16.
The MMU 12 is designed such that the access to the TLB 14 is much quicker than an access to the page tables 18. The TLB 14 can typically only hold a relatively small number of page mappings, such as 8 to 64 entries, in comparison to the size of the page tables 18. As a result, entries must be evicted from the TLB 14 from time to time. Typically, when the MMU 12 walks the page tables 18 to determine a new mapping, the MMU 12 will evict an existing entry in the TLB 14 to make space to enter the new mapping. Thus, when the MMU 12 receives a virtual address from the CPU 10, the MMU 12 may first access the TLB 14 to determine if the desired mapping is there. If the mapping is not in the TLB 14, then the MMU 12 must perform a page table walk, as described above and in greater detail below.
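As a simplified illustration, the lookup sequence described above (check the TLB first, then fall back to a page table walk and evict an entry if necessary) may be sketched as follows. The class name, the use of a dictionary in place of a real page table walk, and the first-in, first-out eviction order are assumptions; actual hardware eviction policies vary.

```python
from collections import OrderedDict

class TLB:
    """Toy translation lookaside buffer; FIFO eviction is an assumption."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = OrderedDict()   # virtual page number -> physical page number
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        if vpn in self.entries:
            self.hits += 1
            return self.entries[vpn]
        # Miss: the dictionary lookup stands in for walking the page tables.
        self.misses += 1
        ppn = page_table[vpn]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the oldest entry
        self.entries[vpn] = ppn
        return ppn
```

In this sketch, a mapping that has been evicted to make room for another must be re-established by a further walk on its next use.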
FIG. 2A shows a 32-bit virtual address 30, comprising a 10-bit page directory entry (PDE) 30A, a 10-bit page table entry (PTE) 30B and a 12-bit offset 30C. FIG. 2B illustrates the structure and operation of the page tables of the x86 architecture, as a more detailed example. FIG. 2B also shows a page directory 40 with 1024 page table base address (PTBA) entries 42, including one specific PTBA entry 42X. FIG. 2B also shows a plurality of page tables 50A, 50X and 50N. These page tables, along with other page tables that are not illustrated, will be collectively referred to as page tables 50. This convention, of using a common numeric portion to refer collectively to all items having alphanumeric references containing the same numeric portion, is used throughout this description. As shown relative to the page table 50X, each of the page tables 50 comprises 1024 physical page base address (PPBA) entries 52. Page table 50X includes one specific PPBA entry 52X. FIG. 2B also shows a plurality of physical pages 60, including the physical pages 60A, 60X and 60N. As shown relative to the physical page 60X, each of the physical pages 60 comprises 4096 addressable bytes 62. Physical page 60X includes one specific byte 62X. Each of the physical pages 60, the page tables 50 and the single page directory 40 reside in the RAM 16. Each of the physical pages 60 includes 4096 bytes, or 4 Kbytes. As described above, the physical pages 60 and the virtual pages of the example in this description include 4 Kbytes of data. Each of the 1024 PTBA entries 42 in the page directory 40 comprises 32 bits, or 4 bytes. Thus, the page directory 40 also constitutes a full 4 Kbyte page in the RAM 16. Each of the 1024 PPBA entries 52 in the page tables 50 also comprises 32 bits. So, each of the page tables 50 also constitutes a full 4 Kbyte page in the RAM 16.
When the MMU 12 receives a virtual address 30 from the CPU 10, the MMU 12 may first check to see if there is an entry in the TLB 14 that provides a mapping from the virtual page number to a corresponding physical page number. The combination of the PDE 30A and the PTE 30B is considered the virtual page number 30AB. In this architecture, the TLB 14 maps 20-bit virtual page numbers to 20-bit physical page numbers. So, the MMU 12 checks whether there is a valid entry in the TLB 14 matching the virtual page number 30AB. If there is, the MMU 12 uses this entry to obtain the desired mapping to a physical page 60. If there is no matching entry in the TLB 14, the MMU 12 must walk the page tables 18. In the x86 architecture, the page directory 40 may be considered a page table 18, as well as the page tables 50. To walk the page tables 18, the MMU 12 first reads a 20-bit value from a control register CR3. This 20-bit value is used as the upper 20 bits of a 32-bit address that points to the base of the page directory 40. The lower 12 bits of this address are set to zero. Thus, the page directory 40 must begin at an address that is a multiple of the 4 Kbyte page size. The page tables 50 and the physical pages 60 must also begin at an address that is a multiple of the 4 Kbyte page size for the same reason. Once the base address of the page directory 40 is determined, the PDE 30A is used as an index into the 1024-entry table of the page directory 40. More specifically, the 20 bits from the control register CR3 are used as the upper address bits, the 10 bits from the PDE 30A are used as the next lower address bits, and the last two address bits are set to 0 to form a memory address, which addresses the PTBA entry 42X. As illustrated in FIG. 2B, the control register CR3 points to the beginning of the page directory 40, while the PDE 30A points to the PTBA entry 42X. One bit of the PTBA entry 42X indicates whether the PTBA entry 42X is a valid entry. 
If the PTBA entry 42X is not a valid entry, a page fault results, which generally indicates an error condition in the SMM 24. If the entry is valid, a 20-bit value from the PTBA entry 42X is used as the upper bits of a base address for the page table 50X. The PTE 30B is used as an index into the 1024-entry table of the page table 50X. As shown in FIG. 2B, the PTBA entry 42X points to the base of the page table 50X, while the PTE 30B points to the PPBA entry 52X. One bit of the PPBA entry 52X indicates whether the virtual page number 30AB is currently loaded into the RAM 16. If the virtual page number 30AB is not currently loaded into the RAM 16, a page fault results and the required virtual page is loaded into the RAM 16, as described above. If the virtual page number 30AB is loaded into the RAM 16, a 20-bit value from the PPBA entry 52X is used as the upper address bits of a base address for the physical page 60X for the current memory access. The offset 30C is now used as an index into the physical page 60X to identify a specific byte address 62X for the memory access. In other words, the 20 bits from the PPBA entry 52X are combined with the 12 bits from the offset 30C to form a 32-bit physical address that is used to perform the memory access requested by the CPU 10. As shown in FIG. 2B, the PPBA entry 52X points to the base of the physical page 60X, while the offset 30C points to the required byte address 62X for the memory access.
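The two-level page table walk described above may be sketched, for purposes of illustration only, as follows. The dictionary mem is a hypothetical stand-in for the RAM 16, keyed by the physical address of each 4-byte entry. The sketch assumes that bit 0 of each entry is the valid (present) bit, as in the x86 architecture, and it ignores all other flag bits.

```python
PRESENT = 0x1            # assumed position of the valid/present bit (bit 0)
BASE_MASK = 0xFFFFF000   # upper 20 bits of an entry hold a page-aligned base address

class PageFault(Exception):
    """Raised when an entry encountered during the walk is not valid."""

def walk(vaddr, cr3, mem):
    """Translate a 32-bit virtual address; cr3 is the 20-bit page directory base value."""
    pde_index = (vaddr >> 22) & 0x3FF   # upper 10 bits: index into the page directory
    pte_index = (vaddr >> 12) & 0x3FF   # next 10 bits: index into the page table
    offset = vaddr & 0xFFF              # lower 12 bits: offset within the physical page

    # 20 bits from CR3, 10 index bits, and two zero bits form the entry address.
    ptba = mem.get((cr3 << 12) | (pde_index << 2), 0)
    if not ptba & PRESENT:
        raise PageFault("page directory entry not valid")

    ppba = mem.get((ptba & BASE_MASK) | (pte_index << 2), 0)
    if not ppba & PRESENT:
        raise PageFault("virtual page not loaded")

    return (ppba & BASE_MASK) | offset
```

For instance, with the page directory at physical address 0x1000, a page table at 0x2000 and a physical page at 0x5000, the virtual address 0x00403004 selects directory index 1, table index 3 and offset 4, yielding the physical address 0x5004.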
Generally, the SMM 24 of the OS 22 is responsible for creating and maintaining the page tables 18 for the use of the MMU 12. The MMU 12 is generally responsible for loading values into the TLB 14 for recently obtained mappings between virtual page numbers and physical page numbers. Values may be flushed from the TLB 14 either by the MMU 12 or by the SMM 24, or possibly by other software within the RAM 16, such as user-level application programs. Each entry within a page table 18 generally contains, in addition to a physical page number, a few other bits for indicating whether the entry is valid, what types of access are allowed for the page, whether the page has been modified and/or referenced since it was loaded into the RAM 16 and whether caching is disabled for the page. An entry within the TLB 14 generally contains a virtual page number and a physical page number, as well as a few additional bits to indicate whether the entry is valid, whether the page has been modified since being loaded into the RAM 16 and what types of access are allowed for the page. When a memory access is performed, if the MMU 12 determines that the virtual page is loaded into the RAM 16, the MMU 12 also accesses these additional bits of the entry within either the page tables 18 or the TLB 14, to determine if the requested memory access is permitted. For example, the access bits may indicate that only read accesses are permitted. If the CPU 10 attempts to write data to such a location, the MMU 12 will generate a page fault.
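The access check described above may be illustrated with the following sketch. The bit positions follow the x86 page table entry layout (bit 0 present, bit 1 writable, bit 5 accessed, bit 6 modified), but the function name and the treatment of an entry as a bare integer are assumptions made for this example.

```python
PRESENT = 0x01    # entry is valid / page is loaded
WRITABLE = 0x02   # write accesses are permitted
ACCESSED = 0x20   # page has been referenced
DIRTY = 0x40      # page has been modified

class PageFault(Exception):
    """Raised when the page is absent or the access is not permitted."""

def check_access(entry, is_write):
    """Return the entry with its status bits updated, or raise PageFault."""
    if not entry & PRESENT:
        raise PageFault("page not present")
    if is_write and not entry & WRITABLE:
        raise PageFault("write to read-only page")
    entry |= ACCESSED
    if is_write:
        entry |= DIRTY
    return entry
```

In this sketch, a read of a read-only page succeeds and marks the page as referenced, while a write to the same page raises a page fault.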
When a mapping for a particular virtual page number is not contained within the TLB 14 and a page table walk is performed, the MMU 12 typically evicts an entry from the TLB 14 to free up space for a new entry for the current mapping. The virtual page number will be written into the newly available entry in the TLB 14, along with the physical page number that was just determined. The additional bits within the entry of the TLB 14 are typically copied from the corresponding additional bits in the corresponding page table entry. When an entry in the TLB 14 is evicted, a bit indicating whether the page has been modified is typically copied from the entry of the TLB 14 to the corresponding entry in the page table 18. Also, if the SMM 24 removes a virtual page from the RAM 16 for which there is an entry in the TLB 14, the SMM 24 must modify the entry in the TLB 14 to indicate that the mapping is no longer valid. Other programs may also be allowed to indicate that an entry of the TLB 14 is invalid, including possibly user-level applications. The x86 architecture provides an instruction, Invlpg(virtual address), for this purpose. The x86 architecture is defined such that, if an entry in the TLB 14 is set as invalid, the MMU 12 will walk the page tables to determine a mapping for the virtual address. However, if an entry in the TLB 14 is not set as invalid, the MMU 12 may use the TLB 14 to obtain a mapping, or the MMU 12 may walk the page tables to determine the mapping. The x86 architecture also provides an instruction for flushing the entire contents of the TLB 14. As described above, entries within the TLB 14 may also be evicted by the MMU 12 to free up space for a new mapping for a new virtual address. Thus, an entry in the TLB 14 may be created for a specific virtual page number in response to a first access to that virtual page. 
During a subsequent access to the same virtual page, if the entry in the TLB 14 has been evicted by the MMU 12 in between the two accesses, a page table walk will nonetheless be required. This situation is described as a leakage of the TLB 14.
Multiprocessor Systems and Hyperthreaded CPUs
Another technique that can lead to better performance from a computer system, and that is relevant to this invention, involves combining multiple CPUs within a single computer system to form a multiprocessor system. Multiprocessor systems are also well known in the art and there are various architectures currently available. FIG. 3 illustrates one general architecture for a multiprocessor system. FIG. 3 shows a shared primary memory 16B, an OS 22B, an SMM 24B, a plurality of page tables 18B, and a shared secondary memory 20B. These functional units perform the same basic functions as the corresponding functional units shown in FIG. 1, but they may need to be modified to perform these functions in a multiprocessor environment. There are various types of operating systems 22B for use in multiprocessor systems. Some multiprocessor operating systems 22B use a single operating system image to manage the entire set of processors in concert. In other multiprocessor systems, the system hardware provides a physical partitioning of the system, allowing a different instance of a multiprocessor operating system 22B to manage each partition. In the case of a multiprocessor OS 22B comprising a separate instance for each CPU, the separate instances of the OS 22B may be executed in separate private memories associated with each of the multiple CPUs. The shared primary memory 16B may be the same as the RAM 16, except perhaps larger, and the shared secondary memory 20B may be the same as the hard drive 20, except perhaps larger. The page tables 18B may be the same as the page tables 18, except that there may be more sets of page tables 18B because of the multiple CPUs.
FIG. 3 also shows a first processor (such as a microprocessor) 9A, having a first CPU 11A, a first MMU 13A and a first TLB 15A. The microprocessor 9A is also connected to a first private memory 17A. FIG. 3 also shows a second processor (such as a microprocessor) 9B, having a second CPU 11B, a second MMU 13B and a second TLB 15B. The microprocessor 9B is also connected to a second private memory 17B. The multiprocessor system of FIG. 3 may also have additional microprocessors 9 and associated private memories 17. The microprocessors 9 may be, for example, based on the x86 architecture. In addition, each of the private memories 17 is optional.
In a single-processor system, there may be a single set of page tables 18 or there may be multiple sets of page tables 18. Each process could have its own set of page tables 18 or there could be some sharing of page tables 18. In a multiprocessor system, there could be page tables 18B in the shared primary memory 16B, in one or more of the private memories 17, or both, and any of these page tables 18B could be shared between multiple processes or exclusive to a single process. As another alternative to the system illustrated in FIG. 3, one or more TLBs 15 could be shared among multiple CPUs 11. One example of a multiprocessor system having a shared TLB 15 is illustrated in FIG. 4 and described below.
The virtual memory system implemented in the system of FIG. 3 can be functionally similar to the virtual memory system described above in connection with FIGS. 1 and 2. More specifically, the TLBs 15 and the page tables 18B can have the same basic structure and functionality as the TLB 14 and the page tables 18, respectively, and the MMUs 13 and the SMM 24B can control and use the TLBs 15 and the page tables 18B in the same general manner that the MMU 12 and the SMM 24 control and use the TLB 14 and the page tables 18. If there is no sharing of the TLBs 15 or the page tables 18B between multiple CPUs 11, then the virtual memory system of FIG. 3 can be functionally the same as the virtual memory system of FIGS. 1 and 2, but with a separate instance of the virtual memory system for each of the CPUs 11. However, if there is any sharing of the TLBs 15 or the page tables 18B between the multiple CPUs 11, the virtual memory system gets more complicated. The following discussion will focus on a multiprocessor system containing only two CPUs 11, for simplicity, although it also applies to systems with more CPUs 11.
The discussion also applies to systems that have only one physical CPU, if the CPU implements hyperthreading techniques. Hyperthreading techniques are known in the art and are becoming more prevalent, especially in high-performance CPUs, such as the Xeon microprocessor from Intel Corporation. In a CPU that implements hyperthreading, multiple instruction streams are executed simultaneously. With multiprogramming or multithreading techniques, in contrast, different instruction streams are executed during separate time slices. A CPU that does not provide hyperthreading can generally be modeled as an interpreter loop, in which the CPU repeatedly fetches an instruction, fetches any required operands, performs an operation and does something with the result of the operation, such as writing the result to memory, before moving on to fetch the next instruction. A hyperthreaded CPU, in contrast, can be modeled as multiple independent interpreter loops running concurrently. Effectively, the single physical CPU core provides the capabilities of multiple logical CPUs. However, a hyperthreaded processor typically only has a single TLB, although multiple TLBs are also possible. For the purposes of this invention and the discussion below, a hyperthreaded processor, having multiple logical CPUs but only one TLB, is functionally equivalent to a multiprocessor system having multiple physical CPUs and a single, shared TLB. This invention and the following discussion may apply to any computer system in which multiple processes are executing simultaneously on multiple physical or logical CPUs, and the multiple processes share a common TLB or page table. In fact, as will become apparent below, this invention may even apply in a system having a separate TLB and a separate set of page tables for each process, if one process has write access to the page tables of another process, even if such access is provided inadvertently due to a system software error, for example.
FIG. 4 illustrates another example architecture for a multiprocessor computer system that is relevant to the invention and the following discussion. Specifically, FIG. 4 shows a first CPU 11C, a second CPU 11D, an MMU 13C, a shared TLB 15C, a shared primary memory 16C, an OS 22C, an SMM 24C, a set of page tables 18C, and a shared secondary memory 20C. Each of the CPUs 11C and 11D may be either physical or logical, and there may also be additional physical and/or logical CPUs 11. The MMU 13C and the TLB 15C are shared between the CPUs 11C and 11D. Otherwise, the functional units illustrated in FIG. 4 may be equivalent to the corresponding functional units illustrated in FIG. 3.
Referring again to the multiprocessor system of FIG. 3, suppose that the CPU 11A is executing a first process and the CPU 11B is executing a second process. The system of FIG. 3 implements a virtual memory system, with some pages of the virtual address space of the first process loaded into primary memory 16B and others remaining in the secondary memory 20B. Suppose, for the moment, that the first and second processes share a common set of page tables 18B. The page tables 18B indicate, for each virtual page, whether it is loaded into the primary memory 16B or whether it remains in the secondary memory 20B. The page tables 18B also indicate, for each virtual page loaded into the primary memory 16B, the corresponding physical page number into which the virtual page is loaded. The TLB 15A may also contain one or more entries indicating mappings between virtual pages and physical pages of the primary memory 16B.
Suppose further that the first process executes a first instruction that accesses a first memory location on a first virtual page that is currently loaded into a first physical page. Suppose that the MMU 13A walks the page tables 18B to determine a mapping between the first virtual page and the first physical page, and stores this mapping in the TLB 15A. Now suppose that the second process changes the page tables 18B, or performs some action that causes the page tables 18B to be changed. For example, the second process may attempt to access a second virtual page that is not currently loaded into the primary memory 16B, causing a page fault. In response to the page fault, the SMM 24B loads the second virtual page from the secondary memory 20B into the primary memory 16B. Suppose that the SMM 24B loads the second virtual page from the secondary memory 20B into the first physical page of the primary memory 16B, replacing the first virtual page. The SMM 24B updates the page tables 18B to indicate that the second virtual page is now loaded into the primary memory 16B and is mapped to the first physical page, and to indicate that the first virtual page is no longer loaded into the primary memory 16B.
Now suppose that the first process executes a second instruction that again accesses the first memory location on the first virtual page, or some other memory location on the first virtual page. If the MMU 13A accesses the TLB 15A to determine a mapping for the first virtual page, the previously stored mapping will indicate that the first virtual page is mapped to the first physical page. The MMU 13A would then retrieve the contents of the corresponding memory location within the first physical page and provide this data to the CPU 11A for executing the second instruction. However, the data retrieved by the MMU 13A is actually from the second virtual page, instead of from the first virtual page as intended by the first process. Thus, the CPU 11A would execute the second instruction based on incorrect data, possibly corrupting the data for the first process, the second process, or both. If the first virtual page contained code for the first process, as opposed to operand data or stack data, so that the attempted memory access were an instruction fetch, then the CPU 11A would attempt to execute whatever data is retrieved by the MMU 13A. If the second virtual page happens to contain operand or stack data, then the CPU 11A would nonetheless attempt to interpret the returned data as an instruction and try to execute the interpreted instruction. This situation would also likely lead to corrupted data, or worse.
Multiprocessor systems generally provide methods to try to avoid situations like these. One common technique would enable the second process to cause the mapping between the first virtual page and the first physical page in the TLB 15A to be flushed, such as by a message between the CPU 11B and the CPU 11A. In this case, when the second instruction is executed by the CPU 11A, causing the second access to the first memory location, the MMU 13A would not find a mapping for the first virtual page in the TLB 15A and would be forced to walk the page tables 18B. The MMU 13A would then determine that the first virtual page is no longer loaded into the primary memory 16B, as appropriate.
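For purposes of illustration only, the stale-mapping scenario and the flush-based safeguard described above may be modeled with a short simulation. This is a minimal sketch under stated assumptions: the names PageTables, Cpu, replace_page and run are hypothetical, the page numbers are arbitrary, and each TLB is modeled as a simple dictionary rather than as any actual hardware structure.

```python
# Illustrative model only: all class and function names are hypothetical and
# do not correspond to any real MMU, TLB or operating system interface.

class PageTables:
    """Shared page tables: virtual page -> physical page (absent if not resident)."""
    def __init__(self):
        self.entries = {}

    def lookup(self, vpage):
        return self.entries.get(vpage)


class Cpu:
    """A CPU whose MMU caches page-table lookups in a private TLB."""
    def __init__(self, page_tables):
        self.page_tables = page_tables
        self.tlb = {}

    def translate(self, vpage):
        if vpage in self.tlb:                    # TLB hit: page tables are not consulted
            return self.tlb[vpage]
        ppage = self.page_tables.lookup(vpage)   # TLB miss: walk the page tables
        if ppage is None:
            raise LookupError(f"page fault on virtual page {vpage}")
        self.tlb[vpage] = ppage
        return ppage

    def flush(self, vpage):
        self.tlb.pop(vpage, None)


def replace_page(page_tables, old_vpage, new_vpage, cpus_to_notify=()):
    """Load new_vpage into the physical page that previously held old_vpage."""
    ppage = page_tables.entries.pop(old_vpage)
    page_tables.entries[new_vpage] = ppage
    for cpu in cpus_to_notify:                   # the safeguard: flush remote TLB entries
        cpu.flush(old_vpage)


def run(notify):
    tables = PageTables()
    tables.entries[1] = 7            # first virtual page resides in physical page 7
    cpu_a = Cpu(tables)
    cpu_a.translate(1)               # first access caches the mapping 1 -> 7
    replace_page(tables, 1, 2, cpus_to_notify=[cpu_a] if notify else [])
    try:
        return cpu_a.translate(1)    # second access to the first virtual page
    except LookupError:
        return "page fault"


stale = run(notify=False)   # stale TLB entry still points at physical page 7
safe = run(notify=True)     # flushed entry forces a page-table walk
```

In this sketch, the run without notification returns the physical page that now holds the second virtual page, while the run with notification results in a page fault, which is the appropriate outcome because the first virtual page is no longer loaded.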
Various other conflicts in the virtual memory system of FIG. 3 could also arise. For example, suppose that the page tables 18B for the second process are separate from the page tables 18B for the first process. However, suppose further that the second process begins to write data to a physical page of the shared primary memory 16B containing the page tables 18B of the first process, as if the physical page contained operand data of the second process. Generally, such a situation should not arise. However, various conditions in either the hardware or the software of the multiprocessor system, such as a defective cell in a memory component or an error in a computer program, could cause just such a situation to arise. This situation, where the second process is writing operand data into the page tables 18B of the first process, could cause various problems for the first process. The first process could read its page tables 18B and conclude that a virtual page has not been loaded into the primary memory 16B when it has, that a virtual page has been loaded into the primary memory 16B when it has not, or that a virtual page maps to an incorrect physical page. Again, multiprocessor systems generally provide safeguards to try to avoid situations like these.
Referring now to the multiprocessor system of FIG. 4, suppose that the CPU 11C is executing a first process and the CPU 11D is executing a second process. The first and second processes may share common page tables 18C, or they may have separate page tables 18C. Regardless, the first and second processes share the same TLB 15C. Suppose again that the first process executes a first instruction that accesses a first memory location on a first virtual page that is currently loaded into a first physical page. Suppose that the MMU 13C walks the page tables 18C to determine a mapping between the first virtual page and the first physical page, and stores this mapping in the TLB 15C.
Now the second process may be able to flush this mapping from the TLB 15C at any time. If the mapping were flushed from the TLB 15C and the first process subsequently needed to access the first virtual page again, the MMU 13C would have to walk the page tables 18C again to determine the required mapping. So long as the page tables 18C have not been changed, however, the same mapping will be determined and no harm will be done, except that the mapping would take longer to establish.
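As an illustration of this point only, the following minimal sketch models a shared TLB as a dictionary and counts page-table walks. The names page_tables, tlb, walks and translate are hypothetical and the page numbers are arbitrary.

```python
# Illustrative model only: names and page numbers are hypothetical.
page_tables = {1: 7}   # first virtual page -> first physical page
tlb = {}               # TLB shared by both processes
walks = 0              # number of page-table walks performed


def translate(vpage):
    """Return the physical page for vpage, walking the page tables on a TLB miss."""
    global walks
    if vpage in tlb:
        return tlb[vpage]
    walks += 1                       # TLB miss: the slower path
    tlb[vpage] = page_tables[vpage]
    return tlb[vpage]


first = translate(1)    # first process: miss, walk, mapping cached
tlb.clear()             # second process flushes the shared TLB
second = translate(1)   # first process: miss again, second walk
```

Because the page tables are unchanged between the two accesses, both translations yield the same physical page, and the only cost of the flush is the additional walk.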
Suppose, however, that the second process does not flush the mapping from the TLB 15C. Suppose, instead, that the second process changes the page tables 18C of the first process, or performs some action that causes the page tables 18C of the first process to be changed, whether the page tables 18C are shared or not. This situation is similar to the situation described above in connection with the system of FIG. 3, and the same types of virtual memory conflicts could arise. Similar safeguards are also typically implemented to try to avoid such conflicts.
The technical literature currently available describes these types of potential conflicts in virtual memory systems, as well as numerous others. Such conflicts can arise in multiprocessor systems, or in single-processor systems using a hyperthreaded CPU, involving shared TLBs, shared page tables, or even just shared primary memory. Multiprocessor systems are generally designed using various safeguards to try to avoid such conflicts.
Virtual Machine Monitors
A virtual machine monitor (VMM) is a piece of software that runs directly on top of the hardware of a computer system having a first hardware platform and creates an abstracted or virtualized computer system having a second hardware platform. The second hardware platform, or virtualized platform, may be the same as, similar to, or substantially different from, the first hardware platform. The VMM exports all of the features of the virtualized platform, to create a virtual machine (VM) that is functionally equivalent to an actual hardware system implementing the second hardware platform. The VMM generally performs all of the functions that would be performed by a physical implementation of the virtualized hardware platform, to achieve the same results. For example, a VMM generally implements a virtual memory system that is functionally equivalent to the virtual memory system that would result from a physical implementation of the virtualized platform. Various designs for such VMMs are well known in the art.
An OS designed to run on a computer system having the virtualized hardware platform can be loaded on top of the VMM, and the OS should not be able to determine that it is not running directly on an actual hardware system implementing the virtualized hardware platform. Therefore, in the case where the virtualized hardware platform is the same as the physical hardware platform, the OS can be loaded directly onto the actual computer system or on top of the VMM, and the OS would not be able to determine whether the machine on which it is running is the physical machine or the virtual machine. Drivers and other system software that are designed for computer systems having the virtualized hardware platform can also be loaded onto the VMM. An OS running on a VMM, along with drivers and other system software, is called a guest OS. In addition, application programs that are designed to operate on the guest OS may also be loaded onto the VMM. An application program loaded onto the VMM is called a guest application. As one example of a VMM implementation, a VMM may run on an x86 computer system, and it may virtualize an x86 system. In this case, the VMM creates a VM that is compatible with the x86 architecture. Any operating system that can run on an x86 system may be loaded on top of the VMM. For example, a Windows OS from Microsoft Corporation, such as the Windows 2000 OS, may be loaded as the guest OS on top of the VMM. Application programs that are designed to operate on a system running the Windows 2000 OS can then also be loaded onto the VMM. The guest OS and the application programs will execute just as if they were loaded directly onto the underlying physical x86 system.
VMMs cause the instructions that constitute the guest OS and the guest applications to be executed just as they would be on an actual hardware implementation of the virtualized hardware platform. In some situations, such instructions, which are called guest instructions, may be executed directly on the underlying hardware. This type of execution is called direct execution. In other situations, however, direct execution is not possible or desirable, and the guest instructions must be at least partially processed by software. One type of software-based processing of instructions is called emulation or interpretation. Interpretation involves executing instructions, one by one, in software. A guest instruction is fetched and decoded, any required operands are fetched and the software performs whatever actions are necessary to achieve the same outcome as the instruction would have achieved had it been executed in hardware. After one instruction is executed, the next instruction is fetched for execution, and so on.
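The fetch, decode and execute cycle of interpretation described above may be sketched as follows. This is an illustrative example only: the three-instruction guest instruction set (load_imm, add, halt), the register names and the function interpret are assumptions made for this sketch and do not correspond to the x86 architecture or to any particular VMM.

```python
# Illustrative interpreter for a hypothetical toy guest instruction set.

def interpret(program, registers):
    """Execute guest instructions one by one in software."""
    pc = 0
    while pc < len(program):
        opcode, *operands = program[pc]          # fetch and decode
        if opcode == "load_imm":                 # load an immediate value
            dst, value = operands
            registers[dst] = value
        elif opcode == "add":                    # dst = dst + src
            dst, src = operands
            registers[dst] += registers[src]
        elif opcode == "halt":
            break
        else:
            raise ValueError(f"unknown guest opcode: {opcode}")
        pc += 1                                  # proceed to the next instruction
    return registers


guest_program = [
    ("load_imm", "r0", 2),
    ("load_imm", "r1", 40),
    ("add", "r0", "r1"),
    ("halt",),
]
result = interpret(guest_program, {"r0": 0, "r1": 0})
```

Each guest instruction is decoded again every time it is reached, which is the per-instruction overhead that distinguishes interpretation from direct execution.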
Another type of software-based processing of instructions is called binary translation. With binary translation, one or more guest instructions are converted into target instructions, which can be executed by the hardware. The target instructions are usually analyzed and optimized, and then stored for future execution. When the corresponding guest instructions come up for execution, the execution of the hardware processor branches to the target instructions, and execution proceeds from that point. Many techniques for performing binary translation are known, including techniques for optimizing the resulting executable code.
Once the guest instructions have been translated into target instructions, the target instructions can be executed repeatedly, without having to re-translate the same guest instructions each time. Although the combined steps of translating guest instructions and executing the target instructions generally take longer than interpreting the guest instructions for a single pass, simply executing the target instructions on subsequent passes is substantially faster than interpreting the guest instructions. In effect, the method of binary translation allows the cost of decoding guest instructions to be amortized over multiple execution passes, which can lead to significant overall performance gains. Thus, the choice between interpreting guest instructions and translating the guest instructions generally involves a tradeoff between the time required for an initial execution and the time required for multiple executions of the guest instructions. Interpretation and translation may also be combined in a single system for the software-based processing of instructions.
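The amortization described above may be illustrated with a minimal sketch of a translation cache. The example rests on several assumptions: the guest instruction set is a hypothetical toy set, Python closures stand in for the target instructions that an actual binary translator would emit as machine code, and the names translation_cache, translate_block and execute are invented for this sketch.

```python
# Illustrative translation cache for a hypothetical toy guest instruction set.
# Python closures stand in for translated target instructions.

translation_cache = {}       # guest address -> translated target
translations_performed = 0   # counts how often the translator runs


def translate_block(block):
    """Convert a list of guest instructions into one callable target."""
    global translations_performed
    translations_performed += 1
    steps = []
    for opcode, dst, operand in block:           # decoding happens once, here
        if opcode == "load_imm":
            steps.append(lambda regs, d=dst, v=operand: regs.__setitem__(d, v))
        elif opcode == "add":
            steps.append(
                lambda regs, d=dst, s=operand: regs.__setitem__(d, regs[d] + regs[s])
            )
        else:
            raise ValueError(f"unknown guest opcode: {opcode}")

    def target(regs):
        for step in steps:                       # no decoding on this path
            step(regs)

    return target


def execute(address, guest_memory, regs):
    """Run the guest block at address, translating it only on first use."""
    target = translation_cache.get(address)
    if target is None:
        target = translate_block(guest_memory[address])
        translation_cache[address] = target
    target(regs)


guest_memory = {0x1000: [("load_imm", "r1", 5), ("add", "r0", "r1")]}
regs = {"r0": 0, "r1": 0}
for _ in range(3):                               # three execution passes
    execute(0x1000, guest_memory, regs)
```

In this sketch the block is translated on the first pass only, and the second and third passes reuse the stored target, so the translation cost is spread over all three executions.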
VMMs have also been designed to operate on multiprocessor systems, and to virtualize multiprocessor systems. For example, one or more VMMs may execute on the hardware platform illustrated in FIG. 3, and may create a VM having the same hardware architecture. As described above, the VMMs should generally be functionally equivalent to the virtualized hardware platform. Of particular relevance to this invention, the VMMs should generally provide a virtual memory system that is functionally equivalent to the virtual memory system of the virtualized platform. Thus, the VMMs should virtualize one or more MMUs and one or more TLBs that are functionally equivalent to the MMUs and TLBs of an actual physical implementation of the virtualized hardware platform. The virtualized MMUs and TLBs should interact with the SMM of the guest OS and with the guest applications in the same manner as the corresponding physical MMUs and TLBs would interact with the SMM and the guest applications. In particular, the VMMs should provide the same safeguards against virtual memory conflicts, which could result from shared TLBs, shared page tables and/or a shared primary memory, as are provided by a physical implementation of the virtualized platform. For example, suppose a VMM, or a set of VMMs, were designed to virtualize the hardware platform illustrated in FIG. 3, including the function described above in which the second CPU 11B may communicate with the first CPU 11A to cause a mapping in the first TLB 15A to be flushed. Such a VMM would virtualize the first microprocessor 9A, the second microprocessor 9B and the first TLB 15A such that the second virtual CPU could communicate with the first virtual CPU causing the first virtual CPU to flush the mapping in the virtual TLB.
This invention may be used in a virtual memory system of a VMM that virtualizes a multiprocessor hardware platform, where the virtualized hardware platform provides certain safeguards against virtual memory conflicts. The invention may also be used in other computer systems comprising multiple physical or logical processors, and involving software-based instruction processing. As will be described in greater detail below, the software-based processing of instructions, such as by interpretation or translation, may increase the likelihood of virtual memory conflicts, making it more difficult to provide the same safeguards against virtual memory conflicts as are provided by a physical implementation of a hardware platform.