Field of the Invention
The present invention relates in general to page invalidation performed by a processor, and more particularly to a processor with a single invalidate page instruction for invalidating matching translation lookaside buffer and paging cache entries.
Description of the Related Art
Modern processors support virtual memory capability. A virtual memory system maps, or translates, virtual (also known as “linear”) addresses used by a program to physical addresses used by hardware to address system memory. Virtual memory provides the advantages of hiding the fragmentation of physical memory from the program and facilitating program relocation. Virtual memory also allows the program to see a memory address space that is larger than the physical memory actually available to it. These advantages are particularly beneficial in modern systems that support time-sharing of the processor by multiple programs or processes.
In a paged virtual memory system, an operating system (OS) executing on the processor implements the virtual memory system by creating and maintaining page tables (also known as translation tables) in system memory. The page tables map virtual addresses to physical addresses of system memory coupled to the processor. The page tables may be in the form of a hierarchy of tables, some of which map virtual addresses to intermediate table addresses. When a program accesses memory using a virtual address, the page tables are accessed in sequence to translate the virtual address to its physical address, a process commonly referred to as a page table walk, or “tablewalk.” A tablewalk involves numerous accesses of external system memory and can often be a time-consuming process that reduces processor performance.
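By way of illustration only, the tablewalk described above may be sketched as follows. The sketch assumes a four-level hierarchy with 9-bit table indices and 4 KB pages, and it models the tables in system memory as nested dictionaries; the names `split_virtual_address` and `tablewalk` are hypothetical and are not taken from any actual processor or operating system.

```python
PAGE_SHIFT = 12   # assumed 4 KB pages: low 12 bits are the page offset
INDEX_BITS = 9    # assumed 9-bit index per table level
LEVELS = 4        # assumed four-level hierarchy


def split_virtual_address(va):
    """Split a virtual address into per-level table indices and a page offset."""
    offset = va & ((1 << PAGE_SHIFT) - 1)
    indices = []
    # Highest-level table first, lowest-level table last.
    for level in range(LEVELS - 1, -1, -1):
        shift = PAGE_SHIFT + INDEX_BITS * level
        indices.append((va >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


def tablewalk(root, va):
    """Walk nested dicts standing in for page tables in system memory.

    Returns (physical_address, accesses); physical_address is None when
    the virtual address is unmapped. Each table lookup counts as one
    system memory access.
    """
    indices, offset = split_virtual_address(va)
    node = root
    accesses = 0
    for index in indices:
        accesses += 1
        node = node.get(index)
        if node is None:
            return None, accesses
    # The last-level entry holds the physical frame number.
    return (node << PAGE_SHIFT) | offset, accesses
```

In this model, every successful translation costs one memory access per table level, which is the overhead that the caching structures described below are intended to reduce.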
The processor may include at least one translation lookaside buffer (TLB). A TLB is a hardware structure of a processor that caches virtual-to-physical address translations in order to greatly reduce the likelihood that a tablewalk is needed. The TLB compares the virtual address to be translated against the virtual addresses previously stored in the TLB and, if the address hits in the TLB (i.e., when a matching virtual address is found), the TLB provides the corresponding physical address. Retrieving the physical address from the TLB consumes much less time than would be required to access the page tables in system memory to perform the tablewalk.
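As a minimal sketch of the hit and miss behavior described above, a TLB may be modeled in software as a mapping from virtual page numbers to physical frame numbers. The class name `TLB`, the `walk` callback, and the unbounded capacity are simplifying assumptions made for illustration; a hardware TLB has a fixed number of entries and a replacement policy that are not modeled here.

```python
PAGE_SHIFT = 12  # assumed 4 KB pages


class TLB:
    """Toy TLB: caches virtual page number -> physical frame number."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def translate(self, va, walk):
        """Translate va, calling walk(vpn) only when the TLB misses.

        walk is a hypothetical callback standing in for the tablewalk;
        it takes a virtual page number and returns a frame number.
        """
        vpn = va >> PAGE_SHIFT
        offset = va & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[vpn] = walk(vpn)  # fill the TLB on a miss
        return (self.entries[vpn] << PAGE_SHIFT) | offset
```

Two accesses to the same page therefore invoke the tablewalk only once; the second access is served from the cached translation.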
The processor may also support one or more paging caches that cache information for one or more of the page tables. For example, chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3C: System Programming Guide, January 2015 (referred to herein as the “Intel® system programming guide”), which is hereby incorporated by reference in its entirety for all intents and purposes, describes an IA-32e paging mode. The IA-32e paging mode includes a level 4 page map table (PML4), a page directory pointer table (PDPT), a page directory (PD), and a page table (PT), all located in the system memory. The processor may include a separate paging cache for each page table. A hit in a paging cache enables bypassing of one or more of the page tables in the system memory to reduce the number of system memory accesses. In this manner, the tablewalk process may be significantly accelerated when there is a hit within a paging cache.
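The effect of a paging cache hit may be illustrated with the following sketch, which models only one such cache: a cache of page directory entries keyed by the upper virtual address bits. The 9-bit indices, 4 KB pages, nested-dictionary tables, and the names `index_at` and `walk_with_pd_cache` are assumptions made for illustration, and the sketch assumes that the virtual address is mapped.

```python
PAGE_SHIFT = 12   # assumed 4 KB pages
INDEX_BITS = 9    # assumed 9-bit index per table level
INDEX_MASK = (1 << INDEX_BITS) - 1


def index_at(va, level):
    """Table index for a level: 3 = PML4, 2 = PDPT, 1 = PD, 0 = PT."""
    return (va >> (PAGE_SHIFT + INDEX_BITS * level)) & INDEX_MASK


def walk_with_pd_cache(root, pd_cache, va):
    """Translate va, using pd_cache to skip the three upper table levels.

    pd_cache maps the virtual address bits above the PT index to the
    page table selected by the PML4, PDPT, and PD entries.
    Returns (physical_address, accesses to the modeled system memory).
    """
    tag = va >> (PAGE_SHIFT + INDEX_BITS)
    accesses = 0
    table = pd_cache.get(tag)
    if table is None:
        # Miss: walk the three upper levels and remember the result.
        table = root
        for level in (3, 2, 1):
            table = table[index_at(va, level)]
            accesses += 1
        pd_cache[tag] = table
    frame = table[index_at(va, 0)]  # the PT access is always needed
    accesses += 1
    offset = va & ((1 << PAGE_SHIFT) - 1)
    return (frame << PAGE_SHIFT) | offset, accesses
```

In this model, a first access to a region costs four memory accesses, while a later access to a neighboring page served by the same page table costs only one.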
It is often desirable or even necessary to invalidate TLB and/or paging cache entries. If the processor attempts to use an invalid translation that is otherwise marked as valid, an error occurs which may result in improper operation. Chapter 2 of the Intel® system programming guide describes an x86 instruction set architecture (ISA) instruction “INVLPG” that is intended to invalidate a TLB entry for a specified page. A conventional x86 processor responds to the INVLPG instruction by executing an invalidate page microcode routine. The conventional invalidate page microcode routine manually accesses and searches each TLB and paging cache entry, one at a time, and invalidates any matching entries. The conventional microcode invalidation routine is thus a complex operation that consumes valuable processor time and resources.
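The cost of the entry-by-entry approach described above may be illustrated with the following sketch, which is a software analogy only and does not reproduce any actual microcode. The `Entry` type, the `invalidate_page_sequential` function, and the representation of each structure as a list of entries paired with a tag shift are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int            # virtual address bits cached by this entry
    valid: bool = True


def invalidate_page_sequential(va, structures):
    """Invalidate every valid entry matching va, one entry at a time.

    structures is a list of (entries, shift) pairs, one pair per TLB or
    paging cache; an entry matches when its tag equals va >> shift.
    Returns (invalidated, examined) to expose the cost of the search.
    """
    invalidated = 0
    examined = 0
    for entries, shift in structures:
        for entry in entries:
            examined += 1
            if entry.valid and entry.tag == (va >> shift):
                entry.valid = False
                invalidated += 1
    return invalidated, examined
```

The number of entries examined grows with the total size of all of the structures regardless of how many entries actually match, which reflects why a sequential search of this kind consumes processor time.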