Many modern microprocessors support the notion of virtual memory. In a virtual memory system, instructions of a program executing on the microprocessor refer to data using virtual addresses in a virtual address space of the microprocessor. The microprocessor translates virtual addresses into physical addresses that it uses to access physical memory.
A common virtual memory scheme supported by microprocessors is a paged memory system. A paged memory system employs a paging mechanism for translating, or mapping, virtual addresses to physical addresses. The physical address space is divided up into physical pages of fixed size. A common page size is 4 KB. The virtual addresses comprise a virtual page address portion and a page offset portion. The virtual page address specifies a virtual page in the virtual address space. The paging mechanism of the microprocessor translates the virtual page address into a physical page address. This process is known as page translation. The page offset specifies a physical offset in the physical page, i.e., a physical offset from the physical page address.
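As an illustration of how a virtual address divides into a virtual page address and a page offset, the following minimal sketch assumes the 4 KB page size mentioned above. The function names are hypothetical and chosen only for this example; the point is that page translation replaces the page address while the offset passes through unchanged.

```python
PAGE_SIZE = 4096                           # 4 KB page, as in the text
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 low-order bits hold the page offset


def split_virtual_address(vaddr):
    """Return (virtual page address, page offset) for a virtual address."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


def physical_address(physical_page, offset):
    """Combine a translated physical page address with the unchanged offset."""
    return (physical_page << OFFSET_BITS) | offset


# Hypothetical example values: virtual page 0x12345 is assumed to map to
# physical page 0x00ABC; the offset 0x678 is carried over as-is.
vpage, offset = split_virtual_address(0x12345678)
paddr = physical_address(0x00ABC, offset)
```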
The operating system decides which physical pages in memory will be mapped to each virtual page and maintains page mapping information that specifies the mappings. When the microprocessor encounters an instruction that specifies a virtual address to access a location in memory, such as a load or store instruction, the microprocessor must translate the virtual address to the proper physical address by using the operating system's page mapping information. The operating system maintains the page mapping information in system memory. Thus, the microprocessor must read the appropriate page mapping information from memory to translate the virtual address into the physical address. The page mapping information is typically hierarchically arranged in order to reduce its size, which requires the microprocessor to traverse the hierarchy by performing read operations at multiple levels of the hierarchy. For this reason, and because at least a portion of the page mapping information is commonly referred to as page tables, the process of the microprocessor traversing the page mapping information to translate a virtual address to a physical address is commonly referred to as a page table walk, or simply tablewalk.
As an example, a popular hierarchical page mapping information scheme includes a first level page directory and second level page tables. Each entry in the page directory points to a different page table, and each entry in each page table includes the physical address and characteristics of the page mapped to that entry. The base address of the page directory is stored in a register of the microprocessor. Such a scheme is illustrated in FIG. 3-12 on page 3-23 of the IA-32 Intel Architecture Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, document number 253668-020US, published June 2006 by the Intel Corporation, which is incorporated by reference herein for all purposes. In this example, the microprocessor performs a tablewalk by reading the page directory entry at the index within the page directory specified by page directory entry bits in the upper portion of the virtual address. The page directory entry specifies the base address of the relevant page table. The microprocessor then reads the page table entry at the index within the page table specified by page table bits in the middle portion of the virtual address. The page table entry specifies the physical address of the relevant page. The page table entry also includes characteristics for each page. For example, the page characteristics may include an indication of whether the page has been accessed; whether the page has been written; caching characteristics, such as whether the page is cacheable and, if so, the write-back caching policy; which privilege level is assigned to the page; the write privileges of the page; and whether the page is present in physical memory. The operating system populates the page directory entries and page table entries with the page characteristic values. However, the microprocessor must update some of the page characteristics in response to program execution. 
For example, in the Intel scheme mentioned above, the processor writes the relevant page directory entry and/or page table entry to update the Accessed and/or Dirty bits in response to the program reading and/or writing memory pages. Thus, when performing a tablewalk, in addition to reading the page mapping information from system memory to translate a virtual address to a physical address, the processor may sometimes also have to write the page mapping information in system memory.
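The two-level tablewalk described above can be sketched as follows. This is a simplified model rather than a description of any particular processor: it assumes a 32-bit virtual address split into 10 directory bits, 10 table bits, and 12 offset bits, 4-byte entries, and flag positions modeled on the IA-32 layout (Present in bit 0, Accessed in bit 5, Dirty in bit 6). System memory is represented by a hypothetical dictionary from physical address to entry value, and privilege and write-permission checks are omitted.

```python
PRESENT, ACCESSED, DIRTY = 1 << 0, 1 << 5, 1 << 6   # assumed flag bits in an entry
FRAME_MASK = 0xFFFFF000                             # upper 20 bits: 4 KB-aligned base


class PageFault(Exception):
    """Raised when a required entry is not marked present."""


def tablewalk(memory, directory_base, vaddr, is_write):
    """Translate vaddr; memory maps physical address -> 32-bit entry."""
    dir_index = (vaddr >> 22) & 0x3FF     # upper bits select the page directory entry
    table_index = (vaddr >> 12) & 0x3FF   # middle bits select the page table entry
    offset = vaddr & 0xFFF                # lower bits are the page offset

    # First level: read the page directory entry to find the page table.
    pde_addr = directory_base + 4 * dir_index
    pde = memory.get(pde_addr, 0)
    if not pde & PRESENT:
        raise PageFault(hex(vaddr))
    if not pde & ACCESSED:
        memory[pde_addr] = pde | ACCESSED  # write back only if the bit changes

    # Second level: read the page table entry to find the physical page.
    pte_addr = (pde & FRAME_MASK) + 4 * table_index
    pte = memory.get(pte_addr, 0)
    if not pte & PRESENT:
        raise PageFault(hex(vaddr))
    updated = pte | ACCESSED | (DIRTY if is_write else 0)
    if updated != pte:
        memory[pte_addr] = updated         # the walk sometimes writes system memory

    return (pte & FRAME_MASK) | offset
```

Note that a read of a page whose Accessed bit is already set performs no write at all, which matches the observation that the processor only sometimes has to write the page mapping information.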
Because the page mapping information resides in system memory, and accesses to system memory are relatively slow, a tablewalk is a relatively costly way for the microprocessor to translate a virtual address to a physical address and to obtain and/or update the page characteristics. To improve performance by reducing the number of tablewalks, many microprocessors provide a mechanism for caching the page mapping information. The page mapping information cache is commonly referred to as a translation lookaside buffer (TLB). When the microprocessor encounters a memory access instruction, the microprocessor provides the virtual address to the TLB and the TLB performs a lookup of the virtual page address. If the virtual page address hits in the TLB, then the TLB provides the corresponding translated physical page address and page characteristics, thereby avoiding the need to perform a tablewalk. However, if the virtual page address misses in the TLB, then the microprocessor must perform a tablewalk. Thus, in addition to reading the page mapping information from memory and updating the page mapping information as necessary, the tablewalk also includes the microprocessor allocating an entry in the TLB and updating it with the translated physical address and page characteristics.
To summarize, broadly speaking, a tablewalk comprises three steps. The first step is to read from memory the page mapping information required to translate the virtual address to a physical address and to obtain the page characteristics. The second step is to update the page mapping information in system memory, if necessary. The third step is to allocate a TLB entry and update it with the new page mapping information.
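The interaction between a TLB lookup and the three tablewalk steps can be sketched as below. This is a toy model under stated assumptions, not a description of real hardware: the hierarchical page mapping information is flattened into a single hypothetical dictionary keyed by virtual page address, only an accessed flag stands in for the page characteristics, and the first-in, first-out replacement policy is an arbitrary choice made for the example. An unmapped page simply raises a KeyError in place of a page fault.

```python
from collections import OrderedDict


class TLB:
    """Toy TLB that falls back to a three-step tablewalk on a miss."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table   # virtual page -> {"frame": int, "accessed": bool}
        self.entries = OrderedDict()   # virtual page -> physical page
        self.tablewalks = 0            # counts misses that required a tablewalk

    def translate(self, vaddr):
        vpage, offset = vaddr >> 12, vaddr & 0xFFF
        if vpage in self.entries:                # TLB hit: no tablewalk needed
            return (self.entries[vpage] << 12) | offset

        self.tablewalks += 1
        mapping = self.page_table[vpage]         # step 1: read the mapping information
        if not mapping["accessed"]:
            mapping["accessed"] = True           # step 2: update it, if necessary
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)     # evict the oldest entry (assumed FIFO)
        self.entries[vpage] = mapping["frame"]   # step 3: allocate and fill a TLB entry
        return (mapping["frame"] << 12) | offset
```

In this model, repeated accesses to the same page incur only one tablewalk as long as the entry remains in the TLB, which is the performance benefit the TLB is intended to provide.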
Many modern microprocessors are superscalar. That is, the microprocessor includes multiple execution units and is capable of issuing multiple instructions to the execution units in a single clock cycle. Many modern microprocessors also perform out-of-order execution. That is, the microprocessor may execute instructions out of the order specified by the program that includes the instructions. Superscalar out-of-order execution microprocessors typically attempt to maintain a relatively large pool of outstanding instructions so that they can take advantage of a larger amount of instruction parallelism.
Many modern microprocessors also perform speculative execution of instructions. That is, the microprocessor executes instructions, or at least performs some of the actions prescribed by the instruction, before knowing with certainty whether the instruction will actually complete. There are several reasons why an instruction may not complete. For example, the microprocessor may have mispredicted a branch instruction that is older than the instruction in question. For another example, the microprocessor may take an exception before the instruction in question completes. The exception may be asynchronous, such as an interrupt, or it may be synchronous, i.e., caused by an instruction, such as a page fault, divide by zero condition, general protection error, and so forth. The exception-causing instruction may be the instruction in question or an instruction older than the instruction in question. Although the microprocessor may perform some of the actions prescribed by the instruction speculatively, the microprocessor is not allowed by the architecture to update the architectural state of the system with the results of an instruction until the instruction is no longer speculative, i.e., until it is certain that the instruction will complete.
When a conventional out-of-order execution microprocessor suffers a TLB miss that necessitates a tablewalk, the microprocessor serializes the tablewalk with the other outstanding program instructions. That is, the conventional microprocessor waits until all program instructions older than the initiator instruction (the instruction that caused the TLB miss) have retired before it performs the tablewalk and does not issue to the execution units for execution any program instructions newer than the initiator instruction until it completes the tablewalk. Because the conventional microprocessor serializes tablewalks, and because the number of instructions older than the initiator instruction may be large and/or some of the instructions may be long latency instructions such as memory access instructions or floating point instructions, the conventional microprocessor may wait a relatively long time to perform the tablewalk and the conventional microprocessor may be forfeiting the opportunity of executing instructions newer than the initiator instruction. This may significantly adversely affect the performance of the conventional microprocessor.