Field of the Invention
The present invention relates in general to performing tablewalks in a microprocessor, and more particularly to performing hardware prefetch tablewalk operations having a lowest tablewalk priority for improving performance.
Description of the Related Art
Modern processors support virtual memory capability. A virtual memory system maps, or translates, virtual addresses used by a program to physical addresses used by hardware to address memory. Virtual memory has the advantages of hiding the fragmentation of physical memory from the program, facilitating program relocation, and allowing the program to see a larger memory address space than the actual physical memory available to it. These advantages are particularly beneficial in modern systems that support time-sharing of the processor by multiple programs or processes.
A common virtual memory scheme supported by microprocessors is a paged memory system. A paged memory system employs a paging mechanism for translating, or mapping, virtual addresses to physical addresses. The physical address space is divided up into physical pages of fixed size. A common page size is 4 kilobytes (KB). The virtual addresses comprise a virtual page address portion and a page offset portion. The virtual page address specifies a virtual page in the virtual address space. The paging mechanism of the microprocessor translates the virtual page address into a physical page address. This process is known as page translation. The page offset specifies a physical offset in the physical page, i.e., a physical offset from the physical page address.
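The split of a virtual address into a virtual page address and a page offset can be illustrated with a minimal sketch, assuming the 4 KB page size mentioned above. The function names are hypothetical and chosen only for illustration.

```python
PAGE_SIZE = 4096   # 4 KB page, the common size mentioned above
OFFSET_BITS = 12   # log2(4096): number of page offset bits


def split_virtual_address(vaddr):
    """Return (virtual page address, page offset) for a virtual address."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)


def physical_address(physical_page, offset):
    """Combine a translated physical page address with the page offset."""
    return (physical_page << OFFSET_BITS) | offset
```

Because the offset passes through translation unchanged, only the virtual page address needs to be mapped to a physical page address.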
The operating system decides which physical pages in memory will be mapped to each virtual page and maintains page mapping information that specifies the mappings. When the microprocessor encounters an instruction that specifies a virtual address to access a location in memory, such as a load or store instruction, the microprocessor translates the virtual address to the proper physical address by using the operating system's page mapping information. The operating system maintains the page mapping information in system memory. Thus, the microprocessor reads the appropriate page mapping information from memory to translate the virtual address into the physical address. The page mapping information is typically hierarchically arranged in order to reduce its size, which requires the microprocessor to traverse the hierarchy by performing read operations at multiple levels of the hierarchy. For this reason, and because at least a portion of the page mapping information is commonly referred to as page tables, the process of the microprocessor traversing the page mapping information to translate a virtual address to a physical address is commonly referred to as a page table walk, or simply a tablewalk. The tablewalk operation thus translates a virtual address or the like into a physical address or physical address translation, more generally referred to as a translated address.
As an example, a popular hierarchical page mapping information scheme includes a first level page directory and second level page tables. Each entry in the page directory points to a different page table, and each entry in each page table includes the physical or translated address and characteristics of the page mapped to that entry. The base address of the page directory is stored in a register of the microprocessor. Such a scheme is illustrated in FIG. 3-12 on page 3-23 of the IA-32 Intel Architecture Software Developer's Manual, Volume 3A: System Programming Guide, Part 1, document number 253668-020US, published June 2006 by the Intel Corporation, which is incorporated by reference herein for all purposes. In this example, the microprocessor performs a tablewalk by reading the page directory entry at the index within the page directory specified by page directory entry bits in the upper portion of the virtual address. The page directory entry specifies the base address of the relevant page table. The microprocessor then reads the page table entry at the index within the page table specified by page table bits in the middle portion of the virtual address. The page table entry specifies the translated address of the relevant page. The page table entry also includes characteristics for each page. For example, the page characteristics may include an indication of whether the page has been accessed; whether the page has been written; caching characteristics, such as whether the page is cacheable and, if so, the write-back caching policy; which privilege level is assigned to the page; the write privileges of the page; and whether the page is present in physical memory.
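The two-level walk described above may be sketched as follows. This is a simplified model under stated assumptions, not a definitive implementation: it assumes a 32-bit virtual address with 4 KB pages, in which the upper 10 bits index the page directory, the next 10 bits index the page table, and the low 12 bits are the page offset; it assumes 4-byte entries whose upper 20 bits hold a base address and whose bit 0 is a present flag; and it models system memory as a Python dictionary keyed by entry address. The names `tablewalk`, `PageFault`, and `memory` are hypothetical.

```python
PRESENT = 0x1            # assumed present flag in bit 0 of an entry
BASE_MASK = 0xFFFFF000   # assumed upper 20 bits hold a 4 KB-aligned base


class PageFault(Exception):
    """Raised when a directory or table entry is not present."""


def tablewalk(vaddr, directory_base, memory):
    """Translate vaddr by reading one page directory entry and one page table entry."""
    dir_index = (vaddr >> 22) & 0x3FF    # upper portion of the virtual address
    table_index = (vaddr >> 12) & 0x3FF  # middle portion of the virtual address
    offset = vaddr & 0xFFF               # page offset, not translated

    # First level: the page directory entry gives the page table base.
    pde = memory.get(directory_base + 4 * dir_index, 0)
    if not pde & PRESENT:
        raise PageFault(hex(vaddr))

    # Second level: the page table entry gives the translated page address.
    pte = memory.get((pde & BASE_MASK) + 4 * table_index, 0)
    if not pte & PRESENT:
        raise PageFault(hex(vaddr))

    return (pte & BASE_MASK) | offset
```

Each level costs one read of system memory, which is why the walk becomes more expensive as the hierarchy deepens.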
The operating system populates the page directory entries and page table entries with the page characteristic values. However, the microprocessor also updates some of the page characteristics in response to program execution. For example, in the Intel scheme mentioned above, the processor writes the relevant page directory entry and/or page table entry to update the Accessed and/or Dirty bits in response to the program reading and/or writing memory pages. Thus, when performing a tablewalk, in addition to reading the page mapping information from system memory to translate a virtual address to a translated address (e.g., physical address or physical address translation), the processor may sometimes also have to write the page mapping information in system memory.
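The conditional write-back of page characteristics can be sketched as below. The sketch assumes the Accessed flag is bit 5 and the Dirty flag is bit 6 of an entry, and it simplifies by treating every entry as if it carried both flags. The helper name `update_entry` is hypothetical.

```python
ACCESSED = 1 << 5  # assumed position of the Accessed flag
DIRTY = 1 << 6     # assumed position of the Dirty flag


def update_entry(entry, is_write):
    """Return (updated entry, whether the entry must be written back to memory)."""
    updated = entry | ACCESSED
    if is_write:
        updated |= DIRTY
    # A write to system memory is needed only if a flag actually changed.
    return updated, updated != entry
```

When the flags are already set, the tablewalk involves only reads; otherwise it also incurs a write to the page mapping information.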
Because the page mapping information resides in system memory, and accesses to system memory are relatively slow, it is a relatively costly operation for the microprocessor to perform a tablewalk to carry out a virtual-to-physical address translation and to obtain and/or update the page characteristics. To improve performance by reducing the number of tablewalks, many microprocessors provide a mechanism for caching the page mapping information. The page mapping information cache is commonly referred to as a translation lookaside buffer (TLB). When the microprocessor encounters a memory access instruction, the microprocessor provides the virtual address to the TLB and the TLB performs a lookup of the virtual page address. If the virtual page address hits in the TLB, then the TLB provides the corresponding translated physical page address and page characteristics, thereby avoiding the need to perform a tablewalk. However, if the virtual page address misses in the TLB, then the microprocessor performs a tablewalk. Thus, in addition to reading the page mapping information from memory and updating the page mapping information as necessary, the tablewalk also includes the microprocessor allocating an entry in the TLB and updating it with the translated physical address and page characteristics.
To summarize, broadly speaking, a tablewalk includes three steps. The first step is to read the necessary page mapping information from memory required to translate the virtual address to a translated address and to obtain the page characteristics. The second step is to update the page mapping information in system memory, if necessary. The third step is to allocate a TLB entry and update it with the new page mapping information.
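The interaction between a TLB lookup and the three tablewalk steps can be sketched as follows. This is an illustrative model only: the class name `TLB`, the `walk` callback (standing in for the first two steps, reading and, if necessary, updating the page mapping information), the fixed capacity, and the first-in-first-out replacement policy are all assumptions rather than details taken from the text.

```python
class TLB:
    """Toy translation lookaside buffer keyed by virtual page address."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = {}  # virtual page -> (physical page, characteristics)
        self.walks = 0     # number of tablewalks performed on misses

    def translate(self, vaddr, walk):
        vpage, offset = vaddr >> 12, vaddr & 0xFFF  # assumes 4 KB pages
        hit = self.entries.get(vpage)
        if hit is None:
            # Miss: steps 1 and 2 are delegated to the walk callback.
            self.walks += 1
            hit = walk(vpage)
            # Step 3: allocate a TLB entry, evicting the oldest one if full
            # (dicts preserve insertion order, so the first key is the oldest).
            if len(self.entries) >= self.capacity:
                self.entries.pop(next(iter(self.entries)))
            self.entries[vpage] = hit
        physical_page, _characteristics = hit
        return (physical_page << 12) | offset
```

A second access to the same page hits in the TLB and performs no tablewalk, which is the saving the TLB is intended to provide.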
Many modern microprocessors are superscalar. That is, the microprocessor includes multiple execution units and is capable of issuing multiple instructions to the execution units in a single clock cycle. Many modern microprocessors also perform out-of-order execution. That is, the microprocessor may execute instructions out of the order specified by the program that includes the instructions. Superscalar out-of-order execution microprocessors typically attempt to maintain a relatively large pool of outstanding instructions so that they can take advantage of a larger amount of instruction parallelism.
Many modern microprocessors also perform speculative execution of instructions. That is, the microprocessor executes instructions, or at least performs some of the actions prescribed by the instruction, before knowing for certain whether the instruction will actually complete. An instruction may fail to complete for several reasons. For example, the microprocessor may have mispredicted a branch instruction that is older than the instruction in question. For another example, the microprocessor may take an exception before the instruction in question completes. The exception may be asynchronous, such as an interrupt, or it may be synchronous, i.e., caused by an instruction, such as a page fault, divide by zero condition, general protection error, and so forth. The exception-causing instruction may be the instruction in question or an instruction older than the instruction in question. Although the microprocessor may perform some of the actions prescribed by the instruction speculatively, the microprocessor is not allowed by the architecture to update the architectural state of the system with the results of an instruction until the instruction is no longer speculative, i.e., until it is certain that the instruction will complete.
Many modern microprocessors further perform hardware prefetches. Hardware prefetching in general means bringing data (or instructions) from memory into a cache memory in anticipation of a future need for that information. Hardware prefetches are highly speculative in nature since there is a significant chance that the retrieved information will not be used. A hardware prefetcher includes detectors or the like that recognize patterns and accesses in the microprocessor and spawn requests to retrieve information before it is needed or even requested by software. Conventional hardware prefetchers, however, stop when hitting or approaching a page boundary. Since the page size may be unknown, the smallest size of 4 KB is usually presumed. The page boundary limitation stems in part from legacy systems that rely on physical addresses, in which accesses across a page boundary may be problematic.
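The page boundary limitation of a conventional prefetcher can be illustrated with a minimal sketch, assuming the presumed 4 KB page size stated above. The 64-byte stride, the prefetch depth, and the function name `prefetch_candidates` are hypothetical choices made only for illustration.

```python
PAGE_SIZE = 4096  # smallest page size, presumed when the actual size is unknown
LINE_SIZE = 64    # hypothetical cache line size used as the default stride


def prefetch_candidates(addr, stride=LINE_SIZE, depth=4):
    """Return up to `depth` prefetch addresses, stopping at the page boundary."""
    page = addr // PAGE_SIZE
    candidates = []
    for i in range(1, depth + 1):
        next_addr = addr + i * stride
        if next_addr // PAGE_SIZE != page:
            break  # a conventional prefetcher does not cross into the next page
        candidates.append(next_addr)
    return candidates
```

For an access near the end of a page, the candidate list is cut short, so the prefetcher provides no benefit for the first accesses in the following page.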
In a virtual memory system, a virtual address of a hardware prefetch is first converted to a translated physical address. In the event that the translated address is not found in the TLB, a tablewalk is performed to retrieve the translated address using the virtual address. Although the hardware prefetch may provide a significant advantage if the retrieved information is used during subsequent processing, the hardware prefetch tablewalk may block software-based tablewalks. A software-based tablewalk is based on an actual software or code instruction and thus has a higher priority than a hardware prefetch operation. It is therefore disadvantageous to force software-based tablewalks to wait on lower priority hardware prefetch tablewalks.
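The priority relationship between the two kinds of tablewalk can be sketched with a simple priority queue. This is only one possible illustration of the ordering described above, not a description of any particular hardware mechanism; the class `TablewalkQueue`, its methods, and the numeric priority values are assumptions.

```python
import heapq
import itertools

SOFTWARE = 0      # tablewalk caused by an actual program instruction
HW_PREFETCH = 1   # tablewalk caused by a hardware prefetch (lowest priority)


class TablewalkQueue:
    """Pending tablewalk requests, served in priority order, then arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker preserving arrival order

    def request(self, kind, vaddr):
        heapq.heappush(self._heap, (kind, next(self._arrival), vaddr))

    def next_walk(self):
        """Return (kind, vaddr) of the next walk to start, or None if idle."""
        if not self._heap:
            return None
        kind, _, vaddr = heapq.heappop(self._heap)
        return kind, vaddr
```

With this ordering, a pending hardware prefetch tablewalk is started only when no software-based tablewalk is waiting. The sketch does not model a prefetch walk that is already in progress when a software-based request arrives.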