The disclosure relates generally to methods and apparatus that support multiple compressed encodings within a translation lookaside buffer.
Computing devices, such as central processing units (CPUs) and graphics processing units (GPUs), typically include translation lookaside buffers (TLBs) that allow for the retrieval of recent virtual memory address to physical memory address translations (e.g., virtual system page to physical system page translations). Such translations are necessary when, for example, an executing program references a virtual memory address, which must then be translated to a physical memory address. A virtual memory address may reside, for example, in a guest physical page (GPP), while a physical memory address may reside, for example, in a system physical page (SPP). Typically, the TLB is in the form of a memory cache that stores recent translations to allow for quicker retrieval of such translations. When a virtual memory address to physical memory address translation is needed, the TLB is searched to see if the translation, in the form of a TLB entry, is available. If it is not available in the TLB, known in the art as a “TLB miss,” system page table entries (PTEs) in physical memory are searched with what is known in the art as a memory page-table walk (e.g., memory page crawl). This operation is performed by a hardware page table walker, as known in the art. The resulting translation may then be stored in the TLB for future reference. If, however, the translation is available in the TLB, known in the art as a “TLB hit,” the physical address is provided without having to search the physical memory. Thus, the translation is achieved much faster, as the memory page-table walk operation is avoided.
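The hit and miss behavior described above can be illustrated with a minimal software sketch. It is not taken from the disclosure: the class name `SimpleTLB`, the dictionary standing in for the page table, the capacity of four entries, and the least-recently-used replacement policy are all assumptions made for illustration, and a 4 KB page size is assumed.

```python
PAGE_SHIFT = 12  # assumption: 4 KB pages, so the low 12 bits are the page offset


class SimpleTLB:
    """Toy model of a TLB: a small cache of virtual-to-physical page translations."""

    def __init__(self, page_table, capacity=4):
        # page_table is a dict {virtual page number: physical page number};
        # it stands in for the PTEs held in physical memory (hypothetical).
        self.page_table = page_table
        self.capacity = capacity
        self.entries = {}  # insertion order doubles as least-recently-used order
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.entries:
            # TLB hit: no page-table walk is needed.
            self.hits += 1
            ppn = self.entries.pop(vpn)  # re-inserted below as most recently used
        else:
            # TLB miss: the dictionary lookup stands in for the page-table walk.
            self.misses += 1
            ppn = self.page_table[vpn]
            if len(self.entries) >= self.capacity:
                oldest = next(iter(self.entries))
                del self.entries[oldest]  # evict the least recently used entry
        self.entries[vpn] = ppn
        return (ppn << PAGE_SHIFT) | offset


tlb = SimpleTLB({0x8000: 0xABC8, 0x8001: 0xABCA})
first = tlb.translate(0x8000123)   # miss: walks the stand-in page table
second = tlb.translate(0x8000456)  # hit: same virtual page, served from the TLB
```

In this sketch an unmapped virtual page raises a `KeyError`; page-fault handling is outside its scope.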
In addition, TLB performance is critical in many situations, especially as memory sizes continue to increase while memory page sizes remain at 4 kilobytes (KB). Reducing TLB misses directly improves performance (i.e., no delays waiting for translations from memory page-table walks) and reduces energy consumption (i.e., no energy spent on memory page-table walks). To allow TLBs to include more TLB entries, current solutions allow for the compressing or combining of multiple TLB entries into a single TLB entry. These solutions propose different TLB encodings that enable the effective compression of multiple PTEs into a single TLB entry. Each type of encoding can provide effective compression of certain types of PTE patterns, and there are some patterns that not all encoding schemes can handle.
For example, the method of CoLT (Coalesced Large-reach TLBs) provides a way to combine (e.g., coalesce, encode) multiple TLB entries that have strictly sequential mappings. For example, if there are three virtual system page to physical system page translations such as V0->P4, V1->P5, V2->P6, where the source virtual pages V0, V1, V2 are all sequential, and the mapped-to physical pages P4, P5, P6 are also sequential, CoLT allows for the three translations to be combined into a single TLB entry with a format such as “{V0->P4},{3}” where the “3” indicates that the mapping applies to three consecutive/sequential mappings. This provides an improvement to the TLB by replacing what would otherwise require three TLB entries with just a single one.
FIG. 1 shows an example TLB 102 that supports the CoLT encoding method which allows for the compression or coalescing of multiple sequentially consecutive PTEs that all map to correspondingly sequentially consecutive physical pages. As shown, the TLB entry 104 includes a virtual tag field 106 corresponding to the virtual memory page(s) being translated by the PTE, a valid bit field (V) 108, read/write/execute memory page permission field 110 (CoLT requires all memory pages represented by the TLB entry to have the same memory page permissions), replacement metadata (LRU) 112, the physical memory base physical page number (PPN) 114, and a run-length field 116 indicating the number of sequential PTEs represented by this coalesced TLB entry. An example 118 of sequentially consecutive PTEs that can be encoded into a compressed TLB entry using the CoLT method is also shown in the figure. In this example, instead of using five separate conventional TLB entries, a single coalesced TLB entry is used, leaving other TLB entries available to store other translations. A compressed TLB entry using the CoLT method may be larger (i.e., take up more TLB entry memory space) than a conventional un-encoded TLB entry, as the CoLT method includes a run length field, as indicated in FIG. 1. Another example 120 shows a mapping where there is a single PTE entry (e.g., singleton) that is not able to be coalesced. In this case, the single PTE is cached in a TLB entry with the “run length” field set to one.
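The CoLT-style coalescing described above can be sketched in software as follows. This is an illustrative sketch only, not the hardware design of FIG. 1: the names `ColtEntry`, `coalesce`, and `colt_lookup` are hypothetical, the valid bit and replacement metadata are omitted, and no upper bound on the run length is modeled (a real run-length field has a fixed width).

```python
from dataclasses import dataclass


@dataclass
class ColtEntry:
    base_vpn: int    # virtual tag: first virtual page covered by the entry
    base_ppn: int    # base physical page number
    run_length: int  # number of sequential PTEs represented by the entry
    perms: str       # permissions shared by every page in the run


def coalesce(ptes):
    """Merge (vpn, ppn, perms) tuples into run-length encoded entries.

    A PTE extends the previous entry only if its virtual page, its physical
    page, and its permissions all continue that entry's run.
    """
    entries = []
    for vpn, ppn, perms in sorted(ptes):
        last = entries[-1] if entries else None
        if (last is not None
                and vpn == last.base_vpn + last.run_length
                and ppn == last.base_ppn + last.run_length
                and perms == last.perms):
            last.run_length += 1
        else:
            # Start a new entry; a singleton keeps a run length of one.
            entries.append(ColtEntry(vpn, ppn, 1, perms))
    return entries


def colt_lookup(entries, vpn):
    """Return the physical page number for vpn, or None on a miss."""
    for entry in entries:
        delta = vpn - entry.base_vpn
        if 0 <= delta < entry.run_length:
            return entry.base_ppn + delta
    return None


# V0->P4, V1->P5, V2->P6 coalesce into one entry; V7->P20 stays a singleton.
# The singleton mapping and the "rw" permissions are made-up illustration values.
entries = coalesce([(0, 4, "rw"), (1, 5, "rw"), (2, 6, "rw"), (7, 20, "rw")])
```

Here the three sequential translations occupy a single entry with a run length of three, while the singleton occupies its own entry with a run length of one, mirroring the two cases discussed above.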
Another method, known as clustered TLBs, acts in a somewhat similar fashion, but can encode multiple nearby or “clustered” mappings even if they are not strictly sequential (e.g., there may be gaps in the sequence, and some mappings may be “out of order”). The clustered TLB format can combine additional patterns that the CoLT approach cannot, but there are also patterns that the CoLT approach can encode that the clustered approach cannot. Each of these two methods has its own strengths and weaknesses. Furthermore, clustered TLB entries may also be larger than conventional un-encoded TLB entries. As such, other proposals include using two separate TLB structures (e.g., one for un-encoded TLB entries and another for clustered TLB entries), which introduces additional complexity and overheads.
FIG. 2 shows an example TLB 202 that supports the clustered encoding approach allowing for the compression of multiple PTEs that map to physical pages that are not necessarily sequential in order. As shown, the TLB entry 204 includes a virtual tag field 206 corresponding to the virtual memory page(s) being translated by the PTE, a physical memory base physical page number (PPN) field 208, a read/write/execute memory page permission field 210, and sub-entries 212. As indicated in the figure, each sub-entry 212 includes a valid field (V) 214, a modified field 216, and a physical page offset field 218.
An example of PTEs 220 that may be encoded in the clustered TLB entry format, along with the corresponding encoded TLB 222, is also shown in the figure. As indicated in the figure, all memory pages associated with the encoded TLB 222 share a base physical page number, but then are distinguished by individual offsets encoded in the individual sub-entry fields. For example, the mapping 0x8001000→0xABCA000 is reconstructed by using the virtual page offset of one (0x8001−0x8000=1), to select sub-entry #1. The physical page offset in sub-entry #1, indicated as 2 in the figure, is added to the base physical page (0xABC8+2=0xABCA), to compute the final physical page number (0xABCA000) where the least significant twelve bits (i.e., three hexadecimal digits) are always zero when assuming a 4 KB page size. Where an individual page is not mapped or is not able to be encoded in this format, the valid bit in the corresponding sub-entry is set to zero (represented by a ‘-’ as shown in sub-entries 3, 4, and 6). In the case of a singleton PTE, either an entire clustered entry is used to encode the single PTE, which is not desirable as it underutilizes the full capabilities of the clustered TLB format, or a separate TLB structure that only supports non-encoded (e.g., non-clustered) TLB entries is maintained. Note that due to the additional sub-entry information, the size of a TLB entry supporting this clustered format is typically larger than that for the CoLT-style encoding or for a conventional un-encoded TLB entry.
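The reconstruction of a translation from a clustered entry, as in the 0x8001000→0xABCA000 example above, can be sketched as follows. The sketch assumes eight sub-entries per clustered entry and a 4 KB page size; the names `SubEntry`, `ClusteredEntry`, and `clustered_lookup` are hypothetical, and only the offset of sub-entry #1 (two) and the invalid sub-entries 3, 4, and 6 come from the example. The remaining offsets, the modified bits, and the permissions are placeholder values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

PAGE_SHIFT = 12  # assumption: 4 KB pages


@dataclass
class SubEntry:
    valid: bool       # zero when the page is unmapped or cannot be encoded
    modified: bool
    ppn_offset: int   # physical page offset relative to the base PPN


@dataclass
class ClusteredEntry:
    base_vpn: int     # virtual tag: first virtual page of the cluster
    base_ppn: int     # base physical page number shared by all sub-entries
    perms: str
    subs: List[SubEntry]


def clustered_lookup(entry: ClusteredEntry, vaddr: int) -> Optional[int]:
    """Return the physical address for vaddr, or None if not encoded here."""
    vpn = vaddr >> PAGE_SHIFT
    index = vpn - entry.base_vpn  # virtual page offset selects the sub-entry
    if not 0 <= index < len(entry.subs):
        return None
    sub = entry.subs[index]
    if not sub.valid:
        return None
    ppn = entry.base_ppn + sub.ppn_offset
    return (ppn << PAGE_SHIFT) | (vaddr & ((1 << PAGE_SHIFT) - 1))


# Sub-entry #1 carries offset 2 and sub-entries 3, 4, and 6 are invalid, as in
# the example; the offsets for sub-entries 0, 2, 5, and 7 are placeholders.
placeholder_offsets = [0, 2, 1, None, None, 3, None, 4]
example = ClusteredEntry(
    base_vpn=0x8000,
    base_ppn=0xABC8,
    perms="rw",  # placeholder permissions
    subs=[SubEntry(off is not None, False, off or 0) for off in placeholder_offsets],
)
paddr = clustered_lookup(example, 0x8001000)  # sub-entry #1: 0xABC8 + 2 = 0xABCA
```

A lookup that lands on an invalid sub-entry returns `None`, which corresponds to a miss for that page even though the clustered entry's tag matches.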
These proposals for combining multiple TLB entries into a single coalesced or clustered TLB entry are limited, however, in that they support only a single form of encoding. As a result, only PTEs that satisfy the conditions of the one encoding scheme in use can be encoded, which limits the opportunities to encode multiple PTEs into an encoded TLB entry.