The disclosure herein relates generally to data processing, and more particularly, to methods, apparatus, and products for optimizing lookups in a translation lookaside buffer (TLB) in a computer system.
Memory management, i.e., the operations that occur in managing the data stored in a computer, is often a key factor in overall system performance for a computer. Among other tasks, memory management oversees the retrieval and storage of data on a computer, and manages certain security tasks for the computer by imposing restrictions on what users and computer programs are permitted to access.
System configurations include physical memory used to store applications and data. The amount of physical memory is fixed and often inadequate to support the needs of users. Therefore, to provide additional memory or at least the appearance of additional memory, a memory management technique, referred to as virtual memory, is utilized. Virtual memory uses virtual addressing, which provides ranges of addresses that can appear to be much larger than the physical size of main memory.
Virtual addressing is a memory mapping mechanism that is used by operating systems for purposes such as security based on process isolation. Using virtual addressing, processors can access memory using physical addresses that are generated from Virtual Address (VA) to Physical Address (PA) translation. To accelerate the VA to PA translation process, processors can use Translation Lookaside Buffers (TLBs). A TLB is essentially a cache of page table entries mapping virtual addresses to physical addresses. With each memory access, the TLB is presented with a virtual address. If the address hits in the TLB, virtual address translation adds little or no overhead to the memory access. If the address misses in the TLB, a more costly hardware handler or software handler is invoked to load and insert the required page table entry into the TLB so the address will hit in the TLB and the memory access can proceed.
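By way of illustration only, the hit and miss behavior described above can be sketched in software as follows. The sketch is not a description of any particular hardware. The 4 KiB page size, the first-in-first-out replacement policy, and the names `SimpleTLB`, `translate`, and `page_table` are assumptions introduced for this example.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class SimpleTLB:
    """Toy TLB: a small cache of virtual page number -> physical frame number."""

    def __init__(self, page_table, capacity=4):
        self.page_table = page_table  # dict standing in for the in-memory page table
        self.capacity = capacity      # assumed number of TLB entries
        self.entries = {}             # cached translations, in insertion order
        self.hits = 0
        self.misses = 0

    def translate(self, va):
        vpn = va >> PAGE_SHIFT        # virtual page number
        offset = va & PAGE_MASK       # byte offset within the page
        if vpn in self.entries:
            self.hits += 1            # hit: no page table access needed
        else:
            self.misses += 1          # miss: the "handler" loads the entry
            pfn = self.page_table[vpn]  # a KeyError here models a page fault
            if len(self.entries) >= self.capacity:
                oldest = next(iter(self.entries))  # FIFO eviction (assumed policy)
                del self.entries[oldest]
            self.entries[vpn] = pfn
        return (self.entries[vpn] << PAGE_SHIFT) | offset


tlb = SimpleTLB({0x10: 0x2A})
first = tlb.translate(0x10ABC)   # misses, then loads the entry
second = tlb.translate(0x10DEF)  # same page, so this access hits
```

In this sketch the first access to a page pays for the page table lookup, and later accesses to the same page are served from the cached entry.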
Embedded processors with software loaded TLBs can have poor performance on some workloads. This poor performance is caused by the overhead of resolving in software the virtual address translations that are not cached in the TLB. This is generally why higher end processors provide a hardware mechanism to load translations in the TLB automatically. Such hardware mechanisms, however, tend to be complex and expensive. There are several conventional approaches to hardware loading of virtual address translations. These conventional approaches include: tree structured page tables; hashed page tables; virtual linear page tables; page table pointer caches; and TLBs with both page table pointers and page table entries. Each of these approaches is discussed briefly below.
The tree structured page tables (e.g., Radix address translation) approach uses a tree structure in memory. The root of the tree is identified by a physical address in memory, and bits from the virtual address are used as an index at each level of the tree until a page table entry is found. While the final page table entry (PTE) found in the tree structure is cached in a TLB, the intermediate entries at each level are cached in a page walk cache (PWC).
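The tree walk described above may be sketched, purely for illustration, as follows. The four levels, the nine index bits per level, the use of nested dictionaries in place of in-memory tables, and the names `split_indices`, `radix_walk`, and `make_va` are assumptions of this example rather than features of any particular Radix implementation.

```python
PAGE_SHIFT = 12   # assumed 4 KiB pages
LEVEL_BITS = 9    # assumed index bits consumed at each level
LEVELS = 4        # assumed depth of the tree


def make_va(indices, offset=0):
    """Build a virtual address from per-level indices (top level first)."""
    vpn = 0
    for i in indices:
        vpn = (vpn << LEVEL_BITS) | i
    return (vpn << PAGE_SHIFT) | offset


def split_indices(va):
    """Return the index used at each level of the tree, top level first."""
    vpn = va >> PAGE_SHIFT
    mask = (1 << LEVEL_BITS) - 1
    return [(vpn >> (LEVEL_BITS * (LEVELS - 1 - lvl))) & mask
            for lvl in range(LEVELS)]


def radix_walk(root, va, pwc):
    """Walk the tree and return (pte, memory_reads).

    pwc maps a prefix of indices to the intermediate table it leads to,
    so a later walk can skip the levels covered by that prefix.
    """
    idx = split_indices(va)
    node, start = root, 0
    for depth in range(LEVELS - 1, 0, -1):   # longest cached prefix first
        key = tuple(idx[:depth])
        if key in pwc:
            node, start = pwc[key], depth
            break
    reads = 0
    for lvl in range(start, LEVELS):
        node = node[idx[lvl]]                # one table read per level
        reads += 1
        if lvl < LEVELS - 1:
            pwc[tuple(idx[:lvl + 1])] = node  # cache the intermediate entry
    return node, reads


root = {1: {2: {3: {4: 0x55, 5: 0x66}}}}
pwc = {}
cold = radix_walk(root, make_va([1, 2, 3, 4]), pwc)  # full walk from the root
warm = radix_walk(root, make_va([1, 2, 3, 5]), pwc)  # upper levels found in pwc
```

In this sketch the second walk shares its upper three indices with the first, so only the final level is read from the tree.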
Another conventional approach to hardware loading of virtual address translations into TLBs utilizes hashed page tables (HPT). For instance, in HPT translation in PowerPC systems offered by International Business Machines Corporation, an effective address is translated to a corresponding real address by way of page table entries found by selecting an effective segment identifier (ESID) table entry associated with the effective address, and using the entry to locate a group of page table entries by way of a hashing algorithm.
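A simplified sketch of the hashed lookup described above is given below for illustration. The segment and page sizes, the number and size of page table entry groups, the XOR-based hash, and the names `segment_table`, `hpt_hash`, `hpt_insert`, and `hpt_translate` are assumptions of this example. In particular, the hash shown is a stand-in and is not asserted to match the hashing algorithm of any actual PowerPC system.

```python
SEGMENT_SHIFT = 28   # assumed 256 MiB segments
PAGE_SHIFT = 12      # assumed 4 KiB pages
NUM_GROUPS = 8       # assumed number of page table entry groups
GROUP_SIZE = 8       # assumed entries per group


def hpt_hash(seg_id, page_index):
    """Illustrative hash selecting a group; not a real hardware hash."""
    return (seg_id ^ page_index) % NUM_GROUPS


def hpt_insert(table, seg_id, page_index, real_page):
    """Place a page table entry into the group chosen by the hash."""
    group = table[hpt_hash(seg_id, page_index)]
    if len(group) >= GROUP_SIZE:
        raise RuntimeError("page table entry group is full")
    group.append((seg_id, page_index, real_page))


def hpt_translate(segment_table, table, ea):
    """Translate an effective address, or return None if no entry matches."""
    esid = ea >> SEGMENT_SHIFT
    page_index = (ea >> PAGE_SHIFT) & ((1 << (SEGMENT_SHIFT - PAGE_SHIFT)) - 1)
    offset = ea & ((1 << PAGE_SHIFT) - 1)
    seg_id = segment_table[esid]     # entry selected by the ESID
    for e_seg, e_page, real_page in table[hpt_hash(seg_id, page_index)]:
        if e_seg == seg_id and e_page == page_index:
            return (real_page << PAGE_SHIFT) | offset
    return None                      # no matching entry in the group


segment_table = {0x3: 0x77}          # hypothetical ESID -> segment entry
table = [[] for _ in range(NUM_GROUPS)]
hpt_insert(table, 0x77, 0x5, 0x1234)
mapped = hpt_translate(segment_table, table, (0x3 << 28) | (0x5 << 12) | 0x0F0)
unmapped = hpt_translate(segment_table, table, (0x3 << 28) | (0x6 << 12))
```

Unlike the tree walk, this lookup reads a single group of candidate entries and compares each one against the address being translated.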
Tree structured page tables and HPT require different hardware structures (e.g., HPT requires a segment lookaside buffer (SLB) and Radix requires a PWC). Furthermore, the TLB structures of the HPT and Radix translations are also different. However, simultaneous multithreading (SMT) often includes some instruction threads running HPT address translation and other threads running Radix address translation. Hence, both translation algorithms must be supported concurrently in current processing systems.
Certain existing systems solve this problem by dividing the TLB indices into two sets, and assigning one set to HPT translation and the other set to Radix translation. While this allows the system to use a single TLB for supporting both translation schemes, division of the TLB in such a manner means that the HPT threads and the Radix threads can only use a part of the TLB at a time, leading to a decrease in TLB efficiency.
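The effect of dividing the TLB indices in this manner can be illustrated with the following sketch. The eight-set TLB, the even split between the two schemes, the modulo-based index function, and the names `set_index` and `sets_reachable` are assumptions made for this example only.

```python
NUM_SETS = 8              # assumed number of TLB sets (indices)
HALF = NUM_SETS // 2      # assumed even split between the two schemes


def set_index(vpn, mode):
    """Map a virtual page number to a TLB set within the mode's partition."""
    if mode == "hpt":
        base = 0          # HPT translations use sets 0 .. HALF-1
    elif mode == "radix":
        base = HALF       # Radix translations use sets HALF .. NUM_SETS-1
    else:
        raise ValueError("unknown translation mode: " + repr(mode))
    return base + (vpn % HALF)


def sets_reachable(mode, vpns):
    """Return the set of TLB indices a given mode can ever occupy."""
    return {set_index(v, mode) for v in vpns}


hpt_sets = sets_reachable("hpt", range(100))
radix_sets = sets_reachable("radix", range(100))
```

In this sketch, a workload in which every thread uses one translation scheme can never occupy more than half of the sets, regardless of how many distinct pages it touches.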