Modern computer systems use memory management units (MMUs) to manage writing data to and reading data from one or more physical memory devices, such as solid state memory devices. The MMU of a computer system provides a virtual memory to the central processing unit (CPU) of the computer system that allows the CPU to run each application program in its own dedicated, contiguous virtual memory address space rather than having all of the application programs share the physical memory address space, which is often fragmented, or non-contiguous. The purpose of the MMU is to translate virtual memory addresses (VAs) into physical memory addresses (PAs) for the CPU. The CPU indirectly reads and writes PAs by issuing reads and writes of VAs to the MMU, which translates the VAs into PAs and then reads or writes the PAs.
In order to perform the translations, the MMU accesses page tables stored in the system main memory. The page tables are made up of page table entries. The page table entries are information that is used by the MMU to map the VAs into PAs. The MMU typically includes a translation lookaside buffer (TLB), which is a cache memory element used to cache recently used mappings. When the MMU needs to translate a VA into a PA, the MMU first checks the TLB to determine whether there is a match for the VA. If so, the MMU uses the mapping found in the TLB to compute the PA and then accesses the PA (i.e., reads or writes the PA). This is known as a TLB “hit.” If the MMU does not find a match in the TLB, this is known as a TLB “miss.”
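The TLB check described above can be illustrated with a minimal sketch. The 4 KiB page size, the dictionary-backed cache and the names TLB, lookup and fill are assumptions made only for illustration; they are not taken from any particular MMU design.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class TLB:
    """Toy TLB: caches virtual-page-number to physical-page-number mappings."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, va):
        """Return the PA on a hit, or None on a miss."""
        ppn = self.entries.get(va >> PAGE_SHIFT)
        if ppn is None:
            self.misses += 1
            return None
        self.hits += 1
        # The page offset is carried over unchanged from the VA.
        return (ppn << PAGE_SHIFT) | (va & PAGE_MASK)

    def fill(self, va, pa):
        """Cache the page-level mapping once it has been obtained."""
        self.entries[va >> PAGE_SHIFT] = pa >> PAGE_SHIFT


tlb = TLB()
first = tlb.lookup(0x1234)    # nothing cached yet: a miss
tlb.fill(0x1234, 0x5234)      # mapping obtained elsewhere, then cached
second = tlb.lookup(0x1678)   # same virtual page: a hit
```

In this sketch a hit costs a single dictionary probe, whereas a miss returns nothing and leaves the caller to obtain the mapping from the page tables.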
In the event of a TLB miss, the MMU performs what is known as a hardware table walk (HWTW). A HWTW is a time-consuming and computationally-expensive process that involves performing a “table walk” to find the corresponding page table in the main memory and then reading multiple locations in the page table to find the corresponding VA-to-PA address mapping. The MMU then uses the mapping to compute the corresponding PA and writes the mapping back to the TLB.
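A single-stage table walk of this kind can be sketched as follows. The three-level layout, the 9-bit index width, the 4 KiB page size and the nested-dictionary tables are all assumptions chosen for illustration; the names split and walk are hypothetical.

```python
INDEX_BITS = 9      # assumed index width per level
PAGE_SHIFT = 12     # assumed 4 KiB pages
INDEX_MASK = (1 << INDEX_BITS) - 1


def split(va):
    """Split a VA into three table indices and a page offset."""
    pgd = (va >> (PAGE_SHIFT + 2 * INDEX_BITS)) & INDEX_MASK
    pmd = (va >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    pte = (va >> PAGE_SHIFT) & INDEX_MASK
    offset = va & ((1 << PAGE_SHIFT) - 1)
    return pgd, pmd, pte, offset


def walk(root, va):
    """Walk the three levels; return (PA, number of table reads)."""
    pgd_i, pmd_i, pte_i, offset = split(va)
    reads = 0
    pmd_table = root[pgd_i]          # read 1: top-level directory
    reads += 1
    pte_table = pmd_table[pmd_i]     # read 2: middle directory
    reads += 1
    frame = pte_table[pte_i]         # read 3: page table entry
    reads += 1
    return (frame << PAGE_SHIFT) | offset, reads


# One mapped page: indices (1, 2, 3) map to physical frame 0x80.
root = {1: {2: {3: 0x80}}}
va = (1 << 30) | (2 << 21) | (3 << 12) | 0xABC
pa, reads = walk(root, va)
```

Each level costs one memory read in this model, which is why a miss is so much more expensive than a TLB hit.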
In computer systems that implement operating system (OS) virtualization, a virtual machine monitor (VMM), also commonly referred to as a hypervisor, is interposed between the hardware of the computer system and the system OS of the computer system. The hypervisor executes in privileged mode and is capable of hosting one or more guest high-level OSs. In such systems, application programs running on the OSs use VAs of a first layer of virtual memory to address memory, and the OSs running on the hypervisor use intermediate physical addresses (IPAs) of a second layer of virtual memory to address memory. In the MMU, stage 1 (S1) translations are performed to translate each VA into an IPA and stage 2 (S2) translations are performed to translate each IPA into a PA.
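The two translation stages can be sketched as two chained page-level mappings. The flat dictionaries, the 4 KiB page size and the names translate and va_to_pa are illustrative assumptions; a real MMU stores both stages as multi-level page tables.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def translate(mapping, addr):
    """Apply one page-level mapping, keeping the page offset unchanged."""
    return (mapping[addr >> PAGE_SHIFT] << PAGE_SHIFT) | (addr & PAGE_MASK)


# Stage 1 is controlled by the guest OS: VA page -> IPA page.
s1 = {0x10: 0x20}
# Stage 2 is controlled by the hypervisor: IPA page -> PA page.
s2 = {0x20: 0x30}


def va_to_pa(va):
    ipa = translate(s1, va)      # S1: VA -> IPA
    return translate(s2, ipa)    # S2: IPA -> PA


pa = va_to_pa(0x10123)
```

The guest OS only ever sees the output of the first mapping, so the memory it believes it is allocating is intermediate physical memory rather than actual physical memory.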
If a TLB miss occurs when performing such translations, a multi-level, two-dimensional (2-D) HWTW is performed to obtain the table entries that are needed to compute the corresponding IPA and PA. Performing these multi-level, 2-D HWTWs can result in a significant amount of computational overhead for the MMU, which typically results in performance penalties.
FIG. 1 is a pictorial illustration of a known three-level, 2-D HWTW that is performed when a TLB miss occurs while performing a read transaction. The HWTW shown in FIG. 1 represents a worst case scenario for a three-level, 2-D HWTW that requires the performance of fifteen table lookups to obtain the PA where the data is stored in physical memory. For this example, the computer system is running a hypervisor that is hosting at least one guest high-level OS (HLOS), which, in turn, is running at least one application program. In such a configuration, the memory that is being allocated by the guest HLOS is not the actual physical memory of the system, but instead is the aforementioned intermediate physical memory. The hypervisor allocates actual physical memory. Therefore, each VA is translated into an IPA, which is then translated into a PA of the actual physical memory where the data being read is actually stored.
The process begins with the MMU receiving a S1 page global directory (PGD) IPA 2. For this worst case scenario example, it will be assumed that a TLB miss occurs when the MMU checks the TLB for a match. Because of the miss, the MMU must perform a HWTW. The HWTW involves performing three S2 table lookups 3, 4 and 5 to obtain the mapping needed to convert the IPA 2 into a PA and one additional lookup 6 to read the PA. The table lookups 3, 4 and 5 involve reading the S2 PGD, page middle directory (PMD) and page table entry (PTE), respectively. Reading the PA at lookup 6 provides the MMU with a S1 PMD IPA 7. For this worst case scenario example, it will be assumed that a TLB miss occurs when the MMU checks the TLB for a match with the S1 PMD IPA 7. Because of the miss, the MMU must perform another HWTW. The HWTW involves performing three S2 table lookups 8, 9 and 11 to obtain the mapping needed to convert the S1 PMD IPA 7 into a PA and one additional lookup 12 to read the PA. The table lookups 8, 9 and 11 involve reading the S2 PGD, PMD and PTE, respectively. Reading the PA at lookup 12 provides the MMU with a S1 PTE IPA 13.
For this worst case scenario example, it will be assumed that a TLB miss occurs when the MMU checks the TLB for a match with the S1 PTE IPA 13. Because of the miss, the MMU must perform another HWTW. The HWTW involves performing three S2 table lookups 14, 15 and 16 to obtain the mapping needed to convert the S1 PTE IPA 13 into a PA and one additional lookup 17 to read the PA. The table lookups 14, 15 and 16 involve reading the S2 PGD, PMD and PTE, respectively. Reading the PA at lookup 17 provides the MMU with the actual IPA 18. For this worst case scenario example, it will be assumed that a TLB miss occurs when the MMU checks the TLB for a match with the actual IPA 18. Because of the miss, the MMU must perform another HWTW. The HWTW involves performing three S2 table lookups 19, 21 and 22 to obtain the mapping needed to convert the actual IPA 18 into a PA. The table lookups 19, 21 and 22 involve reading the S2 PGD, PMD and PTE, respectively. The PA is then read to obtain the corresponding read data.
Thus, it can be seen that in the worst case scenario for a three-level, 2-D HWTW, twelve S2 table lookups and three S1 table lookups are performed, which is a large amount of computational overhead that consumes a large amount of time and results in performance penalties. A variety of techniques and architectures have been used to reduce the amount of time and processing overhead that is involved in performing HWTWs, including, for example, increasing the size of the TLB, using multiple TLBs, using flat nested page tables, using shadow paging or speculative shadow paging, and using page walk caches. While all of these techniques and architectures are capable of reducing processing overhead associated with performing HWTWs, they often result in an increase in processing overhead somewhere else in the computer system.
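The worst-case lookup counts stated above can be checked with a short sketch. The generalization to arbitrary numbers of S1 and S2 levels is an assumption inferred from the walk of FIG. 1, in which every S1 table address and the final IPA each require a full S2 walk; the function name worst_case_lookups is hypothetical.

```python
def worst_case_lookups(s1_levels, s2_levels):
    """Count table lookups for a 2-D walk in which every TLB check misses.

    Each of the s1_levels S1 table addresses, plus the final IPA,
    needs a full S2 walk of s2_levels lookups. Each S1 table is then
    read once. The final read of the data itself is not counted.
    """
    s2_lookups = (s1_levels + 1) * s2_levels
    s1_lookups = s1_levels
    return s1_lookups, s2_lookups, s1_lookups + s2_lookups


s1_count, s2_count, total = worst_case_lookups(3, 3)
```

For the three-level case this reproduces the three S1 lookups, twelve S2 lookups and fifteen lookups in total described for FIG. 1, and it shows that the cost grows with the product of the two table depths.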
Accordingly, a need exists for computer systems and methods that reduce the amount of time and computing resources that are required to perform a HWTW.