Cloud services, such as Amazon® EC2 or Rackspace® OpenStack, use virtualization platforms to provide their services. These platforms use hypervisors (ESX, kernel-based virtual machine (KVM), Xen) that abstract the underlying host resources, which makes applications easier to scale and raises system utilization. Unfortunately, these systems place greater pressure on memory virtualization because the hypervisor and each guest keep separate page tables. A page table is a data structure that stores the mapping between virtual addresses and physical addresses. In a virtualized system, the guest page tables map guest virtual addresses to guest physical addresses, and the hypervisor's page tables map guest physical addresses to host physical addresses. Address translation therefore becomes more expensive, because hypervisor-based systems require two-dimensional page table walks (traversals of the page tables to find the mapping for a given virtual address), in which every guest page table access must itself be translated through the host page tables. In a system with four-level radix-tree page tables, as in recent x86 processors, a single translation lookaside buffer (TLB) miss can result in up to 24 memory accesses, causing large translation overheads. A TLB caches recent translations of virtual addresses to physical addresses. With growing numbers of processor cores and larger data sets, conventional static random-access memory (SRAM) TLBs, whose capacities are constrained by latency requirements, cannot hold the translations for all “hot pages” (frequently accessed pages). Higher hit rates are desirable, but larger SRAM TLBs incur higher access latencies.
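The figure of 24 memory accesses follows from how a two-dimensional walk nests the host walk inside the guest walk. The following is a minimal sketch, assuming four-level guest and host page tables and no caching of intermediate translations (no page-walk caches or nested TLBs); the function name `nested_walk_accesses` is hypothetical. Each of the n guest page-table pages is located by a guest-physical address that needs a full m-step host walk before its entry can be read. The final guest-physical address of the data page needs one more host walk. The total is n(m + 1) + m = (n + 1)(m + 1) − 1.

```python
def nested_walk_accesses(guest_levels: int, host_levels: int) -> int:
    """Count memory references for one two-dimensional page walk after a TLB miss.

    Assumption: no page-walk caches or nested TLBs, so every guest-physical
    address touched during the walk is translated by a full host walk.
    """
    accesses = 0
    for _ in range(guest_levels):
        accesses += host_levels  # host walk to find the guest page-table page
        accesses += 1            # read the guest page-table entry itself
    # The guest-physical address of the data page also needs a host walk.
    accesses += host_levels
    return accesses


print(nested_walk_accesses(4, 4))  # 24: four-level guest and host tables
print(nested_walk_accesses(4, 0))  # 4: a native (non-virtualized) walk
```

In a native system only the n guest reads remain. Under virtualization the cost grows roughly with the product of the two depths, so each TLB miss becomes far more expensive.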
Unfortunately, there is currently no means for designing a processor architecture structure that improves performance in a virtualized environment by eliminating a large number of these expensive page table walks.