1. Field of the Invention
This disclosure generally relates to techniques for reducing latency in shared-memory multiprocessor computer systems. More specifically, this disclosure relates to techniques for reducing address-translation latency for page-table walks in shared-memory multiprocessor systems.
2. Related Art
Computer memory is typically divided into a set of fixed-length blocks called “pages.” An operating system can provide a virtual memory abstraction to give a program the impression that it is accessing a contiguous address space that is larger than the actual available physical memory of the underlying computer system. During operation, the operating system and hardware of the computing device translate virtual addresses into physical addresses in the physical memory. These translated physical addresses are then used to access the desired data from the memory hierarchy.
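For illustration only, the translation described above can be sketched as splitting a virtual address into a virtual page number and a page offset, and then replacing the page number with a physical frame number. The 4 KiB page size, the flat dictionary standing in for a page table, and the names `PAGE_SIZE`, `OFFSET_BITS`, and `translate` are assumptions made for this sketch; they are not taken from this disclosure.

```python
# Illustrative sketch only. Assumes 4 KiB pages and a flat, single-level
# mapping from virtual page number to physical frame number.
PAGE_SIZE = 4096                           # assumed page size in bytes
OFFSET_BITS = PAGE_SIZE.bit_length() - 1   # 12 offset bits for 4 KiB pages


def translate(vaddr, page_table):
    """Translate a virtual address using a dict {virtual page: physical frame}."""
    vpn = vaddr >> OFFSET_BITS             # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)       # byte offset within the page
    if vpn not in page_table:
        # An unmapped page would normally raise a page fault handled by the OS.
        raise KeyError(f"page fault: virtual page {vpn:#x} is not mapped")
    pfn = page_table[vpn]                  # physical frame number
    return (pfn << OFFSET_BITS) | offset   # the offset is carried over unchanged


# Hypothetical mapping: virtual page 1 resides in physical frame 7.
example_table = {1: 7}
physical = translate(0x1234, example_table)
```

In this sketch, only the page-number bits change during translation; the offset within the page is the same in the virtual and physical addresses.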
The latency caused by such address translation can significantly impact the performance of shared-memory server systems. Accessing a virtual address typically involves using specialized translation hardware to determine the corresponding physical address. This translation hardware often includes a translation lookaside buffer (TLB), which caches page-table translation information to speed up virtual address translations. Modern processors use multiple levels of such TLBs to avoid the latency of page-table lookups. However, growing data-set sizes and an increase in the number of hardware threads that share a TLB are increasing TLB pressure, resulting in higher TLB miss rates. In modern multiprocessor systems, a miss in a multi-level TLB initiates a page-table walk, which typically involves several DRAM accesses that can take hundreds of clock cycles to complete.
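The relationship between TLB hits, TLB misses, and page-table walks can be sketched with a small software model. The model below is a simplification under stated assumptions: a single-level TLB with least-recently-used replacement, a four-level radix page table with 9 index bits per level, and 4 KiB pages. None of these parameters, nor the names `TinyTLB` and `map_page`, come from this disclosure. Each level visited during a walk is counted as one memory access to show why a miss is far more costly than a hit.

```python
# Illustrative model only; all sizes and names are assumptions for this sketch.
from collections import OrderedDict

OFFSET_BITS = 12   # assumed 4 KiB pages
INDEX_BITS = 9     # assumed 512 entries per page-table level
LEVELS = 4         # assumed four-level radix page table
INDEX_MASK = (1 << INDEX_BITS) - 1


def map_page(root, vpn, pfn):
    """Install a mapping vpn -> pfn in a nested-dict page table."""
    node = root
    for level in reversed(range(1, LEVELS)):
        index = (vpn >> (level * INDEX_BITS)) & INDEX_MASK
        node = node.setdefault(index, {})   # create intermediate tables on demand
    node[vpn & INDEX_MASK] = pfn            # leaf entry holds the frame number


class TinyTLB:
    """A small LRU cache of translations backed by a page-table walk."""

    def __init__(self, capacity, root):
        self.capacity = capacity
        self.root = root
        self.entries = OrderedDict()        # vpn -> pfn, oldest entry first
        self.hits = 0
        self.misses = 0
        self.walk_accesses = 0              # memory accesses spent on walks

    def _walk(self, vpn):
        node = self.root
        for level in reversed(range(LEVELS)):
            index = (vpn >> (level * INDEX_BITS)) & INDEX_MASK
            self.walk_accesses += 1         # one memory access per level
            node = node[index]              # KeyError here models a page fault
        return node

    def translate(self, vaddr):
        vpn = vaddr >> OFFSET_BITS
        offset = vaddr & ((1 << OFFSET_BITS) - 1)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)   # mark as most recently used
            pfn = self.entries[vpn]
        else:
            self.misses += 1
            pfn = self._walk(vpn)           # slow path: walk the page table
            self.entries[vpn] = pfn
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # evict least recently used
        return (pfn << OFFSET_BITS) | offset


root = {}
map_page(root, 5, 42)                       # hypothetical mapping for the demo
tlb = TinyTLB(capacity=2, root=root)
first = tlb.translate(0x5010)               # miss: walks all four levels
second = tlb.translate(0x5020)              # hit: same page, no walk needed
```

In this model, the first access to a page costs four page-table accesses, while a later access to the same page is served from the TLB with none. As more pages compete for a fixed number of TLB entries, evictions increase and more accesses take the slow path.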
Hence, what is needed are system structures and techniques for managing virtual address translations without the above-described problems of existing techniques.