1. Field of the Invention
This disclosure generally relates to techniques for reducing latency in shared-memory multiprocessor computer systems. More specifically, this disclosure relates to techniques for reducing address-translation latency and parallelizing translation and coherency operations.
2. Related Art
Computer memory is typically divided into a set of fixed-length blocks called “pages.” An operating system can provide a virtual memory abstraction to give a program the impression that it is accessing a contiguous address space that is larger than the actual available physical memory of the underlying computer system. During operation, the operating system and hardware of the computing device translate virtual addresses into physical addresses in the physical memory. These translated physical addresses are then used to access the desired data from the memory hierarchy.
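As an illustration only, the page-based translation described above can be sketched as follows. The 4 KiB page size, the flat dictionary standing in for a page table, and the names `translate`, `PAGE_SIZE`, and `OFFSET_BITS` are assumptions made for this sketch and are not taken from the disclosure.

```python
PAGE_SIZE = 4096                          # assumed page size in bytes (4 KiB)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 offset bits for a 4 KiB page


def translate(vaddr, page_table):
    """Map a virtual address to a physical address via a flat page table.

    page_table is a hypothetical dict from virtual page number to
    physical frame number; real page tables are multi-level structures.
    """
    vpn = vaddr >> OFFSET_BITS            # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)      # byte offset within the page
    if vpn not in page_table:
        raise KeyError(f"page fault: no mapping for virtual page {vpn:#x}")
    pfn = page_table[vpn]                 # physical frame number
    return (pfn << OFFSET_BITS) | offset  # offset is unchanged by translation


# Hypothetical mapping: virtual page 0x2 maps to physical frame 0x7.
example_table = {0x2: 0x7}
example_paddr = translate(0x2ABC, example_table)
```

In this sketch only the page number is remapped; the offset within the page carries over unchanged, which is why translation information can be cached per page rather than per address.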
The latency caused by such address translation can significantly impact the performance of shared-memory server systems. Accessing a virtual address typically involves using specialized translation hardware to determine a corresponding physical memory address. This translation hardware often includes a translation lookaside buffer (TLB) which caches page-table translation information to improve the speed of virtual address translations. Modern processors use multiple levels of such TLBs to avoid the latency of page-table lookups. However, growing workload data-set sizes and a growing number of hardware threads that share a TLB increase TLB pressure, thereby resulting in higher TLB miss rates. In modern multiprocessor systems, a miss in a multi-level TLB initiates a page-table walk, which typically involves several DRAM accesses that can take hundreds of clock cycles to complete. In the worst case, a memory instruction first misses in the TLB (e.g., during translation) and then also misses in the cache hierarchy, resulting in an even larger delay.
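The cost asymmetry between a TLB hit and a page-table walk can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of any particular hardware: the single-level TLB, the LRU replacement policy, the four-level walk, and all cycle counts below are hypothetical values chosen only to show the effect of TLB pressure.

```python
from collections import OrderedDict

TLB_HIT_CYCLES = 1        # assumed latency of a TLB hit
WALK_LEVELS = 4           # assumed number of page-table levels per walk
DRAM_ACCESS_CYCLES = 100  # assumed latency of one DRAM access


class TLB:
    """Hypothetical single-level TLB with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> pfn, least recently used first
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn, page_table):
        """Return (pfn, cycles) for a virtual page number."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)     # mark as most recently used
            self.hits += 1
            return self.entries[vpn], TLB_HIT_CYCLES
        # Miss: the dict lookup stands in for a multi-level page-table walk,
        # charged as one DRAM access per level.
        self.misses += 1
        pfn = page_table[vpn]
        cycles = TLB_HIT_CYCLES + WALK_LEVELS * DRAM_ACCESS_CYCLES
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[vpn] = pfn
        return pfn, cycles


def total_cycles(tlb, page_table, accesses):
    """Sum the modeled translation latency over a sequence of page accesses."""
    return sum(tlb.lookup(vpn, page_table)[1] for vpn in accesses)
```

Under these assumed numbers, cycling through more pages than the TLB can hold causes every access to miss, so translation latency is dominated by page-table walks rather than by TLB hits.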
Hence, what is needed are system structures and techniques for managing virtual address translations and physical address accesses without the above-described problems of existing techniques.