1. Field of Art
The disclosure generally relates to the field of processor design and memory.
2. Description of the Related Art
Memory management unit (MMU)-based systems provide for virtual memory by translating virtual addresses to physical addresses. This translation is a lookup process where the MMU receives a search key (the virtual address) and outputs the corresponding physical address.
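As a minimal sketch of this lookup, the virtual address can be split into a virtual page number (the search key) and a byte offset within the page. The translation table then maps that page number to a physical frame. The sketch assumes 8 KB pages, matching the ARC700 page size cited later. The `mapping` dictionary and the `split`/`translate` helpers are hypothetical illustrations, not part of any real MMU interface.

```python
from __future__ import annotations

PAGE_SHIFT = 13                    # assumed 8 KB pages (2**13 bytes)
PAGE_MASK = (1 << PAGE_SHIFT) - 1  # low 13 bits hold the byte offset within a page


def split(vaddr: int) -> tuple[int, int]:
    """Split a virtual address into (virtual page number, page offset)."""
    return vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK


def translate(vaddr: int, mapping: dict[int, int]) -> int:
    """Use the virtual page number as the search key and rebuild the physical address.

    `mapping` is a hypothetical VPN -> PFN table; a missing key raises KeyError,
    standing in for a translation fault.
    """
    vpn, offset = split(vaddr)
    pfn = mapping[vpn]                     # the lookup step
    return (pfn << PAGE_SHIFT) | offset    # the offset is carried over unchanged


if __name__ == "__main__":
    mapping = {0x2: 0x40}                  # virtual page 2 -> physical frame 0x40
    print(hex(translate(0x4123, mapping)))  # 0x80123
```

Only the page number is translated. The offset passes through untouched, which is why a single entry covers an entire page.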
In certain architectures, the MMU primarily uses page tables in random access memory (RAM) for lookup. An operating system (OS) is responsible for managing these page tables. Because accesses to RAM are several orders of magnitude slower than CPU processing, the MMU caches recently used translation entries in a fast on-chip associative memory called a translation lookaside buffer (TLB). The MMU consults the TLB first and falls back to traversing the page tables only when the TLB has no matching entry.
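The hardware-managed arrangement can be sketched as a small cache in front of the page tables. This is a toy model under stated assumptions. The page tables are a plain dictionary, the TLB capacity is arbitrary, and least-recently-used eviction is chosen for illustration only. It does not describe any particular CPU's replacement policy.

```python
from __future__ import annotations

from collections import OrderedDict


class HardwareManagedMMU:
    """Toy MMU whose TLB caches entries taken from OS-managed page tables.

    Assumptions: page tables are a VPN -> PFN dict, and the TLB uses LRU
    eviction with an illustrative capacity.
    """

    def __init__(self, page_table: dict[int, int], capacity: int = 4):
        self.page_table = page_table                      # maintained by the OS in RAM
        self.tlb: OrderedDict[int, int] = OrderedDict()   # fast on-chip cache
        self.capacity = capacity
        self.walks = 0                                    # count of slow page-table traversals

    def lookup(self, vpn: int) -> int:
        if vpn in self.tlb:                   # first-level lookup: TLB hit
            self.tlb.move_to_end(vpn)         # mark as most recently used
            return self.tlb[vpn]
        self.walks += 1                       # miss: fall back to the page tables
        pfn = self.page_table[vpn]            # KeyError models a page fault
        if len(self.tlb) >= self.capacity:
            self.tlb.popitem(last=False)      # evict the least recently used entry
        self.tlb[vpn] = pfn                   # cache the translation for next time
        return pfn


if __name__ == "__main__":
    mmu = HardwareManagedMMU({1: 0x10, 2: 0x20})
    mmu.lookup(1)
    mmu.lookup(1)
    print(mmu.walks)  # 1: only the first access walked the page tables
```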
In RISC-type architectures, on the other hand, the TLB is not a cache but a database of translation entries. The OS is responsible for all TLB entry management. The MMU is agnostic to the page tables, although the OS still has to maintain them as the source of translation entries, along with possibly other related data structures. In this way, the OS oversees the memory management of all running processes.
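A minimal sketch of the software-managed case follows. The MMU consults only its TLB. On a miss it raises a hypothetical `TLBMiss` exception, and a hypothetical `ToyOS` handler refills the TLB from page tables that only the OS knows about. Choosing a victim slot when the TLB is full is omitted for brevity.

```python
from __future__ import annotations


class TLBMiss(Exception):
    """Hypothetical exception modeling the MMU's TLB-miss fault."""

    def __init__(self, vpn: int):
        super().__init__(f"TLB miss for VPN {vpn:#x}")
        self.vpn = vpn


class SoftwareManagedMMU:
    """The MMU consults only its TLB and has no knowledge of page tables."""

    def __init__(self) -> None:
        self.tlb: dict[int, int] = {}

    def lookup(self, vpn: int) -> int:
        try:
            return self.tlb[vpn]
        except KeyError:
            raise TLBMiss(vpn) from None


class ToyOS:
    """Hypothetical OS that owns the page tables and all TLB entry management."""

    def __init__(self, mmu: SoftwareManagedMMU, page_table: dict[int, int]):
        self.mmu = mmu
        self.page_table = page_table
        self.faults_handled = 0

    def access(self, vpn: int) -> int:
        try:
            return self.mmu.lookup(vpn)
        except TLBMiss as miss:
            self.faults_handled += 1
            # TLB fault handler: find the entry in the OS's own tables, write it into the TLB
            self.mmu.tlb[miss.vpn] = self.page_table[miss.vpn]
            return self.mmu.lookup(vpn)  # retry the faulting access


if __name__ == "__main__":
    kernel = ToyOS(SoftwareManagedMMU(), {5: 0x50})
    kernel.access(5)
    kernel.access(5)
    print(kernel.faults_handled)  # 1: the second access hits the refilled entry
```

Every miss costs a trip through OS code. This is why TLB contention in such designs translates directly into CPU cycles spent in fault handlers.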
A modern OS runs several processes concurrently. Each process has its own virtual address space to map its code, data, .bss segment, stack, heap and some other regions. Because every process has its own virtual address space, it is possible for multiple processes to reference the same virtual address even though they intend to refer to different pieces of code, and ultimately to different physical addresses. As a result, the TLB can end up holding entries with overlapping virtual addresses. To remedy this, either the OS has to flush the entire TLB on every task switch, or the MMU needs to provide a mechanism that lets overlapping TLB entries coexist. In the latter case, the MMU must also ensure that a lookup searches only the relevant entries.
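The overlap problem can be reproduced with a TLB keyed by virtual page number alone. This is a toy sketch. The `UntaggedTLB` class, the `run` helper, and the page-number values are hypothetical. Process B resolves the same virtual page as process A. Without a flush, B picks up A's stale entry.

```python
from __future__ import annotations


class UntaggedTLB:
    """TLB keyed by virtual page number alone (no address-space tag)."""

    def __init__(self) -> None:
        self.entries: dict[int, int] = {}

    def flush(self) -> None:
        self.entries.clear()


def run(tlb: UntaggedTLB, page_tables: dict[str, dict[int, int]],
        schedule: list[tuple[str, int]], flush_on_switch: bool) -> list[int]:
    """Replay (process, vpn) accesses and return the PFN each access resolved to."""
    results = []
    current = None
    for proc, vpn in schedule:
        if proc != current:
            if flush_on_switch:
                tlb.flush()              # discard the previous process's entries
            current = proc
        if vpn not in tlb.entries:       # miss: refill from the running process's table
            tlb.entries[vpn] = page_tables[proc][vpn]
        results.append(tlb.entries[vpn])
    return results


if __name__ == "__main__":
    tables = {"A": {0x10: 0x100}, "B": {0x10: 0x200}}
    sched = [("A", 0x10), ("B", 0x10)]
    print([hex(p) for p in run(UntaggedTLB(), tables, sched, flush_on_switch=False)])
    print([hex(p) for p in run(UntaggedTLB(), tables, sched, flush_on_switch=True)])
```

The first run resolves B's access to A's frame (0x100). The second is correct (0x200) but discards every cached entry on each switch.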
Some MMUs provide this disambiguation in the form of an address space ID (ASID), an additional identifier used to “tag” the TLB entries. The OS manages the ASIDs and assigns each process a unique value. TLB lookup then uses a compound key comprising a virtual address and the ASID, which allows entries for different processes with identical virtual addresses to coexist. Every time the running process changes, an MMU process ID (PID) register is set up with the new process's ASID. All TLB operations (both in hardware and in software) happen in the context of this ASID. In this way, multiple processes can be run in turn without modifying their virtual address spaces or flushing the TLB on every task switch.
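The compound key can be sketched directly. In this toy model, `pid_reg` stands in for the MMU PID register. A task switch only reloads it, and lookups match only entries tagged with the current ASID. The class name and the ASID values are hypothetical and do not reflect any real register layout.

```python
from __future__ import annotations


class ASIDTaggedTLB:
    """TLB whose entries are tagged with an ASID; lookups use (ASID, VPN) as the key."""

    def __init__(self) -> None:
        self.entries: dict[tuple[int, int], int] = {}
        self.pid_reg = 0  # models the MMU PID register holding the current ASID

    def switch_to(self, asid: int) -> None:
        # On a task switch the OS only reloads the PID register; no flush is needed.
        self.pid_reg = asid

    def insert(self, vpn: int, pfn: int) -> None:
        self.entries[(self.pid_reg, vpn)] = pfn  # entry is tagged with the current ASID

    def lookup(self, vpn: int) -> int | None:
        # Only entries whose tag matches the current ASID can hit.
        return self.entries.get((self.pid_reg, vpn))


if __name__ == "__main__":
    tlb = ASIDTaggedTLB()
    tlb.switch_to(1)
    tlb.insert(0x10, 0x100)
    tlb.switch_to(2)
    tlb.insert(0x10, 0x200)   # same virtual page, different process
    print(hex(tlb.lookup(0x10)))  # 0x200
    tlb.switch_to(1)
    print(hex(tlb.lookup(0x10)))  # 0x100
```

Both entries for virtual page 0x10 survive the switches. The same tagging, however, forces a separate entry per ASID even when the physical frame is shared.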
As modern-day programs grow larger, they increasingly rely on reusing existing code in the form of libraries. Most programs link with shared libraries such as libc (the ubiquitous “C” library), libpthread, and libm. While static libraries do allow code reuse, they cause each program to link in its own copy of the library or libraries, inflating the system memory footprint. Shared libraries mitigate this by having the system runtime (the dynamic loader, assisted by the OS) load only a single instance of each shared library into RAM. Programs then link with the libraries dynamically at runtime using shared mappings (the mmap system call). These mappings use virtual addresses in each process's private address space. This approach provides code reuse, a low memory footprint, and flexibility in the choice of virtual addresses used for the mapping, all at the same time. However, the ASID feature of the MMU explained above can become a bottleneck for shared libraries. To illustrate this, we use the ARC700 MMU as an example implementation, although the discussion applies to most MMU designs with a software-managed TLB.
Linux, as it runs on an ARC700, uses one ASID per process. This means that each running process must have its own virtual address space. If two or more processes are using shared libraries, each process therefore requires its own process-private TLB entries for the shared library. This is true even when the virtual addresses all correspond to the same physical page. This is wasteful because these TLB entries are identical across the running processes except for the ASID. For example, if there are 10 processes running concurrently, the libc code is hypothetically 16 KB (the ARC700 MMU has an 8 KB page size), and all processes are executing code in either of the 2 pages of libc, then there will be 10×(16 KB/8 KB)=20 TLB entries corresponding to libc code alone in the system. The pressure on the TLB grows further when heavyweight libraries such as WebKit, whose text alone is 7.5 MB, are used. In effect, the process-private TLB entries reduce the effective TLB size. This increases contention for TLB slots, which in turn increases the frequency of entry eviction and replacement, so CPU cycles are wasted in the OS while it runs TLB fault handlers.
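The arithmetic above generalizes to processes × pages. The following is a small sketch. It assumes every process touches every page of the library's code and that the code starts on a page boundary. The 10-process WebKit figure is a hypothetical extension of the libc example, not a measurement, and the helper names are illustrative.

```python
from math import ceil

PAGE_SIZE_KB = 8  # ARC700 MMU page size, as stated in the text


def pages_spanned(code_size_kb: float) -> int:
    """Pages needed to hold code_size_kb of code (page-aligned start assumed)."""
    return ceil(code_size_kb / PAGE_SIZE_KB)


def private_tlb_entries(processes: int, code_size_kb: float) -> int:
    """Entries needed when every process touches every page of the library and
    each entry is tagged with that process's own ASID (one copy per process)."""
    return processes * pages_spanned(code_size_kb)


if __name__ == "__main__":
    print(private_tlb_entries(10, 16))          # libc example from the text: 20
    print(pages_spanned(7.5 * 1024))            # 7.5 MB of WebKit text: 960 pages
    print(private_tlb_entries(10, 7.5 * 1024))  # hypothetical 10 WebKit users: 9600
```

Each of these per-process copies differs only in its ASID tag, so the count grows linearly with the number of processes even though the underlying physical pages do not change.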