Field
The described embodiments relate to computing devices. More specifically, the described embodiments relate to using leases for entries in a translation lookaside buffer.
Related Art
Many modern computing devices use a virtual memory technique for handling data accesses by programs (e.g., applications, operating systems, device drivers, etc.) executing in the computing devices. In such a computing device, when a program accesses data, a block of memory of a given size (e.g., 4 kB) that includes the data, called a “page” of memory, is copied from mass storage (e.g., a disk drive or semiconductor memory) to an available physical location in a main memory in the computing device and/or newly created in the memory (e.g., to store results generated from computational operations, etc.). So that programs are not required to keep track of the physical locations of pages in memory, processors in the computing device keep track of those locations for the programs. Programs therefore access memory using “virtual addresses” in “virtual address spaces,” which are local address spaces specific to the corresponding programs, instead of using addresses based on the physical locations of pages (or “physical addresses”). From a program's perspective, virtual addresses indicate the actual physical locations where data is stored within the pages in memory, and programs make memory accesses using the virtual addresses accordingly. The virtual addresses, however, may not map directly to the physical addresses of the locations where data is stored in pages in the memory. As part of managing the physical locations of pages, the processors translate the virtual addresses used by the programs in memory access requests into the physical addresses where the data is actually located. The processors then use the physical addresses to perform the memory accesses for the programs.
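The translation step described above can be sketched in Python. This is a minimal, hypothetical illustration, not a description of any particular processor. It assumes 4 kB pages and models one program's mappings as a plain dictionary from virtual page number to physical page number. The names `translate` and `page_map` are invented for the example.

```python
PAGE_SIZE = 4096                            # assumed 4 kB pages, matching the example size above
OFFSET_BITS = PAGE_SIZE.bit_length() - 1    # 12 bits address a byte within a 4 kB page


def translate(virtual_address, page_map):
    """Translate a virtual address into a physical address.

    page_map is a hypothetical per-program mapping from virtual page
    number to physical page number.
    """
    virtual_page = virtual_address >> OFFSET_BITS   # which page the address falls in
    offset = virtual_address & (PAGE_SIZE - 1)      # position of the data within that page
    if virtual_page not in page_map:
        raise KeyError(f"no mapping for virtual page {virtual_page:#x}")
    physical_page = page_map[virtual_page]
    # Only the page number changes; the offset within the page is preserved.
    return (physical_page << OFFSET_BITS) | offset
```

For example, with `page_map = {0x2: 0x7f}`, the virtual address `0x2abc` lies in virtual page `0x2` at offset `0xabc` and translates to physical address `0x7fabc`.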
In order to enable the above-described virtual address to physical address translation, the computing device includes a “page table.” The page table is a record stored in a memory of the computing device that includes entries, or “page table entries,” with virtual address to physical address translation information for pages of data that are stored in the main memory. Upon receiving a request from a program to access memory at a given virtual address, a processor acquires the corresponding physical address information from the page table by performing a “page table walk,” during which the page table is searched, possibly entry-by-entry, for a page table entry that provides the physical address associated with the virtual address.
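The following sketch shows a page table walk as the entry-by-entry search mentioned above. It is a simplified, hypothetical model. The page table is a flat Python list, whereas many real processors use multi-level tables indexed by fields of the virtual address. The `PageTableEntry` fields and the `page_table_walk` function are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageTableEntry:
    virtual_page: int        # virtual page number this entry translates
    physical_page: int       # physical page number holding the data
    writable: bool = True    # example of a read/write property


def page_table_walk(page_table: List[PageTableEntry],
                    virtual_page: int) -> Optional[PageTableEntry]:
    """Search the page table entry-by-entry for the given virtual page.

    Returns the matching entry, or None if the page is not mapped.
    """
    for entry in page_table:
        if entry.virtual_page == virtual_page:
            return entry
    return None
```

Because every lookup may scan many entries (and, in hardware, issue several memory reads), a walk is slow relative to a lookup in a small on-chip cache.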
Because the above-described page table walks are relatively slow, it is desirable to avoid performing them. The computing device therefore includes translation lookaside buffers (“TLBs”), which are local caches in each processor that the processor uses to store a limited number of copies of page table entries acquired during page table walks (or information based on page table entries). During operation, a processor first attempts to acquire a cached page table entry from its TLB for performing a virtual address to physical address translation. When a copy of the corresponding page table entry is not present in the TLB (i.e., when a “miss” occurs), the processor performs a page table walk to acquire the desired page table entry and caches a copy of the acquired page table entry in the TLB.
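The TLB behavior described above, with hits served from the cache and misses filled by a page table walk, can be sketched as a small cache class. This is a hypothetical model. The fixed capacity and the least-recently-used replacement policy are assumptions chosen for illustration, since the text does not specify a replacement policy. `walk` stands in for whatever performs the page table walk.

```python
from collections import OrderedDict
from typing import Callable


class TLB:
    """A small cache of virtual page -> physical page translations.

    Capacity and least-recently-used (LRU) replacement are assumptions
    made for this illustration.
    """

    def __init__(self, capacity: int, walk: Callable[[int], int]):
        self.capacity = capacity
        self.walk = walk              # performs the (slow) page table walk on a miss
        self.entries = OrderedDict()  # virtual page -> physical page, oldest first
        self.hits = 0
        self.misses = 0

    def lookup(self, virtual_page: int) -> int:
        if virtual_page in self.entries:
            self.hits += 1
            self.entries.move_to_end(virtual_page)  # mark as most recently used
            return self.entries[virtual_page]
        # Miss: perform a page table walk, then cache a copy of the result.
        self.misses += 1
        physical_page = self.walk(virtual_page)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)        # evict the least recently used entry
        self.entries[virtual_page] = physical_page
        return physical_page

    def invalidate(self, virtual_page: int) -> None:
        # Drop a cached translation if present (used when a page table entry changes).
        self.entries.pop(virtual_page, None)
```

With a capacity of two, looking up pages 1, 1, 2, 3, 1 produces one hit and four misses. Page 1 is evicted when page 3 is cached, so the final lookup must walk the page table again.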
During operation, processors in the above-described computing devices may modify page table entries in the page table (e.g., change virtual address to physical address translation information for the page table entries, change a read/write property for page table entries, etc.). In order to avoid inconsistencies between the page table and copies of page table entries held in TLBs in other processors in the computing device, a processor that initiated the modification of the page table entry (or an “initiating processor”) can perform an operation called a “TLB shootdown.” Generally, during a TLB shootdown, a processor that is to modify a page table entry causes other processors that may hold a cached copy of the page table entry to invalidate the cached copy, thereby avoiding the inconsistencies.
When performing a TLB shootdown to enable modifying a page table entry, the initiating processor (e.g., a memory management unit in the initiating processor, an operating system executing on the initiating processor, etc.) modifies the page table entry. The initiating processor also determines which other processors may have copies of the information from the page table entry cached in their TLBs, and sends those processors an inter-processor interrupt (IPI) that indicates the page table entry being modified. Upon receiving the IPI, each of the other processors invalidates the entry in its TLB that contains the page table entry, if such an entry exists. Each other processor also returns an acknowledgement to the initiating processor. The initiating processor collects the acknowledgements and, once an acknowledgement has been received from each of the other processors, proceeds with subsequent operations. During these operations, the processor may switch between kernel mode and user mode.
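The shootdown sequence above can be sketched with Python threads standing in for processors and queues standing in for IPIs and acknowledgements. This is a hypothetical simulation of the protocol's ordering, not an operating system or hardware implementation. The `Processor` class and the `tlb_shootdown` function are invented names. The caller is assumed to have already determined which processors may hold a cached copy. The initiator also dropping its own stale TLB copy is an assumption not spelled out in the text.

```python
import queue
import threading


class Processor:
    """A simulated processor with its own TLB (virtual page -> physical page)."""

    def __init__(self, name):
        self.name = name
        self.tlb = {}               # cached translations
        self.inbox = queue.Queue()  # receives simulated IPIs

    def handle_ipi(self, acks):
        # Wait for one simulated IPI naming the virtual page being modified.
        virtual_page = self.inbox.get()
        # Invalidate the cached copy, if this TLB holds one.
        self.tlb.pop(virtual_page, None)
        # Return an acknowledgement to the initiating processor.
        acks.put(self.name)


def tlb_shootdown(page_table, virtual_page, new_physical_page, initiator, targets):
    """Modify a page table entry and invalidate stale copies in other TLBs.

    targets: processors that may hold a cached copy (assumed already determined).
    Returns the names of the processors that acknowledged.
    """
    acks = queue.Queue()
    threads = [threading.Thread(target=p.handle_ipi, args=(acks,)) for p in targets]
    for t in threads:
        t.start()
    # 1. Modify the page table entry (and, by assumption, the initiator's own copy).
    page_table[virtual_page] = new_physical_page
    initiator.tlb.pop(virtual_page, None)
    # 2. Send a simulated IPI to each processor that may hold a cached copy.
    for p in targets:
        p.inbox.put(virtual_page)
    # 3. Collect one acknowledgement per target before proceeding.
    received = {acks.get(timeout=5.0) for _ in targets}
    for t in threads:
        t.join()
    return received
```

Step 3 is where the initiator stalls. It cannot proceed until the slowest target has handled its interrupt, which is why the cost grows with the number of participating processors.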
Because the above-described operations for performing a TLB shootdown have long latencies, a TLB shootdown typically requires a significant amount of time to complete (e.g., tens of thousands of cycles of a clock in the processor). Compounding this problem, the latency of these operations increases as the number of processors in the computing device increases, because more processors must be sent IPIs and more acknowledgements must be collected. For example, when central processing units (CPUs) and graphics processing units (GPUs) share an address space in a computing device, both the CPUs and the GPUs must participate in TLB shootdowns.
Throughout the figures and the description, like reference numerals refer to the same figure elements.