A memory management unit (MMU) is a device that translates virtual memory addresses to physical addresses (address translation). The memory management unit may be implemented as part of a processor device such as a central processing unit (CPU) or graphics processing unit (GPU), or it may be implemented as a separate integrated circuit from the processor. A memory management unit may partition the virtual address space (the range of addresses used by the processor) into pages. Some portion of the virtual address (e.g., the least significant bits) may be used unchanged as the corresponding bits of the physical address. This portion of the virtual address is referred to as the offset. Other bits of the virtual address (e.g., the most significant bits) may select the page.
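For illustration only, the split of a virtual address into a page-selecting part and an offset can be sketched as follows. The sketch assumes 4 KiB pages (12 offset bits); the page size and the name split_virtual_address are hypothetical and are not taken from any particular MMU.

```python
# Assumption: 4 KiB pages, so the low 12 bits of an address form the offset.
PAGE_SHIFT = 12
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1


def split_virtual_address(vaddr: int) -> tuple[int, int]:
    """Return (virtual page number, offset) for a virtual address."""
    vpn = vaddr >> PAGE_SHIFT      # most significant bits select the page
    offset = vaddr & OFFSET_MASK   # least significant bits pass through unchanged
    return vpn, offset


vpn, offset = split_virtual_address(0x12345678)
print(hex(vpn), hex(offset))  # 0x12345 0x678
```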
A memory management unit may utilize a structure called a page table, comprising one page table entry (PTE) per page, to map virtual page addresses to physical page addresses in memory. An associative cache of PTEs is called a translation look-aside buffer (TLB). The TLB stores recent translations of virtual addresses to physical addresses and may be thought of as an address-translation cache. The physical page address from the TLB, corresponding to the virtual page address from the processor, is combined with the offset bits to form the complete physical address corresponding to the virtual address. Page tables may be organized as a hierarchical structure or as a flat structure. In the case of a hierarchical structure, intermediate levels may also be cached.
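A minimal sketch of the lookup path described above, assuming a flat page table held as a mapping from virtual page number to physical page number and an unbounded TLB modeled as a dictionary. The class SimpleMMU and its fields are hypothetical; a real TLB has a fixed capacity and a replacement policy, which are omitted here.

```python
PAGE_SHIFT = 12                      # assumption: 4 KiB pages
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class SimpleMMU:
    """Toy MMU: a flat page table plus a TLB caching recent translations (hypothetical)."""

    def __init__(self, page_table: dict[int, int]):
        self.page_table = page_table    # one PTE per page: virtual page -> physical page
        self.tlb: dict[int, int] = {}   # recently used translations
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, vaddr: int) -> int:
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & OFFSET_MASK
        if vpn in self.tlb:
            self.tlb_hits += 1
            ppn = self.tlb[vpn]
        else:
            self.tlb_misses += 1
            ppn = self.page_table[vpn]  # page table walk; a KeyError stands in for a page fault
            self.tlb[vpn] = ppn         # cache the translation for later accesses
        # Combine the physical page number with the unchanged offset bits.
        return (ppn << PAGE_SHIFT) | offset


mmu = SimpleMMU({0x12345: 0xABC})
print(hex(mmu.translate(0x12345678)))  # 0xabc678 (TLB miss, filled from the page table)
print(hex(mmu.translate(0x12345004)))  # 0xabc004 (TLB hit)
print(mmu.tlb_hits, mmu.tlb_misses)    # 1 1
```

The second access touches the same page as the first, so it is served from the TLB without consulting the page table; only the offset bits differ.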
Invalidation is a process by which entries in various caches are marked for replacement or removal. In content addressable caches where cache entries are tagged with a class value, it may take a long time to look up and invalidate each cache entry for the class when the class as a whole is invalidated. In conventional approaches to cache class invalidation, the process issuing the invalidate by class command to the MMU waits for the MMU to provide an acknowledgement signal indicating completion of the invalidation. The cache may be unavailable to other processes executing in the system during this waiting time, degrading the system's performance. Herein, the term "cache" refers to various types of cache memory, including instruction caches, data caches, and address caches (e.g., TLBs).
By way of example, every process in an operating system (OS) may have a context ID associated with it. The context ID for a process may be used as the class for that process's cache entries. There are circumstances in which the OS will invalidate the entire context for a process. This involves, among other things, removing all the page table entries for the process and sending an invalidate by class command to the memory management unit to invalidate all of the process's entries held in one or more translation look-aside buffers.
When the OS invalidates the context of a process, it sends the invalidate by class command to trigger the memory management unit to perform the invalidation of cache entries tagged with the class for the context. The OS may then wait for an acknowledgement signal from the memory management unit indicating that the invalidation of the context for the process has completed.
In response to the invalidate by class command from the OS to invalidate the context for a process, the memory management unit looks up all the cache entries for the process in the respective translation look-aside buffers and MMU caches, invalidates those cache entries, and sends the acknowledgement signal to the OS indicating that the invalidation was completed. The cache entries for the process are the ones tagged with the class corresponding to the process.
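The invalidation described above can be sketched, under simplifying assumptions, as a scan over cache entries tagged with a context ID. The names TLBEntry and invalidate_by_class are hypothetical, and the function's return stands in for the acknowledgement signal. The sketch illustrates why the operation can be slow: every entry must be examined, so the cost grows with the size of the cache rather than with the number of entries belonging to the class.

```python
from dataclasses import dataclass


@dataclass
class TLBEntry:
    context_id: int    # class tag, e.g. the OS context ID of the owning process
    vpn: int           # virtual page number
    ppn: int           # physical page number
    valid: bool = True


def invalidate_by_class(entries: list[TLBEntry], context_id: int) -> int:
    """Invalidate every valid entry tagged with context_id; return the number invalidated."""
    count = 0
    for entry in entries:  # every entry is visited, regardless of its class
        if entry.valid and entry.context_id == context_id:
            entry.valid = False
            count += 1
    return count  # returning models the acknowledgement sent back to the OS


tlb = [TLBEntry(1, 0x10, 0x80), TLBEntry(2, 0x10, 0x90), TLBEntry(1, 0x11, 0x81)]
print(invalidate_by_class(tlb, 1))  # 2
print([e.valid for e in tlb])       # [False, True, False]
```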
The memory management unit attempts to balance the execution of invalidate by class commands with the continued performance of address translations for active contexts so that the impact on the performance of the system overall remains at acceptable levels.
After sending the invalidate by class command, the OS typically executes a polling loop while waiting for the acknowledgement signal from the memory management unit. Thus, the OS is effectively blocked from further execution until the memory management unit completes the invalidation of all of the cache entries for the context.
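The blocking behavior can be modeled with a hypothetical MMU that completes the invalidation on a background thread and signals completion through a flag, while the issuing code busy-polls that flag. MMUModel, os_invalidate_context, and the 50 ms work time are illustrative assumptions only; the returned poll count shows how many loop iterations the caller spent doing nothing useful.

```python
import threading
import time


class MMUModel:
    """Hypothetical MMU that completes an invalidate by class command asynchronously."""

    def __init__(self) -> None:
        self.ack = threading.Event()  # set when the invalidation has completed

    def invalidate_by_class(self, context_id: int, work_seconds: float) -> None:
        self.ack.clear()

        def worker() -> None:
            time.sleep(work_seconds)  # stands in for looking up and invalidating entries
            self.ack.set()            # acknowledgement signal to the OS

        threading.Thread(target=worker, daemon=True).start()


def os_invalidate_context(mmu: MMUModel, context_id: int) -> int:
    """Issue the command, then busy-poll for the acknowledgement; return the poll count."""
    mmu.invalidate_by_class(context_id, work_seconds=0.05)
    polls = 0
    while not mmu.ack.is_set():  # polling loop: the caller makes no other progress
        polls += 1
    return polls


print(os_invalidate_context(MMUModel(), context_id=7))
```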
The polling loop consumes central processing unit (CPU) cycles and hampers performance, especially when the CPU is also busy executing other tasks. The real-time responsiveness of the OS may also be hampered in scenarios in which the invalidate by class command is invoked from an interrupt service routine.
Existing solutions utilize scheduling algorithms to decrease the latency of responding to an invalidate by class command, but the latency still depends on the state of the active address translations under way for contexts other than the one being invalidated.