Modern processors often include caches to improve the performance of accessing external memory by providing copies of instructions and/or data in smaller, faster memories with shorter access latencies. In addition, such caches may provide support for fast virtual-to-physical address translation using a device such as a translation lookaside buffer (TLB) to cache virtual-to-physical address translations, for example from the system page tables in a paged virtual memory system. When a TLB matches a virtual address to one of the translations stored in the TLB, we may refer to such an event as a TLB hit, and the retrieved physical address can be used to access memory in a cache or in main memory more quickly. When a TLB fails to match a virtual address to one of the translations stored in the TLB, we may refer to such an event as a TLB miss or a page miss, and the translation proceeds by looking up the corresponding page table entries in a process called a page walk.
A page walk is an expensive process, as it involves reading the contents of multiple memory locations and using them to compute the physical address. Modern processors often include a page-miss handler (PMH) in hardware to perform the page walk more quickly. After the physical address is determined by the page walk, the virtual address to physical address mapping is entered into the TLB to be reused in subsequent accesses.
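The TLB hit, page miss, and page walk behavior described above may be illustrated with a minimal software sketch. It is a toy model only, not a description of any particular hardware PMH: the names ToyMMU, page_walk, and translate, the two-level table layout, the 4 KiB page size, and the 9-bit per-level index are all assumptions made for illustration.

```python
PAGE_SHIFT = 12   # assumed 4 KiB pages
INDEX_BITS = 9    # assumed 9 index bits per page-table level
LEVELS = 2        # assumed two-level page table (toy layout)


class ToyMMU:
    """Hypothetical model of a TLB backed by a page walk."""

    def __init__(self, root):
        self.root = root        # nested dicts stand in for page tables in memory
        self.tlb = {}           # virtual page number -> physical frame number
        self.hits = 0           # TLB hits observed
        self.walks = 0          # page walks performed (one per page miss)
        self.memory_reads = 0   # page-table reads issued by page walks

    def page_walk(self, vpn):
        """Read one table entry per level to find the physical frame number."""
        table = self.root
        for level in reversed(range(LEVELS)):
            index = (vpn >> (level * INDEX_BITS)) & ((1 << INDEX_BITS) - 1)
            self.memory_reads += 1   # each level costs one memory read
            table = table[index]     # a KeyError here would model a page fault
        return table

    def translate(self, vaddr):
        """Return the physical address for vaddr, filling the TLB on a miss."""
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.tlb:          # TLB hit: no page-table reads needed
            self.hits += 1
            pfn = self.tlb[vpn]
        else:                        # page miss: walk, then cache the mapping
            self.walks += 1
            pfn = self.page_walk(vpn)
            self.tlb[vpn] = pfn
        return (pfn << PAGE_SHIFT) | offset
```

In this sketch, the first access to a page costs one memory read per table level, while a later access to the same page is served from the TLB without any page-table reads, which is why the mapping is entered into the TLB after the walk.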
In a multi-core system with multiple processing cores, each of the multiple processing cores may include such a PMH to facilitate page walks on page misses and to populate their respective TLBs. In this disclosure, we may refer to a core or processing core in contrast to a thread or execution thread. The processing core may include support for multiple execution threads, including, for example, a per-thread general-purpose register file, a per-thread floating-point register file, per-thread execution queues, per-thread state information storage, and a partitionable cache or caches and TLB storage.
In a multi-core system with multiple processing cores, certain other processing hardware or devices may also access the system's main memory. A graphics processor, for example, may read and write to buffers in memory at locations provided by a central processing core or cores. In some systems it may also be desirable for a graphics processor to access a shared cache along with the central processing core or cores to improve access times. In such systems it may even be desirable to provide a device, such as a graphics processor or video processor, etc., with a TLB to cache virtual-to-physical address translations, and a PMH to facilitate page walks on page misses and to populate its respective TLB.
As the number of processing cores and other devices accessing caches or using virtual memory increases, there may be good reason to expect problems such as additional memory congestion, conflicts, and duplication of page walks when a virtual memory space is shared by more of these devices.
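The duplication of page walks mentioned above may be illustrated with a toy count. This is a sketch under stated assumptions rather than a measurement of any real system: the function name count_walks is hypothetical, each device is assumed to have a private TLB that starts empty and never evicts entries, and every device is assumed to touch every shared page.

```python
def count_walks(num_devices, shared_pages):
    """Total page walks when each device fills its own private TLB (toy model)."""
    tlbs = [set() for _ in range(num_devices)]   # one private TLB per device
    walks = 0
    for vpn in shared_pages:
        for tlb in tlbs:
            if vpn not in tlb:   # page miss in this device's TLB
                walks += 1       # this device performs its own page walk
                tlb.add(vpn)     # and caches the translation privately
    return walks
```

Under these assumptions, one device touching eight shared pages performs eight page walks, while four devices touching the same eight pages perform thirty-two, since each device repeats a walk that another device has already completed.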
To date, potential solutions to such reasonably expected problems have not been adequately explored.