The overall performance of a computer system depends heavily on the efficiency of its memory hierarchy. Memory system performance depends not only on data caches, but also on address caches. The importance of memory system performance increases as processor cycle times decrease.
A translation lookaside buffer (TLB) is a cache that is used to speed up address translation in a paged virtual memory system. The cache is implemented on-chip to reduce memory access delay. Without a TLB, every instruction or data reference would require additional memory accesses to the page table. The TLB access time becomes more crucial for physically indexed caches, because it is on the critical path of cache accesses.
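By way of illustration only, the role of the TLB as a cache in front of the page table can be sketched as follows. This is a minimal software model, not a hardware implementation; the 4 KB page size, the dictionary-based page table, and the names `TLB`, `translate`, `hits`, and `misses` are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class TLB:
    """Toy model of a TLB caching translations from a page table."""

    def __init__(self, page_table):
        # page_table maps virtual page number -> physical frame number
        self.page_table = page_table
        self.entries = {}  # cached translations
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            # Hit: no page table access is needed.
            self.hits += 1
        else:
            # Miss: stands in for the additional memory accesses
            # needed to walk the page table.
            self.misses += 1
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset


page_table = {0: 7, 1: 3}
tlb = TLB(page_table)
first = tlb.translate(5)    # miss, fills the entry for page 0
second = tlb.translate(6)   # hit, served from the TLB
```

In this model only the first reference to a page pays the cost of a page table access; later references to the same page are served from the cached entry.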
A TLB is a cache that is indexed by a virtual address and returns the corresponding physical address. The paging information is stored in a page table entry (PTE) resident in main memory, and a copy of the PTE is cached in a TLB entry. In a uniprocessor, a PTE and its TLB entry can become inconsistent when a virtual memory operation updates the PTE, e.g., an operation issued by a user application for memory allocation, deallocation, or attribute modification. A uniprocessor maintains consistency by invalidating or flushing the TLB after updating a PTE, since the uniprocessor knows when the inconsistency occurs and only the local TLB is involved.
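The uniprocessor case can be sketched, purely as an illustration, with a single page table and a single local TLB. The dictionary layout of a PTE and the helper names `lookup` and `update_pte` are hypothetical and chosen only for this example.

```python
# Assumed layout: virtual page number -> PTE fields.
page_table = {0x10: {"frame": 0x2A, "writable": True}}
tlb = {}  # the single local TLB of the uniprocessor


def lookup(vpn):
    """Return the cached translation, filling the TLB from the PTE on a miss."""
    if vpn not in tlb:
        tlb[vpn] = dict(page_table[vpn])  # cache a copy of the PTE
    return tlb[vpn]


def update_pte(vpn, **changes):
    """Update a PTE, then invalidate the matching local TLB entry."""
    page_table[vpn].update(changes)
    tlb.pop(vpn, None)  # invalidation keeps the TLB consistent


before = lookup(0x10)["writable"]   # copy of the PTE is now cached
update_pte(0x10, writable=False)    # e.g., an attribute modification
after = lookup(0x10)["writable"]    # refilled from the updated PTE
```

Because the update and the invalidation happen on the same processor, the next lookup misses and reloads the new PTE contents.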
In a shared memory multiprocessor (SMP) environment, multiple threads can be associated with a single parallel application. These threads run independently on different processors, but they all share the same address space. Since these threads share a common page table, the same page table entry can be cached in multiple TLBs. If any of the threads updates such a replicated PTE, the TLBs holding copies of that PTE become inconsistent. The problem caused by this inconsistent state is referred to as the TLB consistency problem.
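The TLB consistency problem can be reproduced in a small model, offered here only as a sketch. The `Processor` class, its methods, and the PTE layout are assumptions for illustration; the model deliberately invalidates only the updating processor's own TLB in order to expose the stale copy held by the other processor.

```python
class Processor:
    """Toy processor with a private TLB over a shared page table."""

    def __init__(self, name, page_table):
        self.name = name
        self.page_table = page_table  # shared by all processors
        self.tlb = {}                 # private to this processor

    def lookup(self, vpn):
        if vpn not in self.tlb:
            self.tlb[vpn] = dict(self.page_table[vpn])  # cache a PTE copy
        return self.tlb[vpn]

    def update_pte(self, vpn, **changes):
        self.page_table[vpn].update(changes)
        self.tlb.pop(vpn, None)  # only the local TLB is invalidated


shared_page_table = {0x10: {"frame": 0x2A, "writable": True}}
cpu0 = Processor("cpu0", shared_page_table)
cpu1 = Processor("cpu1", shared_page_table)

# Both processors cache the same PTE.
cpu0.lookup(0x10)
cpu1.lookup(0x10)

# A thread on cpu0 changes the page attributes.
cpu0.update_pte(0x10, writable=False)

# cpu1 still sees its stale copy of the PTE.
stale = cpu1.lookup(0x10)["writable"]
current = shared_page_table[0x10]["writable"]
```

After the update, `stale` and `current` disagree: the second processor continues to use a translation that the page table no longer contains, which is the inconsistent state described above.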
There are a number of existing algorithms in the prior art that are directed to solving the TLB consistency problem in uniprocessor and multiprocessor environments; however, none of these algorithms addresses the TLB consistency problem in a virtual machine computing environment having a hypervisor, or in a computing environment managed by a virtual machine manager (VMM). In addition, some prior art approaches rely on spinlocks when flushing the TLB. Since the purpose of a tagged TLB is to improve performance, a tagged TLB algorithm that avoids expensive spinlocks for flushing the TLB is also desired. As described in the sections that follow, the invention addresses these and other needs in the art.