Virtual memory allows a processor to address a memory space that is larger than physical memory. The translation from virtual addresses to physical addresses is typically performed using page tables. Often several page table levels are employed, where each page table level helps in translating a part of the virtual address. For instructions that access memory, virtual addresses need to be translated to physical addresses using the page tables. A Translation Lookaside Buffer (TLB) is a cache commonly provided in processors to hold recently used virtual-to-physical translations so that they need not be re-derived from the page tables on every access. TLBs are populated via different mechanisms; for example, in the AMD64 processor architecture, the processor employs a page-table walker that establishes the required translations and fills the TLB.
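The multi-level translation described above can be sketched as follows. This is a minimal illustrative model, not a description of any particular processor: it assumes four table levels, nine index bits per level, and a 12-bit page offset (the split used by AMD64 long mode with 4 KiB pages), and it represents page tables as nested dictionaries. The names `split_virtual_address`, `map_page`, `walk`, and `TLB` are hypothetical.

```python
# Illustrative model only: 4 levels, 9 index bits per level, 12-bit page offset.
LEVELS = 4
INDEX_BITS = 9
OFFSET_BITS = 12
OFFSET_MASK = (1 << OFFSET_BITS) - 1
INDEX_MASK = (1 << INDEX_BITS) - 1


def split_virtual_address(vaddr):
    """Return ([top-level index, ..., last-level index], page offset)."""
    indices = []
    for level in reversed(range(LEVELS)):
        shift = OFFSET_BITS + level * INDEX_BITS
        indices.append((vaddr >> shift) & INDEX_MASK)
    return indices, vaddr & OFFSET_MASK


def map_page(root, vaddr, frame):
    """Install a translation for the page containing vaddr (hypothetical helper)."""
    indices, _ = split_virtual_address(vaddr)
    table = root
    for idx in indices[:-1]:
        table = table.setdefault(idx, {})
    table[indices[-1]] = frame


def walk(root, vaddr):
    """Walk the nested tables; each level translates one part of the address."""
    indices, offset = split_virtual_address(vaddr)
    entry = root
    for idx in indices:
        if idx not in entry:
            raise KeyError("no mapping for 0x%x" % vaddr)
        entry = entry[idx]
    return (entry << OFFSET_BITS) | offset


class TLB:
    """Caches page-number translations so repeated accesses skip the walk."""

    def __init__(self):
        self.entries = {}  # virtual page number -> physical frame number
        self.walks = 0     # number of page-table walks performed on misses

    def translate(self, root, vaddr):
        vpn = vaddr >> OFFSET_BITS
        if vpn not in self.entries:
            self.walks += 1
            self.entries[vpn] = walk(root, vaddr) >> OFFSET_BITS
        return (self.entries[vpn] << OFFSET_BITS) | (vaddr & OFFSET_MASK)
```

In this sketch, two accesses to the same page cause only one walk; the second access is served from the `TLB` entries filled by the first.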
If page tables change and a re-walk is desired, TLBs often need to be flushed to trigger a new page-table walk operation. A flush is usually performed when the operating system determines that the TLB entries should be filled again. The “walk” refers to the process of going through (i.e., walking) the page tables to establish a virtual-to-physical mapping. A page-table walker performs the page-table walk operation. Upon first access, the page-table walker sets the ACCESSED and DIRTY bits depending on the access type (load or store). Generally, the processor does not clear these bits. The operating system (OS) can use these bits to determine which memory pages have been accessed and how they have been accessed. Often, these bits need to be cleared in the page tables to force a store to the page table entry (PTE) upon subsequent page-table walks on other processors. Additionally, a remote TLB shootdown can be used to remove translations from remote TLBs for which a re-walk is desired.
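The interaction between the walker, the ACCESSED/DIRTY bits, and TLB flushes can be illustrated with a simplified single-level model. Everything here is an assumption made for illustration: the bit positions, the `Core` and `PageTableEntry` classes, and the `clear_and_shoot_down` helper are hypothetical, and a real shootdown involves inter-processor signaling that is not modeled. The sketch also assumes that a store through a translation cached by a load triggers a re-walk so that DIRTY is recorded.

```python
ACCESSED = 1 << 0  # illustrative bit positions, not a real PTE layout
DIRTY = 1 << 1


class PageTableEntry:
    def __init__(self, frame):
        self.frame = frame
        self.flags = 0


class Core:
    """One processor with a private TLB and a page-table walker (simplified)."""

    def __init__(self, page_table):
        self.page_table = page_table  # shared: page number -> PageTableEntry
        self.tlb = {}                 # private: page number -> (frame, dirty_seen)
        self.walks = 0

    def _walk(self, vpn, is_store):
        # The walker sets ACCESSED on any access and DIRTY on a store.
        self.walks += 1
        pte = self.page_table[vpn]
        pte.flags |= ACCESSED
        if is_store:
            pte.flags |= DIRTY
        self.tlb[vpn] = (pte.frame, is_store)
        return pte.frame

    def access(self, vpn, is_store=False):
        cached = self.tlb.get(vpn)
        # Modeling assumption: a store through an entry filled by a load
        # re-walks so that DIRTY is recorded in the PTE.
        if cached is None or (is_store and not cached[1]):
            return self._walk(vpn, is_store)
        return cached[0]

    def flush(self, vpn):
        self.tlb.pop(vpn, None)


def clear_and_shoot_down(page_table, cores, vpn):
    """OS-side step: clear the bits, then remove the cached translations."""
    page_table[vpn].flags &= ~(ACCESSED | DIRTY)
    for core in cores:
        core.flush(vpn)
```

After `clear_and_shoot_down`, the next access on any core misses in its TLB, walks the page table again, and stores the bits back into the PTE, which is how the OS can observe fresh access information.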
Shared-memory computer systems (e.g., computer systems that include multiple processors) allow multiple concurrent threads of execution to access shared memory locations. Unfortunately, writing correct multi-threaded programs is difficult due to the complexities of coordinating concurrent memory access. One approach to concurrency control between multiple threads of execution is transactional memory. In a transactional memory programming model, a programmer may designate a section of code (e.g., an execution path or a set of program instructions) as a “transaction,” which a transactional memory system should execute atomically with respect to other threads of execution. For example, if the transaction includes two memory store operations, then the transactional memory system ensures that all other threads may only observe either the cumulative effects of both memory operations or of neither, but not the effects of only one.
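The all-or-nothing visibility of a transaction with two stores can be sketched in software. This is a deliberately minimal model under stated assumptions: it buffers stores privately and publishes them under a single global lock, it performs no read-set validation or conflict detection, and the names `Transaction`, `atomically`, `snapshot`, and `Aborted` are hypothetical rather than part of any real transactional memory interface.

```python
import threading

_commit_lock = threading.Lock()  # single global lock: simple, not scalable


class Aborted(Exception):
    """Raised inside a transaction body to discard its buffered stores."""


class Transaction:
    def __init__(self, memory):
        self.memory = memory
        self.writes = {}  # speculative stores, invisible to other threads

    def load(self, addr):
        # A transaction sees its own earlier stores first.
        return self.writes.get(addr, self.memory.get(addr, 0))

    def store(self, addr, value):
        self.writes[addr] = value

    def commit(self):
        # All buffered stores become visible together.
        with _commit_lock:
            self.memory.update(self.writes)


def snapshot(memory, addrs):
    """Observe several locations consistently with respect to commits."""
    with _commit_lock:
        return tuple(memory.get(a, 0) for a in addrs)


def atomically(memory, body):
    """Run body(tx); publish its stores only if it finishes without aborting."""
    tx = Transaction(memory)
    try:
        body(tx)
    except Aborted:
        return False
    tx.commit()
    return True
```

An observer that calls `snapshot` sees either both stores of a committed transaction or neither store of an aborted one, never only one of them.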
Various transactional memory systems have been proposed, including those implemented by software, by hardware, or by a combination thereof. However, many traditional implementations are bound by various limitations. For example, hardware-based transactional memory (HTM) proposals sometimes impose limitations on the size of transactions supported (i.e., the maximum number of speculative memory operations that can be executed before the transaction is committed). Often, this is a product of limited hardware resources, such as the size of one or more speculative data buffers used to buffer speculative data during transactional execution.
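The size limitation can be made concrete with a small model of a fixed-capacity speculative buffer. This is an illustrative sketch only: the capacity counts distinct addresses, an overflow aborts the whole transaction, and the names `BoundedSpeculativeBuffer`, `CapacityAbort`, and `run_transaction` are hypothetical and do not correspond to any specific HTM design.

```python
class CapacityAbort(Exception):
    """Raised when a transaction needs more buffer entries than exist."""


class BoundedSpeculativeBuffer:
    def __init__(self, capacity):
        self.capacity = capacity  # number of distinct addresses it can hold
        self.lines = {}

    def speculative_store(self, addr, value):
        # Rewriting an address already buffered does not need a new entry.
        if addr not in self.lines and len(self.lines) >= self.capacity:
            self.lines.clear()  # discard all speculative data
            raise CapacityAbort("transaction exceeds %d entries" % self.capacity)
        self.lines[addr] = value

    def commit(self, memory):
        memory.update(self.lines)
        self.lines.clear()


def run_transaction(memory, stores, capacity):
    """Apply (addr, value) stores atomically, or not at all on overflow."""
    buf = BoundedSpeculativeBuffer(capacity)
    try:
        for addr, value in stores:
            buf.speculative_store(addr, value)
    except CapacityAbort:
        return False
    buf.commit(memory)
    return True
```

With a capacity of two, a transaction that stores to three distinct addresses aborts and leaves memory unchanged, while one that stores twice to the same address and once to another still commits.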
One example of a transactional memory system is the Advanced Synchronization Facility (ASF) proposed by Advanced Micro Devices (AMD). The ASF allows user-level and system-level code to modify a set of memory objects atomically without requiring expensive synchronization mechanisms. Unfortunately, in transactional memory systems such as the ASF, tracking large read sets requires large amounts of hardware resources.