The present invention relates generally to the field of computer memory management, and more specifically to techniques for improving the efficiency of transactional memory operations.
Many computer systems employ cache memory to speed data retrieval operations. Cache memory stores copies of data found in frequently used main memory locations. Accessing data from cache memory speeds processing because cache memory can typically be accessed faster than main memory. If requested data is found in cache memory, then it is accessed from cache memory. However, if requested data is not found in cache memory, then the data is first copied into cache memory and then accessed from the cache memory.
Multi-level cache is a structure in which there are multiple cache memories. For example, a computing system may have three levels, i.e., an L1 cache, an L2 cache, and an L3 cache. Typically, in a multi-level cache configuration, the L1 cache is the smallest and has the shortest access time. If requested data is not found in the L1 cache, the system searches the L2 cache, which is usually larger than the L1 cache and physically farther from the processor, and thus has a greater access time. In a similar fashion, if the data is not found in the L2 cache, the L3 cache is searched. Main memory is accessed only if the requested data is not in the L1, L2, or L3 caches. There are many different implementations of cache memory.
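By way of illustration only, the search order described above may be sketched as follows. The sketch assumes a simple software model that is not part of the original text: the names `CacheLevel` and `load`, the latency figures, and the policy of copying found data into every level closer to the processor are all hypothetical, and capacity limits and eviction are not modeled.

```python
from typing import Dict, List, Tuple


class CacheLevel:
    """One cache level: a name, an access latency, and the data it holds."""

    def __init__(self, name: str, latency: int) -> None:
        self.name = name
        self.latency = latency            # hypothetical access time in cycles
        self.lines: Dict[int, int] = {}   # address -> data (no capacity limit)


def load(levels: List[CacheLevel], memory: Dict[int, int], address: int,
         memory_latency: int = 200) -> Tuple[int, str, int]:
    """Search L1, then L2, then L3; access main memory only on a full miss.

    Returns (data, name of the level that supplied it, total cycles spent).
    """
    cycles = 0
    for index, level in enumerate(levels):
        cycles += level.latency
        if address in level.lines:
            data = level.lines[address]
            # Assumed fill policy: copy the data into the closer levels.
            for closer in levels[:index]:
                closer.lines[address] = data
            return data, level.name, cycles
    # Not found in any cache level: copy from main memory into every level.
    cycles += memory_latency
    data = memory[address]
    for level in levels:
        level.lines[address] = data
    return data, "memory", cycles


levels = [CacheLevel("L1", 4), CacheLevel("L2", 12), CacheLevel("L3", 40)]
memory = {0x100: 7}
first = load(levels, memory, 0x100)    # misses in L1, L2, and L3
second = load(levels, memory, 0x100)   # now found in L1
```

In this model the first access pays the latency of every level plus main memory, while the second access pays only the L1 latency.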
To improve performance, a computer architecture often includes prefetch instructions that can move data from main memory or a lower cache level to a higher cache level (closer to the processor) in anticipation of an access to the data. The execution of a prefetch instruction can be speculative in nature in that it may be performed before it is known that the data moved by the prefetch instruction will actually be accessed. Data may be prefetched in anticipation of either a read access or a write access; these are called a read prefetch and a write prefetch, respectively. Cache coherency protocols, which enable the sharing, synchronization, and parallel processing of data, often require a processor to gain ownership of the data before the data can be written to (i.e., changed). Because the data may currently be owned by another processor, gaining ownership can be a lengthy process. The concept of ownership is necessary to prevent data that is being read by one processor from a memory location from being changed by another processor that writes to the same location or to a different location (if copies of the data are in multiple different locations). Therefore, it is often advantageous for a processor to execute a write prefetch instruction to gain ownership of the data in anticipation of writing to the data, so that processing is not delayed while ownership is acquired.
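By way of illustration only, the benefit of a write prefetch may be sketched with a toy ownership model. Everything in the sketch is an assumption made for illustration: the class `CoherentLine`, the cycle counts, and the premise that the cost of a prefetch is fully overlapped with other work are hypothetical and do not describe any particular coherency protocol.

```python
from typing import Optional


class CoherentLine:
    """A single cache line that at most one processor may own for writing."""

    def __init__(self, transfer_cost: int = 100) -> None:
        self.owner: Optional[int] = None
        self.transfer_cost = transfer_cost   # hypothetical cycles to gain ownership

    def acquire_ownership(self, cpu: int) -> int:
        """Make `cpu` the owner and return the cycles spent doing so."""
        if self.owner == cpu:
            return 0
        self.owner = cpu
        return self.transfer_cost

    def write_prefetch(self, cpu: int) -> None:
        """Gain ownership ahead of the store.

        Assumption: the transfer overlaps with other work, so the cost
        is not charged to the later store.
        """
        self.acquire_ownership(cpu)

    def store(self, cpu: int) -> int:
        """Return the cycles the store takes, including any ownership wait."""
        return self.acquire_ownership(cpu) + 1


# Without a write prefetch, the store waits for ownership to be transferred.
line = CoherentLine()
line.owner = 1
stall_without = line.store(cpu=0)

# With a write prefetch issued earlier, the store finds ownership in place.
line = CoherentLine()
line.owner = 1
line.write_prefetch(cpu=0)
stall_with = line.store(cpu=0)
```

Under these assumptions the store that follows a write prefetch completes without waiting for the ownership transfer.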
Since the access time of a cache is often critical to the performance of executing code, and a cache is often busy with many operations (e.g., servicing misses), it is beneficial to decrease a cache's workload where possible. One common technique used to decrease a cache's workload includes accumulating, in a cache line buffer, multiple stores that store into a common cache line, and then storing the contents of the cache line buffer into the cache as a single operation. This decreases the cache's workload and improves its response time and, thus, potentially improves the performance of the executing code. Such a technique is commonly performed in a mechanism called a store cache.
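By way of illustration only, the accumulation of stores described above may be sketched as follows. The sketch assumes a 64-byte cache line and a dictionary-based model of the cache; the names `StoreCache`, `store`, and `drain` are hypothetical and are not taken from the original text.

```python
from typing import Dict

LINE_SIZE = 64  # assumed cache line size in bytes


class StoreCache:
    """Accumulates stores per cache line and writes each line to the cache once."""

    def __init__(self) -> None:
        self.buffers: Dict[int, Dict[int, int]] = {}  # line address -> {offset: value}
        self.cache_writes = 0                         # operations issued to the cache

    def store(self, address: int, value: int) -> None:
        """Merge a store into the buffer for its cache line."""
        offset = address % LINE_SIZE
        line_address = address - offset
        self.buffers.setdefault(line_address, {})[offset] = value

    def drain(self, cache: Dict[int, Dict[int, int]]) -> None:
        """Store each buffered line into the cache as a single operation."""
        for line_address, line_buffer in self.buffers.items():
            cache.setdefault(line_address, {}).update(line_buffer)
            self.cache_writes += 1
        self.buffers.clear()


cache: Dict[int, Dict[int, int]] = {}
store_cache = StoreCache()
store_cache.store(0x1000, 1)   # three stores into the same cache line
store_cache.store(0x1008, 2)
store_cache.store(0x1010, 3)
store_cache.store(0x2000, 4)   # one store into a different cache line
store_cache.drain(cache)
```

In this model four stores result in two operations on the cache, one per cache line, rather than four.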
Transactional memory is a type of memory operation that groups one or more load and store operations performed by a processor into a single transaction that is visible to other processors as a single operation when the transaction completes. The effects (e.g., the data) of multiple store operations participating in the single transaction are not made visible to other processors until the transaction is complete. A transactional load is a load that accesses and buffers data until a transaction completes, after which the data is forwarded to the processor that requested it. If the data from a transactional load is changed (i.e., its location in memory written to) after the load occurs and before the transaction completes, the transaction is aborted. A transactional write is a write whose data is not seen until the transaction completes. A read from or a write to a location that is the target of a transactional write in a transaction aborts the transaction. Transactional memory is often helpful in synchronizing work that is performed in parallel on multiple CPUs (by enabling atomic operations on an arbitrary set of memory locations) and when multiple writes must not be interrupted. Since writes that participate in a transaction are not visible until the transaction completes, a read from a location that is going to be written by a write in the transaction aborts the transaction.
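By way of illustration only, the visibility and abort behavior described above may be sketched in software. The sketch is a simplified model under stated assumptions, not a description of any hardware implementation: the names `SharedMemory`, `Transaction`, and `TransactionAborted` are hypothetical, a transactional load returns its data immediately rather than buffering it until completion, a load of a location the same transaction has already written is assumed to return the buffered value, and only accesses by other processors are treated as conflicts.

```python
from typing import Dict, List, Set


class TransactionAborted(Exception):
    """Raised when an aborted transaction is used again."""


class SharedMemory:
    """Committed data plus the transactions currently in flight."""

    def __init__(self) -> None:
        self.data: Dict[int, int] = {}
        self.active: List["Transaction"] = []

    def read(self, address: int) -> int:
        """Read by another processor; aborts transactions that wrote the location."""
        self._abort_conflicting(address, include_read_sets=False)
        return self.data.get(address, 0)

    def write(self, address: int, value: int) -> None:
        """Write by another processor; aborts transactions that read or wrote it."""
        self._abort_conflicting(address, include_read_sets=True)
        self.data[address] = value

    def _abort_conflicting(self, address: int, include_read_sets: bool) -> None:
        for tx in list(self.active):
            if address in tx.write_set or (include_read_sets and address in tx.read_set):
                tx.abort()


class Transaction:
    def __init__(self, memory: SharedMemory) -> None:
        self.memory = memory
        self.read_set: Set[int] = set()
        self.write_set: Dict[int, int] = {}   # buffered, not yet visible
        self.aborted = False
        memory.active.append(self)

    def _require_live(self) -> None:
        if self.aborted:
            raise TransactionAborted()

    def load(self, address: int) -> int:
        self._require_live()
        self.read_set.add(address)
        if address in self.write_set:          # assumed forwarding of own write
            return self.write_set[address]
        return self.memory.data.get(address, 0)

    def store(self, address: int, value: int) -> None:
        self._require_live()
        self.write_set[address] = value

    def abort(self) -> None:
        """Discard buffered writes and leave committed data untouched."""
        self.aborted = True
        self.write_set.clear()
        self.read_set.clear()
        if self in self.memory.active:
            self.memory.active.remove(self)

    def commit(self) -> None:
        """Make all buffered writes visible together."""
        self._require_live()
        self.memory.data.update(self.write_set)
        self.memory.active.remove(self)


memory = SharedMemory()
memory.data[0] = 10

tx = Transaction(memory)
tx.store(0, 11)
tx.store(1, 22)
visible_before_commit = dict(memory.data)   # inspects committed data directly
tx.commit()
visible_after_commit = dict(memory.data)

tx2 = Transaction(memory)
tx2.load(0)
memory.write(0, 99)                          # another processor changes loaded data
```

In this model neither store of the first transaction is visible until the commit, and the second transaction is aborted because a location it loaded is written before it completes.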