The present invention relates generally to the field of computer memory management, and more specifically to techniques for improving the efficiency of transactional memory operations.
Many computer systems employ cache memory to speed data retrieval operations. Cache memory stores copies of data found in frequently used main memory locations. Accessing data from cache memory speeds processing because cache memory can typically be accessed faster than main memory. If requested data is found in cache memory, then it is accessed from cache memory. However, if requested data is not found in cache memory, then the data is first copied into cache memory and then accessed from the cache memory.
A multi-level cache is a structure in which there are multiple cache memories. For example, a computing system may have three levels, i.e., an L1 cache, an L2 cache, and an L3 cache. Typically, in a multi-level cache configuration, the L1 cache is the smallest and has the shortest access time. If requested data is not found in the L1 cache, the system searches the L2 cache, which is usually physically further away than the L1 cache and thus has a greater access time. In a similar fashion, if the data is not found in the L2 cache, the L3 cache is searched. Main memory is only accessed if the requested data is not in the L1, L2, or L3 caches. There are many different implementations of cache memory.
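By way of illustration only, the lookup order described above may be modeled in software as in the following minimal sketch. The class and function names (`CacheLevel`, `read`), the capacities, the latency figures, and the least-recently-used replacement policy are all assumptions introduced for the example and do not describe any particular hardware implementation.

```python
from collections import OrderedDict


class CacheLevel:
    """One cache level. LRU replacement is an assumption of this sketch."""

    def __init__(self, name, capacity, latency):
        self.name = name
        self.capacity = capacity      # number of entries the level can hold
        self.latency = latency        # hypothetical access cost in cycles
        self.lines = OrderedDict()    # address -> value, oldest entry first

    def lookup(self, addr):
        """Return (hit, value); a hit refreshes the entry's LRU position."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True, self.lines[addr]
        return False, None

    def fill(self, addr, value):
        """Copy data into this level, evicting the oldest entry if full."""
        self.lines[addr] = value
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)


def read(addr, levels, main_memory, memory_latency=200):
    """Search L1, then L2, then L3; access main memory only if all miss.

    Returns (value, name of the level that supplied the data, total cost).
    """
    cost = 0
    for i, level in enumerate(levels):
        cost += level.latency
        hit, value = level.lookup(addr)
        if hit:
            # Copy the data into the faster levels that missed.
            for upper in levels[:i]:
                upper.fill(addr, value)
            return value, level.name, cost
    # Miss in every level: copy from main memory into the caches first.
    cost += memory_latency
    value = main_memory[addr]
    for level in levels:
        level.fill(addr, value)
    return value, "memory", cost


levels = [CacheLevel("L1", 2, 4), CacheLevel("L2", 4, 12), CacheLevel("L3", 8, 40)]
main_memory = {addr: addr * 10 for addr in range(16)}

print(read(0, levels, main_memory))   # first access misses every level
print(read(0, levels, main_memory))   # second access hits in L1
```

In this sketch the cost of an access grows with each level that must be searched, which mirrors the increasing access times described above.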
A transactional memory operation is a type of memory operation that groups one or more load and store operations performed by a processor into a single transaction that is visible to other processors as a single operation when the transaction completes. The effects (e.g., the data) of multiple store operations participating in the single transaction are not made visible to other processors until the transaction is complete. A transactional load is a load that accesses and buffers data until a transaction completes, after which the data is forwarded to the processor that requested it. If the data from a transactional load is changed (i.e., its location in memory is written to) after the load occurs and before the transaction completes, the transaction is aborted. A transactional write is a write whose data is not seen by other processors until the transaction completes. A read from or a write to a location that is the target of a transactional write in a transaction aborts the transaction. Transactional memory is often helpful in synchronizing work that is performed in parallel on multiple CPUs (by enabling atomic operations on an arbitrary set of memory locations) and when multiple writes must not be interrupted, for example writes to control registers in a coprocessor. Since writes that participate in a transaction are not visible until the transaction completes, a read from a location that is going to be written by a write in the transaction aborts the transaction.
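The transactional behavior described above may be approximated in software by the following minimal sketch. It is a simplified model under stated assumptions, not a description of any hardware design: the names `TransactionalMemory`, `Transaction`, and `TransactionAborted` are hypothetical, loads return data immediately rather than buffering it until the transaction completes, only accesses made from outside a transaction are treated as conflicts, and a load of an address that the same transaction has already stored to is not modeled.

```python
class TransactionAborted(Exception):
    """Raised when an aborted transaction is used again."""


class TransactionalMemory:
    """Shared memory plus the set of transactions currently in flight."""

    def __init__(self):
        self.memory = {}     # committed state visible to all processors
        self.active = set()  # transactions that have not completed

    def begin(self):
        tx = Transaction(self)
        self.active.add(tx)
        return tx

    def read(self, addr):
        """Read from outside any transaction (e.g., another processor)."""
        for tx in list(self.active):
            # Reading the target of a transactional write aborts that transaction.
            if addr in tx.write_set:
                tx.abort()
        return self.memory.get(addr, 0)

    def write(self, addr, value):
        """Write from outside any transaction (e.g., another processor)."""
        for tx in list(self.active):
            # Writing to a transactionally loaded or stored location aborts.
            if addr in tx.read_set or addr in tx.write_set:
                tx.abort()
        self.memory[addr] = value


class Transaction:
    def __init__(self, tm):
        self.tm = tm
        self.read_set = set()   # addresses loaded by this transaction
        self.write_set = {}     # buffered stores: address -> value
        self.aborted = False

    def _check(self):
        if self.aborted:
            raise TransactionAborted()

    def load(self, addr):
        self._check()
        self.read_set.add(addr)
        # Simplification: pending stores of this same transaction are ignored.
        return self.tm.memory.get(addr, 0)

    def store(self, addr, value):
        self._check()
        self.write_set[addr] = value   # buffered, not yet visible to others

    def commit(self):
        self._check()
        # All buffered stores become visible together.
        self.tm.memory.update(self.write_set)
        self.tm.active.discard(self)

    def abort(self):
        self.aborted = True
        self.write_set.clear()         # buffered stores are discarded
        self.tm.active.discard(self)


tm = TransactionalMemory()
tx = tm.begin()
tx.store(0x10, 2)
tx.store(0x20, 3)
print(tm.memory)   # stores are still buffered at this point
tx.commit()
print(tm.memory)   # both stores are now visible
```

The sketch keeps a read set and a write set per transaction because those are the two kinds of address the description above identifies as sources of conflict.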
To enhance performance, a computer system often includes special hardware, known as accelerators, whose structures are optimized for executing a particular task. Media processors (e.g., video or audio), graphics processing units (GPUs), string processing units, and vector units are some examples. A central processing unit (CPU) often sets up an accelerator by writing control data into control registers in the accelerator that tell the accelerator what to do, where to go for the data it operates on, and where to write its results, among other things. After its control registers are set up, an accelerator usually accesses main memory directly for some data, without involving a cache structure that the CPU accesses, operates on the data, often returns results to main memory, and informs the CPU that the task that it was given has been completed (usually via an interrupt). A transactional memory system is usually integrated into a level in a cache structure that is accessed by the CPU. The integration facilitates the process of making the writes in a memory transaction visible to other CPUs in a system all at once if the transaction successfully completes. Because an accelerator usually accesses data directly from memory, the memory locations that it accesses are not usually visible to all levels of the cache structure and are therefore not visible to the integrated transactional memory system, which must check for address conflicts. Consequently, if an accelerator is accessed by a CPU during a transaction, the transaction is aborted because an undetected address conflict could otherwise occur. Aborting transactional memory operations can cause performance to decrease. Techniques to prevent aborting a transactional memory operation are an active area of research.