The present invention relates generally to data processing and, in particular, to storage accesses to the shared distributed memory system of a data processing system. Still more particularly, the present invention relates to ensuring causality of transactional storage accesses interacting with non-transactional storage accesses.
A conventional multiprocessor (MP) computer system, such as a server computer system, includes multiple processing units all coupled to a system interconnect, which typically comprises one or more address, data and control buses. Coupled to the system interconnect is a system memory, which represents the lowest level of volatile memory in the multiprocessor computer system and which generally is accessible for read and write access by all processing units. In order to reduce access latency to instructions and data residing in the system memory, each processing unit is typically further supported by a respective multi-level cache hierarchy, the lower level(s) of which may be shared by one or more processor cores.
Cache memories are commonly utilized to temporarily buffer memory blocks that might be accessed by a processor in order to speed up processing by reducing access latency introduced by having to load needed data and instructions from system memory. In some MP systems, the cache hierarchy includes at least two levels. The level one (L1) or upper-level cache is usually a private cache associated with a particular processor core and cannot be accessed by other cores in an MP system. Typically, in response to a memory access instruction such as a load or store instruction, the processor core first accesses the directory of the upper-level cache. If the requested memory block is not found in the upper-level cache, the processor core then accesses lower-level caches (e.g., level two (L2) or level three (L3) caches) or system memory for the requested memory block. The lowest level cache (e.g., L3 cache) is often shared among several processor cores.
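The lookup order described above can be sketched as follows. This is a minimal, hypothetical model: the names CacheLevel, CacheHierarchy and load are illustrative only. Each level is reduced to a directory keyed by block address, and capacity and eviction are ignored. A miss is assumed to fill every level above the one that supplied the block. Sharing of the lowest-level cache among several cores is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model: a cache level is only a directory mapping a
// memory-block address to its contents (no capacity, no eviction).
struct CacheLevel {
    std::string name;
    std::unordered_map<std::uint64_t, std::uint64_t> directory;
};

struct CacheHierarchy {
    std::vector<CacheLevel> levels;  // levels[0] is the private L1 cache
    std::unordered_map<std::uint64_t, std::uint64_t> system_memory;

    // Searches L1 first, then each lower-level cache, then system memory.
    // Returns the name of the level that supplied the block. As an assumed
    // fill policy, the block is installed in every level that missed.
    std::string load(std::uint64_t block, std::uint64_t& value) {
        std::size_t hit = levels.size();  // levels.size() means "system memory"
        for (std::size_t i = 0; i < levels.size(); ++i) {
            auto it = levels[i].directory.find(block);
            if (it != levels[i].directory.end()) {
                value = it->second;
                hit = i;
                break;
            }
        }
        if (hit == levels.size()) {
            value = system_memory.at(block);  // missed in every cache level
        }
        for (std::size_t i = 0; i < hit; ++i) {
            levels[i].directory[block] = value;  // fill the levels that missed
        }
        return hit == levels.size() ? "memory" : levels[hit].name;
    }
};
```

A first load of a block that resides only in system memory is served by memory, and a repeated load of the same block is then served by the L1 cache.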
In such systems, multiprocessor software concurrently accesses shared data structures from multiple software threads. When shared data is accessed concurrently, it is typically necessary to prevent so-called “unconstrained races” or “conflicts”. A conflict occurs between two memory accesses when they target the same memory location, at least one of them is a write, and there is no means to ensure the order in which those accesses occur.
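The three-part definition of a conflict can be expressed as a predicate. The Access record, the function name conflicts and the synchronized flag are hypothetical simplifications introduced only for illustration. The synchronized flag stands for whatever mechanism (for example, a lock held around both accesses) orders the two accesses. Accesses by the same thread are treated as ordered by program order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one memory access made by a software thread.
struct Access {
    int thread_id;
    std::uintptr_t address;
    bool is_write;
};

// Returns true when a and b conflict: same location, at least one write,
// and nothing that orders them. `synchronized` is an assumed stand-in for
// any ordering mechanism; same-thread accesses are ordered by program order.
bool conflicts(const Access& a, const Access& b, bool synchronized) {
    bool same_location = a.address == b.address;
    bool any_write = a.is_write || b.is_write;
    bool ordered = synchronized || a.thread_id == b.thread_id;
    return same_location && any_write && !ordered;
}
```

Two reads of the same location never conflict, and neither do accesses to different locations, regardless of ordering.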
Multiprocessor software typically utilizes lock variables to coordinate the concurrent reading and modifying of locations in memory in an orderly, conflict-free fashion. A lock variable is a location in memory that is read and then set to a certain value, possibly based on the value read, in an atomic fashion. The read-modify-write operation on a lock variable is often accomplished by an atomic-read-modify-write (ARMW) instruction or by a sequence of instructions that provides the same effect as a single instruction that atomically reads and modifies the lock variable.
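As a sketch of an atomic read-modify-write on a lock variable, the following uses C++ std::atomic compare_exchange_strong as a portable stand-in for an ARMW instruction. The encoding kUnlocked/kLocked and the function name test_and_set are assumptions for illustration. On architectures without a single ARMW instruction, compilers typically implement compare_exchange_strong as a short load-reserve/store-conditional retry sequence, which corresponds to the “sequence of instructions” alternative described above.

```cpp
#include <atomic>
#include <cassert>

constexpr int kUnlocked = 0;  // hypothetical encoding of the lock states
constexpr int kLocked = 1;

// Atomically reads the lock variable and, only if it holds kUnlocked,
// sets it to kLocked. The read, the comparison and the conditional write
// happen as one indivisible operation. Returns the value that was read:
// kUnlocked means this caller acquired the lock.
int test_and_set(std::atomic<int>& lock_var) {
    int observed = kUnlocked;
    // On failure, compare_exchange_strong stores the current value in observed.
    lock_var.compare_exchange_strong(observed, kLocked,
                                     std::memory_order_acquire,
                                     std::memory_order_relaxed);
    return observed;
}
```

The first caller reads the “unlocked” value and sets the lock; a later caller reads the “locked” value and leaves the variable unchanged.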
In this manner, a software thread reading an initial “unlocked” value via an ARMW instruction is said to have “acquired” the lock and will, until it releases the lock, be the only software thread that holds the lock. The thread holding the lock may safely update the shared memory locations protected by the lock without conflict with other threads because the other threads cannot obtain the lock until the current thread releases the lock. When the shared locations have been read and/or modified appropriately, the thread holding the lock releases the lock (e.g., by writing the lock variable to the “unlocked” value) to allow other threads to access the shared locations in storage.
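The acquire/release protocol can be illustrated with a minimal spin lock sketch. SpinLock, Account and deposit are hypothetical names chosen for illustration. Acquisition uses std::atomic<bool>::exchange, which reads the previous value and writes “locked” in one atomic step, so only the thread that reads “unlocked” proceeds. Release writes the “unlocked” value back.

```cpp
#include <atomic>
#include <cassert>

// Minimal spin lock sketch: false means "unlocked", true means "locked".
class SpinLock {
public:
    // Spins until an atomic exchange reads the "unlocked" value; the thread
    // that reads it has acquired the lock.
    void acquire() {
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Another thread holds the lock: keep retrying.
        }
    }
    // Releases the lock by writing the "unlocked" value.
    void release() { locked_.store(false, std::memory_order_release); }
    bool is_locked() const { return locked_.load(std::memory_order_relaxed); }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical shared data protected by the lock.
struct Account {
    SpinLock lock;
    long balance = 0;
};

// Updates the protected location only while holding the lock.
void deposit(Account& acct, long amount) {
    acct.lock.acquire();
    acct.balance += amount;  // no other thread can hold the lock here
    acct.lock.release();
}
```

A production lock would typically back off or yield while spinning rather than retry continuously; the sketch omits this for brevity.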
While locking coordinates competing threads' accesses to shared data, locking suffers from a number of well-known shortcomings. These include, among others, (1) the possibility of deadlock when a given thread holds more than one lock and thereby prevents the forward progress of other threads and (2) the performance cost of lock acquisition when the lock may not have been strictly necessary because no conflicting accesses would have occurred to the shared data.
To overcome these limitations, the notion of transactional memory can be employed. In transactional memory, a set of load and/or store instructions is treated as a “transaction.” A transaction succeeds when the constituent load and store operations can occur atomically without a conflict with another thread. The transaction fails in the presence of a conflict with another thread and can then be re-attempted. If a transaction continues to fail, software may fall back to using locking to ensure the orderly access of shared data.
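The retry-then-fall-back policy can be sketched as a control-flow helper. Transactional execution itself is hardware-specific and is not modeled here. try_transaction is a hypothetical callback that returns true if the transaction committed and false if a conflict aborted it. run_with_fallback, Outcome and the retry budget max_attempts are likewise assumptions for illustration.

```cpp
#include <cassert>
#include <functional>
#include <mutex>

enum class Outcome { Committed, FellBackToLock };

// Attempts the transaction, re-attempts it after each conflict, and after
// max_attempts failures performs the work under a conventional lock.
// try_transaction: hypothetical stand-in for hardware transactional
// execution (true = committed, false = aborted by a conflict).
Outcome run_with_fallback(const std::function<bool()>& try_transaction,
                          const std::function<void()>& locked_body,
                          std::mutex& fallback_lock,
                          int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_transaction()) {
            return Outcome::Committed;
        }
    }
    std::lock_guard<std::mutex> guard(fallback_lock);  // fall back to locking
    locked_body();
    return Outcome::FellBackToLock;
}
```

In practice the transactional path usually also reads the fallback lock variable, so that a thread acquiring the lock causes concurrently running transactions to conflict and abort. The sketch leaves that responsibility to try_transaction.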
To support transactional memory, the underlying hardware tracks the storage locations involved in the transaction (the transaction footprint) for conflicts as the transaction executes. If a conflict occurs in the transaction footprint, the transaction is aborted and possibly restarted. Use of transactional memory reduces the possibility of deadlock due to a thread holding multiple locks because, in the typical case, no locks are held (the transaction simply attempts to make one or more storage accesses and restarts if a conflict occurs). Further, the processing overhead of acquiring a lock is generally avoided.
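A simplified model of footprint tracking is shown below. TransactionFootprint and its member functions are hypothetical names, and addresses are assumed to be cache-block addresses. The conflict rule is the conventional read-set/write-set rule, assumed here rather than taken from this description: a store by another thread conflicts with any block in the footprint, while a load by another thread conflicts only with a block the transaction has stored to.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Simplified model of the footprint hardware tracks for one transaction.
class TransactionFootprint {
public:
    void record_load(std::uint64_t block) { read_set_.insert(block); }
    void record_store(std::uint64_t block) { write_set_.insert(block); }

    // Checks an access made by another thread against the footprint.
    bool conflicts_with(std::uint64_t block, bool remote_is_store) const {
        bool in_writes = write_set_.count(block) != 0;
        bool in_reads = read_set_.count(block) != 0;
        return remote_is_store ? (in_writes || in_reads) : in_writes;
    }

    // On abort the tracked footprint is discarded so the transaction can
    // be restarted from an empty footprint.
    void abort() {
        read_set_.clear();
        write_set_.clear();
    }
    bool empty() const { return read_set_.empty() && write_set_.empty(); }

private:
    std::unordered_set<std::uint64_t> read_set_;
    std::unordered_set<std::uint64_t> write_set_;
};
```

Under this rule, concurrent reads of a block in the read set are allowed, while a remote store to that block, or any remote access to a block in the write set, is reported as a conflict.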