Multithreaded computer architectures, such as those of the Tera MTA and Cray XMT computers, support large-scale parallel execution of threads to hide memory latency and make efficient use of available bandwidth: while one thread is stalled waiting on locked data, other threads run in its place. The Tera MTA and Cray XMT architectures include a programming and execution model with a feature known as extended memory semantics (EMS), which provides operations that rely on a full/empty tag bit in memory to combine reading or writing of data with synchronization.
The full/empty bit feature associates a hidden bit with each memory location (e.g., a word) that indicates whether the location is in a “full” or “empty” state, thereby facilitating data locking and synchronization. Full/empty bits require non-standard, non-commodity memory components; however, they are sometimes used because they add relatively little memory overhead for locking and synchronization. For example, a writeEF operation stalls (commonly known as “spinning”) until the bit is empty (e.g., equal to zero), then sets the bit to full (e.g., equal to one) after the write completes. A readFE operation stalls until the bit is full, then sets the bit to empty after the read completes. Thus, only a single operation is required to (i) acquire a lock and read data (e.g., readFE) or (ii) release a lock and write data (e.g., writeEF). In contrast, an explicit lock operation would increase memory requirements, because an additional lock variable would be needed, and would also consume additional memory bandwidth.
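To make the readFE/writeEF semantics described above concrete, the following is a minimal C11 sketch that emulates them in software on commodity hardware, assuming a separate atomic tag word is kept beside each data word. The names `fe_word`, `fe_init`, `write_ef`, and `read_fe`, and the extra `FE_BUSY` state, are hypothetical illustrations rather than the MTA/XMT interface. Note that the separate tag is exactly the additional memory that a hardware full/empty bit avoids.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical tag states. Real full/empty hardware keeps one hidden bit
   per word and needs no BUSY state; software needs it so that the data
   access and the tag update appear as a single operation to other threads. */
enum { FE_EMPTY = 0, FE_FULL = 1, FE_BUSY = 2 };

/* Software emulation: the tag occupies its own word next to the data. */
typedef struct {
    _Atomic int tag;
    int64_t value;
} fe_word;

static void fe_init(fe_word *w) {
    atomic_init(&w->tag, FE_EMPTY);
    w->value = 0;
}

/* writeEF: spin until the word is empty, write it, then mark it full. */
static void write_ef(fe_word *w, int64_t v) {
    int expected = FE_EMPTY;
    /* Claim the word (EMPTY -> BUSY) so no other thread can touch it. */
    while (!atomic_compare_exchange_weak_explicit(&w->tag, &expected, FE_BUSY,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = FE_EMPTY; /* a failed CAS overwrote expected; reset it */
    }
    w->value = v;
    /* Release ordering publishes the data before the word reads as full. */
    atomic_store_explicit(&w->tag, FE_FULL, memory_order_release);
}

/* readFE: spin until the word is full, read it, then mark it empty. */
static int64_t read_fe(fe_word *w) {
    int expected = FE_FULL;
    while (!atomic_compare_exchange_weak_explicit(&w->tag, &expected, FE_BUSY,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = FE_FULL;
    }
    int64_t v = w->value;
    atomic_store_explicit(&w->tag, FE_EMPTY, memory_order_release);
    return v;
}
```

Used as a one-slot producer/consumer channel, a producer calling `write_ef` blocks until a consumer has drained the previous value with `read_fe`, mirroring how a single tagged word doubles as both data and lock.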
The x86 programming model does not include EMS or full/empty bits in memory. Without these additional bits (which standard memory modules do not provide), programmers typically rely either on an explicit lock or on a single bit reserved in the user-visible portion of the data. Explicit locks consume additional memory and/or bandwidth, whereas reserving a bit in user-visible data can be difficult for certain data types, such as floating-point values, that use every bit of the word.
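The reserved-bit alternative can be sketched in C11 as follows, assuming a hypothetical layout in which bit 63 of a 64-bit integer word serves as the lock bit and the payload is limited to the remaining 63 bits. The names `LOCK_BIT`, `lock_and_read`, and `write_and_unlock` are illustrative only.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: bit 63 is reserved as a lock bit, leaving 63 bits
   of user data. An IEEE 754 double uses bit 63 as its sign bit, so this
   scheme cannot be applied to doubles without losing part of the value. */
#define LOCK_BIT (UINT64_C(1) << 63)

/* Acquire the lock by atomically setting the reserved bit; return the
   payload that was in the word when the lock was obtained. */
static uint64_t lock_and_read(_Atomic uint64_t *w) {
    for (;;) {
        uint64_t old = atomic_fetch_or_explicit(w, LOCK_BIT,
                                                memory_order_acquire);
        if (!(old & LOCK_BIT))
            return old; /* bit was clear, so this thread now owns the word */
        /* another thread holds the bit; keep spinning */
    }
}

/* Store a new payload and clear the lock bit in a single atomic store. */
static void write_and_unlock(_Atomic uint64_t *w, uint64_t v) {
    atomic_store_explicit(w, v & ~LOCK_BIT, memory_order_release);
}
```

This needs no extra lock variable, but only because the data type can spare a bit; the same code silently corrupts any value whose top bit is meaningful.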
x86-based computer systems may include a limited set of atomic memory operations (AMOs) for synchronization. An atomic operation completes without interference from any other operation while it is in progress. These AMOs include Fetch-and-Add, Compare-and-Swap, and Test-and-Set.
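As an illustration, the three AMOs named above can be expressed portably with C11 `<stdatomic.h>`; on x86, compilers typically lower them to `lock xadd`, `lock cmpxchg`, and `xchg`, respectively. The wrapper names below are hypothetical, and the Test-and-Set spinlock uses a separate `atomic_flag`, which is an instance of the explicit-lock cost discussed earlier.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Fetch-and-Add: atomically add delta and return the previous value. */
static long fetch_and_add(_Atomic long *p, long delta) {
    return atomic_fetch_add(p, delta);
}

/* Compare-and-Swap: store desired only if *p still equals expected.
   Returns true if the swap happened. */
static bool compare_and_swap(_Atomic long *p, long expected, long desired) {
    return atomic_compare_exchange_strong(p, &expected, desired);
}

/* Test-and-Set spinlock: set the flag and spin while its previous
   value was already set (i.e., another thread holds the lock). */
static void spin_lock(atomic_flag *f) {
    while (atomic_flag_test_and_set_explicit(f, memory_order_acquire))
        ; /* busy-wait */
}

static void spin_unlock(atomic_flag *f) {
    atomic_flag_clear_explicit(f, memory_order_release);
}
```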