In a parallel processor, many concurrently executing threads of instructions may read and write the same memory location independently, that is, without any coordination with one another. These reads and writes may be performed using traditional load and store instructions. However, in a parallel execution environment, such updates to a region of memory can be problematic. For example, a programmer may need to design the program of instructions to ensure that, while one thread of instructions is updating the region of memory, the region is not being modified by another thread. A typical update of a memory location involves a load, an update that depends on the loaded value, and a store to the memory location. Between these steps, another thread could perform a store to the same memory location, e.g. as part of its own multi-step update, thereby corrupting the value-dependent update.
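The hazard described above can be illustrated with a minimal sketch in C++. The sketch does not run real threads; it replays one possible interleaving of two threads' load and store steps in a fixed order so that the outcome is deterministic. The names `ThreadUpdate`, `interleaved_update`, and `serialized_update`, as well as the initial value and increments, are hypothetical and chosen only for illustration.

```cpp
// Hypothetical model of one thread's multi-step update of a shared location.
struct ThreadUpdate {
    int loaded = 0;  // private copy of the value obtained by the load step

    // Step 1: load the current value of the shared location.
    void load(const int& shared) { loaded = shared; }

    // Steps 2 and 3: compute a value that depends on the loaded value,
    // then store the result back to the shared location.
    void update_and_store(int& shared, int delta) const {
        shared = loaded + delta;
    }
};

// Both threads load before either stores, so the second store
// overwrites the first and one increment is lost.
int interleaved_update() {
    int shared = 10;
    ThreadUpdate a, b;
    a.load(shared);                 // A loads 10
    b.load(shared);                 // B loads 10 (stale once A stores)
    a.update_and_store(shared, 1);  // A stores 11
    b.update_and_store(shared, 5);  // B stores 15, discarding A's update
    return shared;
}

// Each thread completes its whole update before the other begins,
// so both increments are reflected in the final value.
int serialized_update() {
    int shared = 10;
    ThreadUpdate a, b;
    a.load(shared);
    a.update_and_store(shared, 1);  // A stores 11
    b.load(shared);                 // B loads 11
    b.update_and_store(shared, 5);  // B stores 16
    return shared;
}
```

Under these assumed values, `interleaved_update()` returns 15, whereas `serialized_update()` returns 16. The difference is the increment lost when one thread's store lands between the other thread's load and store.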
The typical solution to this problem is to carefully design the program such that memory regions which are shared between threads are never accessed simultaneously. This is often done programmatically with semaphore objects that “lock” a region of memory or code, so that multiple threads cannot simultaneously touch the same region of memory or execute the code that accesses the locked region. Only when one thread is done updating the region of memory does it “unlock” that region, so that another thread can take control of the region. Such traditional approaches, which involve separate instructions dedicated to the locking, loading, updating, storing, and unlocking of memory locations, require significant time to execute. They also serialize the parallel execution to one thread at each lock/unlock point, reducing the performance benefit of parallel processing.
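The lock-based approach described above can be sketched as follows, using the C++ standard library's `std::mutex` in place of a semaphore object. This is an illustrative sketch only; the function name `locked_increment` and its parameters are hypothetical. Each iteration performs the separate lock, load, update, store, and unlock steps, and only one thread at a time can be inside the locked region.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical example: num_threads threads each increment a shared
// counter `iterations` times, guarding every update with a lock.
long locked_increment(int num_threads, int iterations) {
    assert(num_threads >= 0 && iterations >= 0);

    long counter = 0;         // the shared memory location
    std::mutex counter_lock;  // guards every access to `counter`

    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(num_threads));

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, &counter_lock, iterations] {
            for (int i = 0; i < iterations; ++i) {
                // Lock: other threads block here until the lock is free.
                std::lock_guard<std::mutex> guard(counter_lock);
                long value = counter;  // load
                value += 1;            // update that depends on the load
                counter = value;       // store
                // Unlock: `guard` releases the lock at the end of this
                // iteration's scope.
            }
        });
    }

    for (auto& worker : workers) {
        worker.join();
    }
    return counter;
}
```

Because the lock excludes all other threads for the duration of each update, no increment is lost, so a call such as `locked_increment(4, 10000)` yields 40000. The cost is the one noted above: the threads execute the locked region one at a time, and each update pays for acquiring and releasing the lock.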
Accordingly, there exists a substantial need for techniques that achieve efficient memory updates within parallel computing environments and that allow multiple threads of instructions to update the same region of memory with minimal conflict.