One area of interest to computer program developers is parallel processing whereby computer code from an application is processed by two or more co-operating processors simultaneously using shared memory. A computer system having two or more co-operating processors coupled to shared memory, an operating system (OS) adapted for parallel processing, such as a multi-tasking, multi-threaded OS, and an application coded in a computer language adapted for parallel processing may provide significant performance advantages over a non-parallel processing implementation of the application.
When programming multiple threads that use shared memory, synchronization is often necessary to communicate control commands or data between executing threads. Synchronization may be implemented in a variety of ways, including critical sections, barriers and semaphores. A primitive form of synchronization is the atomic update of a single memory location. In a multi-threaded environment, for example, an atomic update of a shared memory location by one of the threads requires that no other thread can read or modify that location while the update is in progress. Synchronization ensures that, when two or more threads compete for the same resource (i.e., the shared memory location), every thread other than the one holding the resource waits until that thread is finished.
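A primitive atomic update of a single memory location can be sketched with C11 `<stdatomic.h>`. The variable `shared_counter` and the helpers `counter_add` and `counter_get` are hypothetical names used only for illustration; the point is that the read, the addition and the write happen as one indivisible operation, so no other thread can observe or modify the location partway through.

```c
#include <stdatomic.h>

/* Hypothetical shared location updated by several threads. */
static atomic_int shared_counter = 0;

/* Atomically adds delta to shared_counter and returns the value it held
 * immediately before this update. No other thread can read or write the
 * location between the read and the write performed here. */
int counter_add(int delta)
{
    return atomic_fetch_add(&shared_counter, delta);
}

/* Atomically reads the current value of shared_counter. */
int counter_get(void)
{
    return atomic_load(&shared_counter);
}
```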
The instruction sets of many computer processor architectures include specific instructions that atomically update memory in specialized ways, and these instructions typically form the basis of other forms of synchronization. Higher-level programming languages such as C, C++ or Java include primitives that represent various forms of synchronization. For example, the OpenMP™ application programming interface (API) extensions to C and C++ provide constructs for critical sections, semaphores, barriers and atomic updates. OpenMP is a trademark of the OpenMP Architecture Review Board. These higher-level forms of synchronization can be implemented correctly using the primitive forms described above, but some known implementations are inefficient and would benefit from more efficient treatment. OpenMP supports multi-platform shared-memory parallel programming in C/C++ and Fortran on all architectures, including Unix™ platforms and Windows™ NT platforms.
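The difference between two of these OpenMP constructs can be sketched as follows. Both hypothetical functions, `sum_critical` and `sum_atomic`, add the elements of an array into one shared total. The first guards the update with a `critical` section and the second uses the `atomic` construct, which an implementation may map onto hardware atomic instructions. This is only a sketch: in practice a `reduction` clause would be the idiomatic choice, and the per-iteration synchronization here exists solely to show the constructs. When compiled without OpenMP support the pragmas are ignored and the loops run serially with the same result.

```c
/* Shared total protected by a critical section: only one thread at a time
 * may execute the enclosed block. */
double sum_critical(const double *v, int n)
{
    double total = 0.0;              /* shared by default inside the region */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical
        {
            total += v[i];
        }
    }
    return total;
}

/* Shared total protected by the atomic construct: only the update of the
 * single location 'total' is made atomic. */
double sum_atomic(const double *v, int n)
{
    double total = 0.0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp atomic
        total += v[i];
    }
    return total;
}
```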
The OpenMP C and C++ API (Version 2.0, published March 2002 and available at http://www.openmp.org/specs/mp-documents/cspec20_bars.pdf), for example, provides an ATOMIC construct: a pragma, or compiler directive, that instructs a C/C++ compiler to generate code which ensures that a single memory location is updated atomically. If the target processor's instruction set contains instructions that match the semantics of the atomic update, using those instructions is often the best implementation of the construct. If no appropriate hardware instructions are available, however, other synchronization mechanisms must be used to ensure that the update is indeed atomic. For example, the OpenMP implementation of the ATOMIC pragma on PowerPC™ architecture processors cannot exploit the available load word and reserve indexed (lwarx) and store word conditional indexed (stwcx.) instructions for compound atomic updates of data items larger than 4 bytes (e.g. double or long long data types), because those instructions load and conditionally store only a single 4-byte word.
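A load-reserve/store-conditional pair is used as a retry loop: read the word, compute the new value, and store it only if no other processor has written the word in the meantime. As a portable sketch, C11's `atomic_compare_exchange_weak` expresses the same loop; compilers targeting PowerPC typically lower a 4-byte compare-exchange to an lwarx/stwcx. sequence, although that mapping is an assumption and not guaranteed. The helper `atomic_add_float` is hypothetical and assumes `float` is a 4-byte value. It performs a compound update (`*p += x`) that fits in a single word, which is exactly what an 8-byte `double` cannot do with 4-byte reservation instructions.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Assumption: a float occupies exactly one 32-bit word. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 4 bytes");

/* Atomically performs *target += x, where *target holds the bit pattern of
 * a float, and returns the new value. Mirrors a load-reserve /
 * store-conditional loop: if another thread changed the word between the
 * load and the store, the exchange fails, old_bits is refreshed with the
 * current contents, and the computation is retried. */
float atomic_add_float(_Atomic uint32_t *target, float x)
{
    uint32_t old_bits = atomic_load(target);
    uint32_t new_bits;
    float old_val, new_val;
    do {
        memcpy(&old_val, &old_bits, sizeof old_val);   /* bits -> float */
        new_val = old_val + x;
        memcpy(&new_bits, &new_val, sizeof new_bits);  /* float -> bits */
    } while (!atomic_compare_exchange_weak(target, &old_bits, new_bits));
    return new_val;
}
```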
A common implementation of these compound atomic operations (i.e. reads and writes to more than one 4-byte word) requires acquiring a semaphore or lock in another location, updating a particular shared memory location and then releasing the lock. Because there is often ambiguity about which symbols in a computer source code language can refer to which memory locations (e.g. through the use of pointers), a correct solution must ensure that acquisition of a lock for an atomic update guarantees that no other atomic update on the same or overlapping locations in memory can happen concurrently.
One way to ensure this exclusivity is to require all atomic updates in a program to acquire a single shared lock. The problem with this solution is that threads are likely to contend for that single shared lock when, in fact, they are not contending to update the same or overlapping locations in memory.
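The single-shared-lock approach can be sketched as follows. To stay within C11 atomics, the lock here is a simple spinlock built on `atomic_flag`; a real runtime might use a mutex instead. The names `global_atomic_lock`, `global_lock_acquire`, `global_lock_release` and `atomic_add_double` are hypothetical. Every compound update, whatever address it targets, serializes on the same flag. That guarantees exclusivity for overlapping locations, but it also forces unrelated updates to wait for one another.

```c
#include <stdatomic.h>

/* Hypothetical process-wide lock guarding every compound atomic update. */
static atomic_flag global_atomic_lock = ATOMIC_FLAG_INIT;

/* Spin until test-and-set observes the flag clear. Acquire ordering keeps
 * the protected update from being reordered before the lock is taken. */
static void global_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&global_atomic_lock,
                                             memory_order_acquire))
        ; /* busy-wait */
}

/* Release ordering publishes the protected update before the lock is freed. */
static void global_lock_release(void)
{
    atomic_flag_clear_explicit(&global_atomic_lock, memory_order_release);
}

/* Lock-based compound update equivalent to an atomic "*p += x" on an
 * 8-byte double: both 4-byte words are read and written under the lock. */
void atomic_add_double(double *p, double x)
{
    global_lock_acquire();
    *p += x;
    global_lock_release();
}
```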
The following pseudo-code illustrates an example of such contention:
double *p, *q;
double x, y;
...
#pragma omp atomic
*p += x;
...
#pragma omp atomic
*q += y;
In the example, the updates of the memory pointed to by p and q must be done exclusively only if p and q point to overlapping storage. If an implementation of the atomic construct uses a single shared lock and p and q do not, in fact, point at overlapping storage, then there may be unnecessary contention due to the shared lock.
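The condition under which the two updates actually conflict can be sketched as a byte-range overlap test. The helper `ranges_overlap` is hypothetical and assumes a flat address space in which converting a pointer to `uintptr_t` yields its address. Because `p` and `q` are generally known only at run time, a compiler usually cannot evaluate this test statically. This is why a correct implementation must conservatively assume that any two atomic updates might overlap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the byte ranges [a, a + a_len) and [b, b + b_len) share
 * at least one byte, i.e. updates to them must be mutually exclusive.
 * Assumes a flat address space (pointer-to-integer conversion gives the
 * address). */
bool ranges_overlap(const void *a, size_t a_len, const void *b, size_t b_len)
{
    uintptr_t a0 = (uintptr_t)a;
    uintptr_t b0 = (uintptr_t)b;
    return a0 < b0 + b_len && b0 < a0 + a_len;
}
```

For example, with `double arr[2]`, the updates `*p += x` and `*q += y` conflict when `p` and `q` both point at `arr[0]`, but not when `p` points at `arr[0]` and `q` at `arr[1]`. A single shared lock serializes both cases alike.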
A solution to some or all of these shortcomings is therefore desired.