Herein, related art is described to facilitate understanding of the invention. Related art labeled “prior art”, if any, is admitted prior art; related art not labeled “prior art” is not admitted prior art.
Generally, computer programs have a logical order in which their instructions are to be executed to produce the desired result. However, in many cases, greater performance can be achieved by executing instructions out of order, e.g., by speculatively executing time-consuming instructions ahead of their logical order so that the results are available earlier. On shared-memory multiprocessor architectures, program performance can be further improved by executing multiple tasks concurrently on multiple processor cores.
Out-of-order execution can work as long as the results are the same as they would have been if the logical order had been adhered to. For example, a program thread might include a load instruction to load a value from a memory location into a processor register. If that load instruction has been advanced, there will be a time interval between the time the load instruction is executed and the time the load instruction would have been executed if the logical order had been adhered to. If, during that interval, another thread has changed the contents of the memory location, the out-of-order load instruction may have loaded the wrong value.
There are two basic approaches to addressing potential errors due to speculatively executed out-of-order instructions: 1) preventing them, and 2) detecting and correcting them. In the latter case, an advanced-load address table (ALAT) or a comparable mechanism can be used to keep track of addresses accessed by advanced loads. In the event of an intervening store operation to the address of an advanced load, the table entry corresponding to the previously executed advanced-load instruction is marked “invalid”. When a check instruction (e.g., at the logical-order position for the load instruction) detects an invalidated entry, the correct value can be loaded.
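The detect-and-correct approach can be illustrated with a small software model. The following is only a sketch under stated assumptions: the class name `AdvancedLoadTable`, its method names, and the use of a dictionary as a stand-in for memory are hypothetical and chosen for illustration; an actual ALAT is a hardware structure whose organization is not described here.

```python
class AdvancedLoadTable:
    """Toy software model of an advanced-load address table (hypothetical)."""

    def __init__(self, memory):
        self.memory = memory   # dict: address -> value, stand-in for shared memory
        self.entries = {}      # register name -> (tracked address, entry valid?)

    def advanced_load(self, reg, addr):
        """Load ahead of logical order and record the address in the table."""
        self.entries[reg] = (addr, True)
        return self.memory[addr]

    def store(self, addr, value):
        """Perform a store and mark every entry tracking this address invalid."""
        self.memory[addr] = value
        for reg, (tracked, _) in list(self.entries.items()):
            if tracked == addr:
                self.entries[reg] = (tracked, False)

    def check(self, reg, speculative_value):
        """At the load's logical-order position: keep the value or reload it."""
        addr, valid = self.entries.pop(reg)
        return speculative_value if valid else self.memory[addr]


memory = {0x10: 1}
alat = AdvancedLoadTable(memory)
r1 = alat.advanced_load("r1", 0x10)  # advanced load observes the old value
alat.store(0x10, 2)                  # intervening store invalidates the entry
r1 = alat.check("r1", r1)            # check detects the invalid entry and reloads
```

In this model the check returns the speculative value unchanged when no store has touched the tracked address, so the cost of correction is paid only when an intervening store actually occurred.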
The challenge for concurrent execution on shared-memory architectures is to ensure that concurrent tasks do not introduce data races by modifying the same shared memory location at the same time. To prevent such data races from producing incorrect results, some processor architectures permit sections of a program to be executed “atomically”. More specifically, a section of code that would otherwise be vulnerable to interference by another thread can be given exclusive access to some memory locations until execution of the section is completed. For example, if two threads store to the same memory location, the code in each thread that accesses that location should be contained within an atomic section; this ensures that each thread has exclusive access while it stores to that memory location.
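The effect of an atomic section can be approximated in software with a mutual-exclusion lock. The sketch below assumes a lock as a stand-in for architectural support for atomic execution; the names `counter`, `counter_lock`, and `add_many` are hypothetical and serve only to illustrate two threads storing to the same location.

```python
import threading

counter = 0                      # shared memory location
counter_lock = threading.Lock()  # guards every access to counter


def add_many(n):
    """Increment the shared counter n times, one atomic section per update."""
    global counter
    for _ in range(n):
        with counter_lock:       # atomic section begins: exclusive access
            counter += 1         # load, add, and store cannot be interleaved
        # atomic section ends when the lock is released


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every access to the shared location occurs inside the locked region in both threads, no update can be lost and the final count equals the total number of increments.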
Parallel programs can be written in a high-level or low-level programming language, using atomic sections to prevent such data races. Alternatively, they can be generated automatically, for instance by a parallelizing compiler. From a code-reordering perspective, the compiler treats atomic sections as black boxes. In general, instructions inside an atomic section that access shared data cannot be moved outside the atomic section without introducing data races.