1. Field of the Invention
The present invention relates to a digital data processing apparatus in which multiple processing units concurrently access a shared memory. More particularly, the invention uses compile-time and/or run-time scheduling to reorder memory access instructions while emulating a strongly consistent programming model. It does so by detecting potential conflicts among processing units in shared memory and avoiding those conflicts by restarting the affected processing unit from an earlier state archived in a rollback register set.
2. Description of the Related Art
Modern uniprocessor designs often make extensive use of both compile-time and run-time instruction reordering. Very long instruction word ("VLIW") architecture provides an especially good example of aggressive instruction reordering for the sake of performance improvement.
A VLIW machine employs a compiler to search through a stream of machine instructions to identify instructions capable of being executed simultaneously. In accordance with this search, the instruction stream is reordered to assemble these instructions into a compound VLIW instruction. Each part of a VLIW instruction may control a separate part of hardware, such as an ALU, a path to main storage, or a path to a register. In one VLIW machine cycle, these separate resources can all be used; as a result, several basic machine instructions can execute concurrently. Thus, each task can be completed in fewer machine cycles than is possible on a traditional uniprocessor. VLIW fine-grained parallelism is said to exist at the machine instruction level within each task. VLIW therefore reduces the "turnaround time" from task initiation to task completion, so that results of the operation are available sooner.
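The compile-time search described above can be illustrated with a minimal sketch. It assumes a simple greedy scheduler and a conservative dependence test; the names `Instr`, `independent`, `bundle`, and `PROGRAM` are hypothetical and are not taken from any particular VLIW compiler.

```python
from typing import List, NamedTuple


class Instr(NamedTuple):
    """A basic machine instruction with the resources it reads and writes."""
    name: str
    reads: frozenset
    writes: frozenset


def independent(a: Instr, b: Instr) -> bool:
    # Conservative test (an assumption of this sketch): two instructions may
    # share a VLIW word only if neither writes a resource the other touches.
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


def bundle(instrs: List[Instr], width: int) -> List[List[Instr]]:
    """Greedily pack instructions into VLIW words holding at most `width` slots."""
    words: List[List[Instr]] = []
    for ins in instrs:
        # The instruction must issue after the last word that holds an
        # earlier instruction it conflicts with.
        earliest = 0
        for i, word in enumerate(words):
            if any(not independent(prev, ins) for prev in word):
                earliest = i + 1
        # Place it in the first eligible word that still has a free slot.
        for i in range(earliest, len(words)):
            if len(words[i]) < width:
                words[i].append(ins)
                break
        else:
            words.append([ins])
    return words


# Hypothetical five-instruction stream: two loads, an add, a load, a store.
PROGRAM = [
    Instr("i1", frozenset({"mem:a"}), frozenset({"r1"})),
    Instr("i2", frozenset({"mem:b"}), frozenset({"r2"})),
    Instr("i3", frozenset({"r1", "r2"}), frozenset({"r3"})),
    Instr("i4", frozenset({"mem:c"}), frozenset({"r4"})),
    Instr("i5", frozenset({"r3"}), frozenset({"mem:d"})),
]

if __name__ == "__main__":
    for word in bundle(PROGRAM, width=2):
        print([ins.name for ins in word])
```

With two slots per word, the five basic instructions in this hypothetical stream fit into three VLIW words, which is the sense in which a task completes in fewer machine cycles.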
One constant concern with multiprocessing systems is ensuring consistency of the memory shared by the processors. In multiprocessing systems that use VLIW or superscalar designs, program instructions are represented by individual machine instructions such as "LOADs" and "STOREs", which are reordered and performed in parallel. The memory STORE operations, of course, change the contents of memory. However, the order in which these STORE operations are performed does not necessarily reflect the original program order. Thus, some conventions are necessary to determine when each processor recognizes ("observes") the results of the other processors' STORE operations. Otherwise, a LOAD operation executing too early or too late may load the wrong data from memory.
These conventions are referred to as "consistency paradigms", and a number of different variations exist. "Strong ordering" is one of the most common paradigms for shared memory multiprocessing. Strong ordering dictates that all processors sharing the same memory will observe the STORE operations executed by any specific processor in the order in which the LOADs and STOREs occur in the program source, i.e., in "program order". Therefore, strong ordering does not encumber the programmer, since the hardware strictly observes program order. The LOADs and STOREs executed by distinct processors, however, may be shifted in time to allow any apparent interleaving of memory references among processors through synchronization provided by higher-level parallel constructs such as locks.
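The guarantee of strong ordering, and the hazard that arises without it, can be illustrated with a small simulation. The sketch below is only a model under stated assumptions: two hypothetical instruction streams (`PRODUCER` and `CONSUMER`), two shared locations (`data` and `flag`, both initially zero), and a single memory in which every operation takes effect atomically. It enumerates every interleaving that preserves each stream's own order and collects the values the consumer can observe.

```python
def interleavings(a, b):
    """Yield every merge of a and b that keeps each stream's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one interleaving; return the (flag, data) values that were loaded."""
    memory = {"data": 0, "flag": 0}
    observed = {}
    for op, addr, value in schedule:
        if op == "STORE":
            memory[addr] = value
        else:  # LOAD: record what this location held at that moment
            observed[addr] = memory[addr]
    return observed["flag"], observed["data"]


def outcomes(p0, p1):
    """Set of (flag, data) pairs observable over all interleavings."""
    return {run(s) for s in interleavings(p0, p1)}


# Hypothetical streams: the producer writes data, then raises a flag;
# the consumer reads the flag, then reads the data.
PRODUCER = [("STORE", "data", 1), ("STORE", "flag", 1)]
CONSUMER = [("LOAD", "flag", None), ("LOAD", "data", None)]

# The same producer with its two STOREs performed out of program order.
REORDERED_PRODUCER = list(reversed(PRODUCER))

if __name__ == "__main__":
    print(sorted(outcomes(PRODUCER, CONSUMER)))
    print(sorted(outcomes(REORDERED_PRODUCER, CONSUMER)))
```

When both streams keep program order, the consumer can never see the flag set while the data is still stale, i.e., the pair (1, 0) does not occur. Once the producer's STOREs are reordered, that pair becomes observable, which is the kind of wrong-data LOAD that a consistency paradigm must rule out.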
Some known systems implement the strong ordering paradigm by serializing references to shared memory. Namely, each STORE to shared memory is made visible to subsequent LOADs from any processor in the system before the processor continues on to the next STORE occurring in program order. Each LOAD from shared memory must observe the latest value of the shared memory before any subsequent LOADs in the same instruction stream.
This serialization of memory references reduces the rate at which instructions issue on each processor, thereby restricting multiprocessor performance. The performance degradation due to memory reference serialization adversely affects superscalar designs, and even more acutely affects VLIW designs.
To accelerate shared memory multiprocessing, several programming paradigms with less restrictive semantics are known. These include "firm consistency", "release consistency", and "weak consistency". Each of these less restrictive paradigms allows the programmer to permit visibility of LOADs and STOREs outside of program order. However, these weaker programming paradigms require the programmer to designate instruction boundaries ("barriers") beyond which compilers or hardware cannot migrate LOADs and/or STOREs. Thus, such less restrictive schemes impose additional work on the programmer to ensure correct program execution. Consequently, these approaches are not suitable for some applications, since (1) they require additional work that can be time-consuming for the programmer, and (2) they can be vulnerable to subtle, timing-sensitive correctness errors.
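The role of a programmer-designated barrier can be sketched as follows. This is an illustrative model only, assuming a weak paradigm in which memory operations to distinct addresses may be freely reordered between barriers; the names `BARRIER` and `permitted_orders` are hypothetical and do not correspond to any particular architecture's instructions.

```python
from itertools import permutations

# Hypothetical marker that LOADs and STOREs may not migrate across.
BARRIER = ("BARRIER", None, None)


def permitted_orders(program):
    """List every execution order a compiler or hardware may choose.

    Assumption of this sketch: operations between two barriers touch
    distinct addresses and may be permuted freely, but no operation
    may move from one side of a barrier to the other.
    """
    # Split the stream into segments delimited by barriers.
    segments, current = [], []
    for ins in program:
        if ins == BARRIER:
            segments.append(current)
            current = []
        else:
            current.append(ins)
    segments.append(current)

    # Combine every permutation of each segment, keeping segment order fixed.
    orders = [[]]
    for seg in segments:
        orders = [prefix + list(p) for prefix in orders for p in permutations(seg)]
    return orders


STORE_DATA = ("STORE", "data", 1)
STORE_FLAG = ("STORE", "flag", 1)

if __name__ == "__main__":
    print(len(permitted_orders([STORE_DATA, STORE_FLAG])))
    print(permitted_orders([STORE_DATA, BARRIER, STORE_FLAG]))
```

Without the barrier, both orders of the two STOREs are permitted; with it, only program order remains. Omitting the barrier leaves the program correct only by timing, which is the kind of subtle error noted above.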
For the foregoing reasons, then, known multiprocessor consistency paradigms are not completely adequate for all applications.