The present invention relates to the field of multiprocessing systems, and more particularly to a program that verifies cumulative ordering of memory instructions.
A multiprocessor data processing system may include a plurality of processors and a shared main memory, where each processor includes its own cache comprising a plurality of cache lines. Each of the plurality of processors may be synchronized, commonly referred to as interprocessor synchronization, in a shared memory system so that memory instructions in different cache lines maintain an order.
For example, in Table 1 below, one of the processors in the multiprocessor system, e.g., P1, may update data values and subsequently set a flag variable to indicate to another processor in the multiprocessor system, e.g., P2, that the data value has been updated. Processor P2 may check the value of the flag variable and, if set, subsequently issues read operations to load the new data values. If processor P1 sets the flag before it updated the data or if processor P2 retrieves the data prior to checking the value of the flag, synchronization is not achieved.
Synchronization may be maintained through a special memory instruction commonly referred to as a memory barrier instruction which is issued by processors in the multiprocessor system. A memory barrier instruction, MB, indicates that all memory instructions prior to MB, i.e., pre-MB instructions, are ordered before all memory instructions after the MB, i.e., post-MB instructions. However, no order is required between memory instructions that are not separated by a MB instruction. For example, in Table 2 below,
memory instructions may instruct the processor to store data at different memory addresses which may be different cache lines in the particular processor. Store memory instructions 1 and 2 may be executed by processor P1 to store data 1 and data 2, respectively, at memory address 1 and 2, respectively. Store memory instruction 3 may be executed by processor P1 to store the value of the flag variable at memory address 3. Since processor P1 had issued a memory barrier instruction, processor P1 must execute store memory instructions 1 and 2 prior to the execution of store memory instruction 3, which is commonly referred to as strong ordering, though store memory instructions 1 and 2 may be executed in either order which is commonly referred to as weak ordering. That is, weak ordering refers to memory instructions that may be executed in either order since they do not reference the same cache line. Strong ordering refers to memory instructions that must be executed in order since they reference the same cache line. Processor P2 may then execute load memory instruction 1 to load the value of the flag variable from memory address 3. Processor P2 may then execute load memory instructions 2 and 3 to load data 1 and data 2 from memory address 1 and 2, respectively. Since processor P2 had issued a memory barrier instruction, processor P2 must execute load memory instruction 1 prior to the execution of load memory instructions 2 and 3 though load memory instructions 1 and 2 may be executed in either order because they do not reference the same cache line. When processor P2 executes load memory instruction 1, processor P2 must be able to identify the data values at memory address 1 and 2 in subsequent loads after the memory barrier instruction. This is commonly referred to as cumulative ordering.
A prior art technique in verifying cumulative ordering includes identifying all pairs of storage accesses, e.g., store memory instructions 1-3, on each side of the memory barrier. For example, store memory instruction 3 is paired with both store memory instruction 1 and store memory instruction 2. The data stored in the storage accesses that are executed after the memory barrier instruction, e.g., store memory instruction 3, will later be loaded by a different device, e.g., processor P2. Upon loading that data, the prior art technique verifies that the data read by one processor, e.g., processor P2, from executing a load memory instruction, e.g., load memory instruction 1, before the memory barrier instruction issued by that processor, e.g., processor P2, is the same data that was stored by another processor, e.g., processor P1, from executing a store memory instruction, e.g., store memory instruction 3, after the memory barrier instruction issued by that processor, e.g., processor P1. A further verification is made by comparing the data read by one processor, e.g., processor P2, from executing a load memory instruction, e.g., load memory instruction 2, after the memory barrier instruction issued by that processor, e.g., processor P2, is the same data that was stored by another processor, e.g., processor P1, from executing a store memory instruction, e.g., store memory instruction 1, before the memory barrier instruction issued by that processor, e.g., processor P1. Unfortunately, the prior art technique is very inefficient in that it must make pair-wise comparisons of all loads and stores on each side of each memory barrier instruction.
It would therefore be desirable to verify cumulative ordering without verifying that the data read from executing load memory instructions before/after the memory barrier instruction by one device is the same data that was stored after/before the memory barrier instruction by another device.
The problems outlined above may at least in part be solved in some embodiments by first selecting a memory barrier instruction issued by a particular processor. A first cache line out of a plurality of cache lines may then be selected to be paired with one or more of the remaining of the plurality of cache lines. If a load memory instruction executed after the memory barrier instruction in the first cache line was identified, then the first cache line selected will be paired with a second cache line. If a load memory instruction executed before the memory barrier instruction in the second cache line was identified, then a pair of load memory instructions has been identified. The pair of load memory instructions comprises the first load memory instruction executed after the memory barrier instruction in the first cache line and the second load memory instruction executed before the memory barrier instruction in the second cache line. Upon identifying the second load memory instruction, a first and second reload of the first and second cache lines are identified. A reload may be a system bus transaction that causes a cache line of a particular cache of a particular processor to be updated. Upon identifying the first and second reloads of the first and second cache lines, a determination may be made as to whether the first reload occurred after the second. If the first reload did not occur after the second reload, then a determination may be made as to whether the ownership transaction referencing the first cache line was initiated between the first and second reload. The ownership transaction may refer to a processor procuring control of a cache line to write data to that particular cache line upon obtaining permission from other devices, e.g., processors, in a multiprocessor data processing system. If the ownership transaction was initiated between the first and second reload, then a potential violation of cumulative ordering has been identified.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.