TSO (Total Store Ordering) is one type of memory ordering. TSO restricts how far the order in which data is actually written to memory may differ from the order of the instructions that access the memory. There are three TSO rules:
1. A load instruction cannot be processed ahead of a preceding load instruction;
2. A store instruction cannot be processed ahead of a preceding load instruction or a preceding store instruction; and
3. An atomic load-store instruction cannot be processed ahead of a preceding load instruction or a preceding store instruction. When target data is loaded by the atomic load-store instruction, access to the target data by any other instruction is restricted from the time the target data is loaded until the time the target data is stored by the atomic load-store instruction.
FIG. 9 is a diagram illustrating the passing and no-passing patterns among load instructions and store instructions that are to be guaranteed by TSO. As illustrated in #1, a load instruction cannot pass a preceding load instruction. As illustrated in #2, a store instruction cannot pass a preceding load instruction. As illustrated in #3, a load instruction can pass a preceding store instruction. As illustrated in #4, a store instruction cannot pass a preceding store instruction.
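The four patterns described for FIG. 9 can be written down as a small lookup table. The following is only an illustrative sketch; the names `Op`, `TSO_MAY_PASS`, `may_pass`, and `allowed_patterns` are hypothetical and do not appear in the figure.

```python
from enum import Enum


class Op(Enum):
    LOAD = "load"
    STORE = "store"


# Key: (preceding instruction, subsequent instruction).
# Value: whether the subsequent instruction may be processed first.
# The entries mirror patterns #1 to #4 as described for FIG. 9.
TSO_MAY_PASS = {
    (Op.LOAD, Op.LOAD): False,    # #1: load cannot pass a preceding load
    (Op.LOAD, Op.STORE): False,   # #2: store cannot pass a preceding load
    (Op.STORE, Op.LOAD): True,    # #3: load can pass a preceding store
    (Op.STORE, Op.STORE): False,  # #4: store cannot pass a preceding store
}


def may_pass(preceding: Op, subsequent: Op) -> bool:
    """Return True if `subsequent` may be processed ahead of `preceding`."""
    return TSO_MAY_PASS[(preceding, subsequent)]


def allowed_patterns():
    """List the (preceding, subsequent) pairs for which passing is permitted."""
    return [(p.value, s.value) for (p, s), ok in TSO_MAY_PASS.items() if ok]
```

In this sketch, `allowed_patterns()` yields only the pair ("store", "load"), which corresponds to pattern #3.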
In other words, a load instruction can pass a preceding store instruction, but all other passing patterns are prohibited. However, when the target data of the load instruction is included in the target data of the preceding store instruction, the load instruction must load the data of the preceding store instruction.
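The exception for a load whose target data overlaps a preceding store can be illustrated with a toy store buffer. This is a minimal sketch under stated assumptions: the class `StoreBufferCore` and its methods are hypothetical, and overlap is simplified to an exact address match rather than byte-range inclusion.

```python
class StoreBufferCore:
    """Toy model: stores wait in a FIFO buffer, and loads may pass them."""

    def __init__(self, memory):
        self.memory = memory      # dict: address -> value
        self.store_buffer = []    # pending (address, value) pairs, oldest first

    def store(self, address, value):
        # The store is queued; memory is not updated until the buffer drains.
        self.store_buffer.append((address, value))

    def load(self, address):
        # Search youngest-first so the most recent pending store to the
        # same address supplies the data (the exception described above).
        for pending_address, pending_value in reversed(self.store_buffer):
            if pending_address == address:
                return pending_value
        # No overlapping pending store: the load passes the buffered
        # stores and reads memory directly.
        return self.memory.get(address, 0)

    def drain_one(self):
        # Write the oldest pending store to memory, preserving store order.
        if self.store_buffer:
            address, value = self.store_buffer.pop(0)
            self.memory[address] = value


memory = {"A": 0, "B": 7}
core = StoreBufferCore(memory)
core.store("A", 1)
passed_value = core.load("B")           # passes the pending store to A
forwarded_value = core.load("A")        # must see the pending store's data
memory_before_drain = memory["A"]       # memory itself is not yet updated
core.drain_one()
memory_after_drain = memory["A"]
```

Here the load of B passes the pending store to A, while the load of A receives the value of the pending store even though memory has not yet been updated.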
Here, in a processor which processes memory accesses out of order, a subsequent load instruction can pass a preceding load instruction, and the data of the subsequent load instruction can be returned to an instruction control unit before the preceding load instruction is processed, as follows.
FIG. 10 is a diagram for describing an example of passing between load instructions in a processor which executes out-of-order processing. In the processor which executes the out-of-order processing, an instruction control unit 100 issues a load instruction load-A to a cache control unit 200. When a cache miss occurs on the target data of the load instruction load-A, the cache control unit 200 requests the corresponding data from an external storage means.
Next, the instruction control unit 100 issues a load instruction load-B to the cache control unit 200. When a cache hit occurs on the target data of the load instruction load-B, the cache control unit 200 bypasses the processing of the preceding load instruction load-A, executes the processing of the subsequent load instruction load-B, and returns data-B, which is the target data of the load instruction load-B, to the instruction control unit 100.
After that, when receiving transferred data from the external storage means, the cache control unit 200 executes the processing of the load instruction load-A, and returns data-A, which is the target data of the load instruction load-A, to the instruction control unit 100.
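The sequence described for FIG. 10 can be traced with a small simulation. This is only a sketch: the class `CacheControlUnit`, its methods, and the function `run_fig10_example` are hypothetical names, and the cache and the external storage means are modeled as plain dictionaries.

```python
class CacheControlUnit:
    """Toy model of a cache control unit that lets cache hits pass misses."""

    def __init__(self, cache, external_storage):
        self.cache = dict(cache)                  # address -> data held in cache
        self.external_storage = external_storage  # address -> data outside cache
        self.pending = []    # loads waiting for missing data, in issue order
        self.returned = []   # (load name, data) in the order data is returned

    def issue_load(self, name, address):
        if address in self.cache:
            # Cache hit: return the data immediately, passing pending loads.
            self.returned.append((name, self.cache[address]))
        else:
            # Cache miss: the data must come from the external storage.
            self.pending.append((name, address))

    def receive_transfer(self):
        # Transferred data arrives: fill the cache and complete pending loads.
        for name, address in self.pending:
            self.cache[address] = self.external_storage[address]
            self.returned.append((name, self.cache[address]))
        self.pending.clear()


def run_fig10_example():
    ccu = CacheControlUnit(cache={"B": "data-B"},
                           external_storage={"A": "data-A"})
    ccu.issue_load("load-A", "A")   # misses the cache
    ccu.issue_load("load-B", "B")   # hits the cache and passes load-A
    ccu.receive_transfer()          # load-A completes afterwards
    return ccu.returned
```

In this model, `run_fig10_example()` returns data-B before data-A, although load-A was issued first.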
As described above, in the out-of-order processing, data is not always read from a cache in the order of the instructions. However, the instructions are actually executed by the instruction control unit 100 in the order of the instructions. Thus, in a processor which supports only a single thread, even when the data is read out of order, as long as TSO between the load and store instructions is observed, TSO appears from the viewpoint of software to be observed.
However, in an SMT (Simultaneous Multi-Thread) processor, a plurality of threads which are simultaneously executed share a primary cache. It therefore becomes necessary to avoid a TSO violation between the threads within a single processor.
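One way such a violation between threads can arise is sketched below. The scenario is an assumption made for illustration and is not taken from the figures: thread 0 issues load-A and then load-B, while thread 1 stores to B and then to A through the shared cache. The names `run_message_passing` and `violates_tso` are hypothetical, and the shared primary cache is modeled as a plain dictionary.

```python
def run_message_passing(load_order):
    """Thread 0 issues load-A then load-B; thread 1 stores B then A.

    `load_order` is the order in which thread 0's loads actually read the
    shared cache. Thread 1's two stores land between the two reads.
    Returns the values thread 0 observes as (A, B).
    """
    shared_cache = {"A": 0, "B": 0}
    observed = {}

    first, second = load_order
    observed[first] = shared_cache[first]     # thread 0: first processed load

    shared_cache["B"] = 1                     # thread 1: store to B
    shared_cache["A"] = 1                     # thread 1: store to A, after B

    observed[second] = shared_cache[second]   # thread 0: second processed load
    return observed["A"], observed["B"]


def violates_tso(a_value, b_value):
    # Thread 1 wrote B before A, and thread 0's load-B follows load-A in
    # program order. Seeing the new A together with the old B is therefore
    # impossible unless load-B passed load-A.
    return a_value == 1 and b_value == 0


in_order = run_message_passing(("A", "B"))    # loads processed in order
reordered = run_message_passing(("B", "A"))   # load-B passes load-A
```

With the loads processed in order, thread 0 observes (0, 1), which is consistent with TSO. When load-B passes load-A, thread 0 observes (1, 0), that is, the new value of A with the old value of B, which software on the other thread can detect as a violation.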
SMT is a technique for simultaneously executing a plurality of threads on a single CPU. In other words, an SMT processor is a processor provided with a function for simultaneously executing a plurality of threads. Patent Document 1 describes prior art for the SMT processor. Patent Document 1 describes that consistency of execution order is assured for the reading and the writing of shared data between the threads.
Patent Document 1: WO2004/068361