In processors currently in common use as arithmetic processing apparatuses, out-of-order processing is employed in order to execute instructions faster than in the related art while maintaining consistency. Out-of-order processing refers to a process in which, while the reading of data for a preceding instruction is delayed due to a cache miss or the like, the reading of data for a subsequent instruction is executed first, and the reading of the data for the preceding instruction is executed afterward.
However, if this process is performed, there may be a case in which the latest data is read by the subsequent instruction and old data is read by the preceding instruction, which may result in a violation of total store ordering (TSO).
Here, TSO means that the result of reading data correctly reflects the order in which the data was written, which secures the consistency of the execution order. TSO is one of the memory ordering rules, which specify the constraints on reordering the data actually written to a memory relative to the order of the instructions accessing the memory. The TSO rule consists of the following three constraints: a load instruction may not be processed so as to bypass a preceding load instruction; a store instruction may not be processed so as to bypass a preceding load instruction or a preceding store instruction; and an atomic load/store instruction may not be processed so as to bypass a preceding load instruction or a preceding store instruction.
That is to say, as illustrated in FIG. 1, a load instruction (load) may be processed so as to bypass a preceding store instruction (store), whereas bypassing is inhibited in all other patterns. However, in a case where the target data of a load instruction is included in the target data of a preceding store instruction, the load instruction loads the data of that store instruction.
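The bypass rules above can be sketched as a small predicate. This is only an illustrative sketch: the names `Op`, `may_bypass`, and `load_value` are hypothetical, bypassing a preceding atomic load/store instruction is assumed to be inhibited, and "target data included in the target data of a preceding store" is simplified to an exact address match.

```python
from enum import Enum


class Op(Enum):
    LOAD = "load"
    STORE = "store"
    ATOMIC = "atomic load/store"


def may_bypass(subsequent: Op, preceding: Op) -> bool:
    """Return True if `subsequent` may be processed ahead of `preceding`.

    Only a load bypassing a preceding store is permitted. Every other
    pattern is inhibited (a preceding atomic is treated as inhibiting
    bypass, which is an assumption of this sketch).
    """
    return subsequent is Op.LOAD and preceding is Op.STORE


def load_value(addr, pending_stores, memory):
    """Return the value a load observes.

    `pending_stores` is a list of (address, value) pairs for preceding
    stores that have not yet reached memory, oldest first. If one of them
    targets the same address, the load takes the data of the youngest such
    store instead of the value in memory.
    """
    for store_addr, store_value in reversed(pending_stores):
        if store_addr == addr:
            return store_value
    return memory[addr]
```

For example, `may_bypass(Op.LOAD, Op.STORE)` is the only combination of load and store for which the predicate returns `True`.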
Here, a processor (a CPU or a CPU-CORE) which processes memory accesses out of order allows a load instruction to return data to an instruction control unit before a preceding load instruction is executed. As illustrated in FIG. 2, in an arithmetic processing apparatus including an instruction control unit 100 and a primary cache control unit 200 which accesses a cache memory in response to a memory access request from the instruction control unit 100, the following process is performed. For example, in a case where a cache miss occurs for the target data of a preceding load instruction (load-A) and a cache hit occurs for the target data of a subsequent load instruction (load-B), the primary cache control unit 200 returns the data of the subsequent load instruction (load-B) to the instruction control unit 100 so as to bypass the preceding load instruction (load-A).
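The load-A / load-B case can be illustrated with a toy timing model. The function name `return_order` and the latency values are assumptions chosen for illustration only; they do not describe the primary cache control unit 200 itself.

```python
def return_order(loads, cached_lines, hit_latency=1, miss_latency=10):
    """Return load names in the order their data comes back.

    `loads` is a list of (name, address) pairs in program order, issued
    one per cycle. An address found in `cached_lines` is a cache hit,
    anything else is a cache miss. Latencies are illustrative assumptions.
    """
    completion = []
    for issue_cycle, (name, addr) in enumerate(loads):
        latency = hit_latency if addr in cached_lines else miss_latency
        completion.append((issue_cycle + latency, issue_cycle, name))
    # Sort by completion cycle; ties fall back to program order.
    return [name for _, _, name in sorted(completion)]


# load-A misses and load-B hits, so load-B returns its data first.
print(return_order([("load-A", 0x100), ("load-B", 0x200)], {0x200}))
# prints ['load-B', 'load-A']
```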
However, actual instruction execution is performed in the order of the instructions. Therefore, as long as TSO between a load instruction and a store instruction is kept, TSO appears to software to be kept in an arithmetic processing apparatus with a single-processor configuration, even if load data is read out of order. In an arithmetic processing apparatus with a multi-processor configuration, however, when another processor invalidates the target data of a subsequent load instruction in response to a store instruction, there are cases where a TSO violation between load instructions becomes visible to software. In other words, there are cases where the data read by the preceding load instruction is new data from after execution of the store instruction even though the data read by the subsequent load instruction is old data from before execution of the store instruction, and thus the TSO inhibition on bypassing between load instructions is violated.
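The multi-processor case can be made concrete with a small enumeration. The sketch below assumes a simplified memory in which each access takes effect atomically, a second processor that stores to B and then to A, and hypothetical names (`WRITER`, `run`, `outcomes`). It compares the results that are possible when the loads read in program order with those possible when load-B reads before load-A.

```python
from itertools import combinations

# Other processor: store B, then store A (initial values are 0).
WRITER = [("B", 1), ("A", 1)]


def run(schedule, reader_order):
    """Execute one interleaving; `schedule` is a list of 'W'/'R' steps."""
    memory = {"A": 0, "B": 0}
    writes = iter(WRITER)
    reads = iter(reader_order)
    seen = {}
    for who in schedule:
        if who == "W":
            addr, value = next(writes)
            memory[addr] = value
        else:
            addr = next(reads)
            seen[addr] = memory[addr]
    return (seen["A"], seen["B"])


def outcomes(reader_order):
    """Collect every (A, B) result over all interleavings with the writer."""
    results = set()
    steps = len(WRITER) + len(reader_order)
    for writer_slots in combinations(range(steps), len(WRITER)):
        schedule = ["W" if i in writer_slots else "R" for i in range(steps)]
        results.add(run(schedule, reader_order))
    return results


in_order = outcomes(["A", "B"])   # load-A reads before load-B
bypassed = outcomes(["B", "A"])   # load-B reads before load-A
print(bypassed - in_order)        # prints {(1, 0)}
```

The extra result (A new, B old) is the pattern described above: the preceding load observes data from after the stores while the subsequent load observes data from before them.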
In order to avoid this, a subsequent load instruction may be re-executed in a case where there is a possibility that TSO may be violated. That is to say, when a subsequent load instruction has bypassed a preceding load instruction and returned data, and the target data of the subsequent load instruction is then invalidated because another processor uses that data, the processor of interest records that the target data has been invalidated. In addition, when the preceding load instruction reads its data, the instruction control unit 100 may be notified that there is a possibility that TSO may be violated, and the instructions from the next instruction (the subsequent load instruction) onward may be re-executed.
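A minimal sketch of this re-execution scheme follows. The `LoadQueue` and `LoadEntry` classes, their fields, and the method names are hypothetical and serve only to illustrate the bookkeeping; they are not taken from the related art.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoadEntry:
    name: str
    line: int                  # cache line targeted by the load
    returned: bool = False     # data already returned to the instruction control unit
    invalidated: bool = False  # line was invalidated after a bypassing return


@dataclass
class LoadQueue:
    entries: List[LoadEntry] = field(default_factory=list)  # program order

    def issue(self, name: str, line: int) -> None:
        self.entries.append(LoadEntry(name, line))

    def invalidate(self, line: int) -> None:
        """Record an invalidation request from another processor."""
        for i, entry in enumerate(self.entries):
            bypassed = any(not older.returned for older in self.entries[:i])
            if entry.returned and entry.line == line and bypassed:
                entry.invalidated = True

    def return_data(self, name: str) -> List[str]:
        """Mark a load as returned; list the loads that must be re-executed."""
        idx = next(i for i, e in enumerate(self.entries) if e.name == name)
        self.entries[idx].returned = True
        younger = self.entries[idx + 1:]
        if not any(e.invalidated for e in younger):
            return []
        for e in younger:          # possible TSO violation: redo younger loads
            e.returned = False
            e.invalidated = False
        return [e.name for e in younger]
```

In use, issuing load-A and load-B, returning load-B first, invalidating load-B's line, and then returning load-A makes `return_data("load-A")` report that load-B must be re-executed.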
However, if the possibility of a TSO violation is determined based only on whether or not there is an invalidation request, there are cases where it is determined that there is a possibility of a TSO violation even though TSO is not actually violated. When such a determination is made, the instruction re-execution process is performed unnecessarily, which is a factor that considerably reduces performance.
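The following sketch illustrates such an unnecessary re-execution under simplifying assumptions: memory accesses take effect atomically, the other processor's stores all land between the early read of load-B and the late read of load-A, and the function names are hypothetical. When the other processor stores only to B, the invalidation-based check still fires although the observed values match an in-order execution.

```python
def in_order_outcomes(writer_stores):
    """All (A, B) results when load-A and load-B read in program order."""
    results = set()
    n = len(writer_stores)
    # `a` stores are visible to load-A, `b` stores to load-B, with b >= a.
    for a in range(n + 1):
        for b in range(a, n + 1):
            mem_a = {"A": 0, "B": 0}
            mem_a.update(dict(writer_stores[:a]))
            mem_b = {"A": 0, "B": 0}
            mem_b.update(dict(writer_stores[:b]))
            results.add((mem_a["A"], mem_b["B"]))
    return results


def bypassed_run(writer_stores):
    """load-B reads first, the stores land, then load-A reads.

    Returns the observed (A, B) pair and whether B's line was invalidated
    after load-B had already returned its data.
    """
    memory = {"A": 0, "B": 0}
    b_value = memory["B"]
    b_invalidated = False
    for addr, value in writer_stores:
        memory[addr] = value
        if addr == "B":
            b_invalidated = True
    return (memory["A"], b_value), b_invalidated


for stores in ([("B", 1), ("A", 1)], [("B", 1)]):
    observed, flagged = bypassed_run(stores)
    violated = observed not in in_order_outcomes(stores)
    print(stores, "flagged:", flagged, "actually violated:", violated)
```

In the second case the check flags a possible violation, yet the observed pair is one that an in-order execution could also produce, so the re-execution is wasted work.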
Japanese Patent No. 4180569 and Japanese Laid-open Patent Publication Nos. 2011-134205 and 6-214875 are examples of the related art.