High-performance processors adopt out-of-order processing to execute instructions while maintaining a high degree of parallelism. In out-of-order processing, when the data read of an instruction is delayed for some reason such as a cache miss, the data read of a subsequent instruction is executed first, and the delayed data read of the preceding instruction is executed afterward.
However, out-of-order processing may cause a situation in which the read of the preceding data, executed later, obtains the latest data while the read of the subsequent data, executed earlier, obtains old data, and may thus result in a violation of Total Store Ordering (TSO). The TSO requires that the results of data reads correctly reflect the order in which the data was written, a property called consistency of execution order. The TSO is a type of memory ordering which specifies the constraints on how the order of actually writing data into a memory may be reordered with respect to the order of the instructions performing the memory access. The TSO includes the following three rules.
(1) A load instruction must not be processed before a preceding load instruction.
(2) A store instruction must not be processed before a preceding load or store instruction.
(3) An atomic load/store instruction must not be processed before a preceding load or store instruction.
While a load instruction is allowed to be processed before a preceding store instruction, the other patterns of reordering are disallowed, as illustrated in FIG. 15. If the target data of the preceding store instruction contains the target data of the load instruction, the load instruction must load the data written by the store instruction.
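The three rules and the single permitted exception can be summarized as a small predicate. The following is a minimal sketch, not an implementation taken from any processor: the names Op, may_bypass, and load_value are hypothetical, a preceding atomic instruction is conservatively assumed to block reordering, and the containment check is simplified to equality of area names.

```python
from enum import Enum


class Op(Enum):
    LOAD = "load"
    STORE = "store"
    ATOMIC = "atomic"  # atomic load/store


def may_bypass(later: Op, earlier: Op) -> bool:
    """Return True if `later` may be processed before the preceding `earlier`.

    Only a load passing a preceding store is permitted. Every other
    combination is disallowed. Treating a preceding atomic instruction
    as a barrier is an assumption of this sketch.
    """
    return later is Op.LOAD and earlier is Op.STORE


def load_value(area, pending_stores, memory):
    """Return the value a load of `area` must observe.

    `pending_stores` is a list of (area, value) pairs for preceding stores
    that have not yet been written to memory, oldest first. If one of them
    targets the same area, the load must take the youngest such value.
    """
    for store_area, value in reversed(pending_stores):
        if store_area == area:
            return value
    return memory[area]
```

For example, may_bypass(Op.LOAD, Op.STORE) is the only combination that evaluates to True, and a load of an area with a pending store receives that store's value rather than the value in memory.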
In a processor (CPU or CPU-CORE) that processes memory accesses out of order, a load instruction is allowed to return data to an instruction control unit before a preceding load instruction does. As illustrated in FIG. 16, the following processing is performed in an information processing device which includes an instruction control unit 100 and a primary cache control unit 200 for accessing a cache memory in accordance with a memory access request from the instruction control unit 100. For example, if the target data of a preceding load instruction (load-A) is cache-missed and the target data of a subsequent load instruction (load-B) is cache-hit, the primary cache control unit 200 returns the data of the subsequent load instruction (load-B) to the instruction control unit 100 before the data of the preceding load instruction (load-A).
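The return order in this example can be illustrated with a toy timing model. This is a sketch under stated assumptions: the function return_order and the latency values of 1 cycle for a hit and 10 cycles for a miss are hypothetical and chosen only to make the reordering visible.

```python
def return_order(loads, cached_areas, hit_latency=1, miss_latency=10):
    """Return load names in the order their data reaches the instruction control unit.

    `loads` is a list of (name, area) pairs in program order; one load is
    issued per cycle. The latency values are hypothetical.
    """
    completions = []
    for issue_cycle, (name, area) in enumerate(loads):
        latency = hit_latency if area in cached_areas else miss_latency
        completions.append((issue_cycle + latency, name))
    # Sort by completion cycle: a later hit can finish before an earlier miss.
    return [name for _, name in sorted(completions)]


# load-A misses and load-B hits, as in the example of FIG. 16.
order = return_order([("load-A", "A"), ("load-B", "B")], cached_areas={"B"})
```

Under these assumed latencies, load-A completes at cycle 10 and load-B at cycle 2, so the data of load-B is returned first.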
However, actual instruction execution is performed in accordance with the order of instructions. Therefore, even if load data are read in an arbitrary order, software operating in an information processing device having a single processor configuration observes that the TSO is maintained, as long as the TSO between load and store instructions is maintained. In an information processing device having a multiprocessor configuration, however, software may in some cases observe a TSO violation between load instructions.
FIGS. 17A to 17C illustrate a case in which store instructions store-A and store-B are issued in this order in a processor CPU-α to the data in areas A and B of the corresponding cache memory 212, and in which load instructions load-B and load-A are issued in this order in a processor CPU-β to the data in areas A and B of the corresponding cache memory 212.
In FIGS. 17A to 17C, an FP (Fetch Port) 210 is an instruction port for holding instructions that access the data in the cache memory 212. The FP 210 holds the instructions in respective entries thereof identified by FP numbers 0 and 1. Each of the CPU-α and the CPU-β of the information processing device illustrated in FIGS. 17A to 17C includes an instruction control unit 100 and a primary cache control unit 200, and the CPU-α and the CPU-β share a secondary cache control unit 300 provided in a lower layer. In the following description, to identify the instruction control unit 100, the primary cache control unit 200, the FP 210, and the cache memory 212 of the CPU-α, the suffix “a” is appended to the respective reference numerals, which are thus represented as 100a, 200a, 210a, and 212a, respectively. Similarly, for the corresponding components of the CPU-β, the suffix “b” is appended, and the reference numerals are represented as 100b, 200b, 210b, and 212b, respectively. If there is no need to distinguish between the CPU-α and the CPU-β, the reference numerals 100, 200, 210, and 212 are simply used.
As illustrated in FIG. 17A, in the primary cache control unit 200a of the CPU-α, the store instructions store-A and store-B are both cache-missed, and data requests are sent to the secondary cache control unit 300. Further, in the primary cache control unit 200b of the CPU-β, while the load instruction load-B is cache-missed and fails to return the target data to the instruction control unit 100b, the load instruction load-A is cache-hit and returns the target data data-A (old) to the instruction control unit 100b before the load instruction load-B.
Then, as illustrated in FIG. 17B, the process target instruction of the primary cache control unit 200a in the CPU-α is the store instruction store-A. Therefore, an invalidation request to the cache memory 212b of the CPU-β is issued via the secondary cache control unit 300, and thereafter two store instructions store-A and store-B are processed in this order.
Thereafter, as illustrated in FIG. 17C, the data in the area B subjected to the storage process is transferred from the CPU-α to the CPU-β. Then, the load instruction load-B is processed in the CPU-β, and the data data-B (new) is returned to the instruction control unit 100b. 
As a result, although the two store instructions store-A and store-B have been issued in this order in the CPU-α, in the CPU-β the load instruction load-B has returned the data data-B (new), which reflects the storage process, while the load instruction load-A has returned the data data-A (old), which does not. The instruction processing in the CPU-β has thus violated the TSO prohibition on processing a load instruction before a preceding load instruction.
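The sequence of FIGS. 17A to 17C can be replayed step by step. The following is a minimal sketch, assuming that each cache is modeled as a dictionary and that the labels "old" and "new" stand for the data before and after the storage process; the function names replay_fig17 and is_tso_violation are hypothetical.

```python
def replay_fig17():
    """Replay the steps of FIGS. 17A to 17C and return what CPU-beta's loads observed."""
    memory = {"A": "old", "B": "old"}  # state seen through the secondary cache
    beta_cache = {"A": "old"}          # CPU-beta holds area A only
    returned = {}

    # FIG. 17A: load-B misses and waits, the later load-A hits and returns first.
    returned["load-A"] = beta_cache["A"]

    # FIG. 17B: store-A invalidates area A in CPU-beta, then store-A and
    # store-B are processed in this order in CPU-alpha.
    beta_cache.pop("A", None)
    memory["A"] = "new"
    memory["B"] = "new"

    # FIG. 17C: area B is transferred to CPU-beta and load-B completes.
    beta_cache["B"] = memory["B"]
    returned["load-B"] = beta_cache["B"]
    return returned


def is_tso_violation(returned):
    """store-A precedes store-B, so seeing new B together with old A breaks the order."""
    return returned["load-B"] == "new" and returned["load-A"] == "old"


observed = replay_fig17()
```

In this replay, load-B (earlier in program order) observes the new data while load-A (later in program order) observes the old data, which is the combination that is_tso_violation reports.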
To prevent the violation, a subsequent load instruction is re-executed when there is a possibility of violation of the TSO. If there is a subsequent load instruction which has returned data before a preceding load instruction, and if the target data of the subsequent load instruction has been invalidated to allow another processor to use the data, information indicating this situation is stored in the corresponding processor. Then, when the preceding load instruction reads data, the instruction control unit 100 is notified of the possibility of TSO violation, and instruction re-execution is performed starting from the instruction following the preceding load instruction, i.e., the subsequent load instruction.
For example, if the load instruction load-A is processed before the load instruction load-B in the CPU-β and thereafter the invalidation process on the data in the area A is requested, a flag indicating the above situation is validated. Thereafter, if the flag is valid when the load instruction load-B is processed and the corresponding data is returned, the instruction control unit 100b is notified of the possibility of TSO violation between the load instructions. Then, the instruction control unit 100b reissues the load instruction load-A. Therefore, the data returned thereafter by the load instruction load-A reflects the store instruction store-A of the CPU-α, and thus the TSO violation is prevented.
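The flag-based re-execution in this example can be sketched as follows. This is an illustrative model under stated assumptions, not the circuit of the described device: the class LoadOrderTracker, its methods, and the function run_with_flag are hypothetical, and caches are modeled as dictionaries with the labels "old" and "new".

```python
class LoadOrderTracker:
    """Tracks loads that returned data ahead of an older load (hypothetical model)."""

    def __init__(self):
        self.early_areas = set()  # areas read by loads that returned early
        self.flag = False         # valid when such an area was later invalidated

    def early_return(self, area):
        self.early_areas.add(area)

    def invalidate(self, area):
        # Validate the flag only if the invalidated data was already returned early.
        if area in self.early_areas:
            self.flag = True

    def complete_preceding_load(self):
        """Return True if the instruction control unit must re-execute later loads."""
        notify = self.flag
        self.flag = False
        self.early_areas.clear()
        return notify


def run_with_flag():
    memory = {"A": "old", "B": "old"}
    beta_cache = {"A": "old"}
    tracker = LoadOrderTracker()
    returned = {}

    returned["load-A"] = beta_cache["A"]  # load-A returns before load-B
    tracker.early_return("A")

    beta_cache.pop("A", None)             # invalidation requested by store-A
    tracker.invalidate("A")
    memory["A"] = "new"
    memory["B"] = "new"

    beta_cache["B"] = memory["B"]         # load-B completes
    returned["load-B"] = beta_cache["B"]

    if tracker.complete_preceding_load(): # possible TSO violation: reissue load-A
        beta_cache["A"] = memory["A"]
        returned["load-A"] = beta_cache["A"]
    return returned


result = run_with_flag()
```

With the flag valid when load-B completes, load-A is reissued and both loads end up returning the data that reflects the stores of the CPU-α.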
Japanese Patent No. 4180569 discloses a technique relating to cache memory control.
According to the method of preventing the TSO violation by re-executing a load instruction with the use of a flag, however, unnecessary instruction re-execution is requested. Therefore, the unnecessary instruction re-execution may cause a degradation of the processing performance, although the re-execution does not cause data corruption.
Such degradation of the processing performance is predicted to occur regularly, depending on a change in the configuration or control of the secondary cache control unit caused by the multi-core configuration. In that case, the unnecessary instruction re-execution occurs regularly, and thus may cause serious degradation of the performance of an information processing device which processes memory access in an out-of-order fashion.