1. Field of the Invention
The present invention relates to a load/store instruction control circuit of a microprocessor which has a no-write allocate area, in which, when a store instruction initiates a cache-miss, the store-data is written directly to an external memory or to a cache memory of a lower layer, and which implements a non-blocking cache that does not stop the processes of a pipeline even in the case of a cache-miss.
2. Related Background Art
Because it takes a long time for a processor to access a main memory (an external memory), a cache memory that can be accessed at high speed is often provided between the processor and the external memory. A portion of the data that has been stored, or is to be stored, in the external memory is copied to the cache memory.
An LSU (Load Store Unit) in the processor detects whether or not the corresponding data exists in the cache memory when it executes a load instruction. If the corresponding data exists in the cache memory, the LSU reads out the data from the cache memory. When the LSU executes a store instruction, it stores the store-data to the cache memory instead of the external memory.
FIG. 1 is a diagram showing the operation of the LSU in a conventional processor. When a load instruction is issued, the LSU detects whether or not the data to be loaded is stored in a data cache (DCACHE) 101. If the data is stored in the DCACHE 101, the LSU reads out the data and stores it to a register 102. If the data is not yet stored in the DCACHE 101, the LSU reads out the data from the external memory. When a store instruction is issued, the LSU stores the value of the register 102 to the DCACHE 101.
In some cases, a no-write allocate area (space) is provided in the memory space of the main memory of the processor. When a cache-miss by a load instruction occurs in the no-write allocate area, a refill process for the DCACHE is carried out. When a cache-miss by a store instruction occurs in the no-write allocate area, the refill process for the DCACHE is not carried out, and the data of the store miss is sent only to the external memory. That is, in such a case, the DCACHE is not updated.
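The no-write allocate behavior described above can be illustrated with a minimal software sketch. It is only a toy model under stated assumptions: the class name `NoWriteAllocateCache`, the 16-byte line size, and the use of a `bytearray` for the external memory are all hypothetical and are not taken from the conventional circuit. A store hit is assumed to update only the cached copy.

```python
LINE_SIZE = 16  # bytes per cache line (assumed for illustration)


class NoWriteAllocateCache:
    """Toy model: load misses refill a line, store misses bypass the cache."""

    def __init__(self, memory):
        self.memory = memory  # bytearray standing in for the external memory
        self.lines = {}       # line base address -> bytearray copy of the line

    def _base(self, addr):
        return addr - (addr % LINE_SIZE)

    def load_byte(self, addr):
        base = self._base(addr)
        if base not in self.lines:
            # Load miss: the refill process copies the whole line into the cache.
            self.lines[base] = bytearray(self.memory[base:base + LINE_SIZE])
        return self.lines[base][addr - base]

    def store_byte(self, addr, value):
        base = self._base(addr)
        if base in self.lines:
            # Store hit: update the cached copy (assumed write-back behavior).
            self.lines[base][addr - base] = value
        else:
            # Store miss: no refill; the data is sent only to external memory.
            self.memory[addr] = value
```

In this sketch a store to an uncached line changes only `memory`, and a later load of the same line refills it from `memory`, so the two stay consistent as long as accesses complete one at a time.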
Meanwhile, a non-blocking manner is known as a manner of accessing the DCACHE. The non-blocking manner is characterized in that the pipeline of the processor does not stop even if a cache-miss occurs.
When cache accesses to the above-mentioned no-write allocate area are conducted in the non-blocking cache manner, the following problem may occur.
FIG. 2 shows a conventional data flow in the case where cache accesses to the same line in the no-write allocate area are conducted in succession in the non-blocking manner. FIG. 2 shows an example in which (1) one word of data at the address generated by adding zero to the content of the general purpose register of entry number 5 is loaded into the general purpose register of entry number 2, (2) one word of data in the general purpose register of entry number 3 is stored to the address generated by adding four to the content of the general purpose register of entry number 5, and (3) one byte of data at the address generated by adding five to the content of the general purpose register of entry number 5 is loaded into the general purpose register of entry number 4.
When the lw (load word) instruction of (1) misses in the cache in the no-write allocate area, a refill process for the corresponding cache line is carried out. However, in the case of the non-blocking cache, the pipeline does not stop.
Next, the sw (store word) instruction of (2) accesses the same cache line of the DCACHE 101 as (1). At this time, because the word data that is the target of the sw has not yet been refilled into the DCACHE 101, a cache-miss occurs. Accordingly, the data to be stored is sent to the external memory 103 without being stored to the DCACHE 101. The data for the lw miss of (1) is then refilled into the corresponding cache line.
When the subsequent lb (load byte) instruction of (3) is executed, if the refill process initiated by the lw instruction of (1) has finished, the lb instruction of (3) hits in the cache. However, because the store-data of the store instruction of (2) has not been stored to the corresponding cache line, the lb instruction of (3) reads out old data.
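The sequence of FIG. 2 can be replayed in a small sketch to show how the old data is read. The function name `run_fig2_sequence`, the 16-byte line size, the concrete byte values, and the base address of 0 in register 5 are all hypothetical. The sketch also assumes that the bus read for the refill is issued before the store reaches the external memory, so the refilled line does not contain the store-data.

```python
LINE_SIZE = 16  # bytes per cache line (assumed)


def run_fig2_sequence():
    """Replay lw / sw / lb on one line; return (byte seen by lb, byte in memory)."""
    memory = bytearray(2 * LINE_SIZE)   # stands in for external memory 103
    memory[4:8] = b"\x11\x11\x11\x11"   # old word at offset 4
    dcache = {}                         # line base -> cached line (DCACHE 101)
    r5 = 0                              # base address in register 5 (assumed)

    # (1) lw r2, 0(r5): load miss. The bus read is issued and the line is in
    #     flight while the pipeline keeps running (non-blocking).
    in_flight = bytes(memory[r5:r5 + LINE_SIZE])

    # (2) sw r3, 4(r5): the line is not in the cache yet, so this is a store
    #     miss. No-write allocate: the data goes to external memory only.
    memory[r5 + 4:r5 + 8] = b"\xaa\xbb\xcc\xdd"

    # The refill started by (1) completes with the data read before the store.
    dcache[r5] = bytearray(in_flight)

    # (3) lb r4, 5(r5): hit on the refilled line, which still holds old data.
    r4 = dcache[r5][5]
    return r4, memory[r5 + 5]
```

Under these assumptions the function returns `(0x11, 0xbb)`: the byte loaded by (3) is the old value, while the external memory already holds the value stored by (2).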
On the other hand, FIG. 3 shows a conventional data flow which resolves the problem of FIG. 2.
In FIG. 3, when the lw instruction of (1) initiates a cache-miss, the refill process for the corresponding cache line is carried out. In the case of the non-blocking cache, the pipeline does not stop. The sw instruction of (2) accesses the same cache line as (1). At this time, because the word data that is the target of the sw is not yet stored in the DCACHE 101, a cache-miss occurs.
Thereafter, one cache line of data is loaded into the DCACHE 101 by the refill process initiated by the lw instruction of (1). Before the lb instruction of (3) is executed, some means occupies the DCACHE 101 in order to invalidate the cache line refilled by (1). Because the cache line has been invalidated, the lb instruction of (3) initiates a cache-miss. Therefore, a refill process is carried out, which reads out the latest data that the sw instruction of (2) has stored to the memory.
Thus, according to the method of FIG. 3, it is possible to read out the latest data that the immediately preceding store instruction has stored to the memory. However, according to the method of FIG. 3, when a plurality of cache-misses occur sequentially, complicated control is necessary to avoid stopping the pipeline. That is, when cache-misses for the same cache line in the no-write allocate area occur sequentially, the control which assures the order of bus reads and writes becomes complicated. Because of this, the structure of the processor becomes complicated, and it takes a long time to verify the operation of the processor. Furthermore, because the number of gate stages on paths inside the processor increases with the complexity, the control block for the load/store instructions is likely to become a critical path.
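The invalidation approach of FIG. 3 can be sketched in the same way. The function name `run_fig3_sequence`, the line size, the byte values, and the refill counter are hypothetical and serve only to illustrate the flow; the invalidating means is modeled simply as deleting the line.

```python
LINE_SIZE = 16  # bytes per cache line (assumed)


def run_fig3_sequence():
    """Replay lw / sw / invalidate / lb; return (byte seen by lb, refill count)."""
    memory = bytearray(2 * LINE_SIZE)   # stands in for the external memory
    memory[4:8] = b"\x11\x11\x11\x11"   # old word at offset 4
    dcache = {}                         # line base -> cached line (DCACHE 101)
    refills = 0
    r5 = 0                              # base address in register 5 (assumed)

    # (1) lw miss: the bus read is issued and the line is in flight.
    in_flight = bytes(memory[r5:r5 + LINE_SIZE])

    # (2) sw miss on the same line: the data goes to external memory only.
    memory[r5 + 4:r5 + 8] = b"\xaa\xbb\xcc\xdd"

    # The refill started by (1) completes with data that predates the store.
    dcache[r5] = bytearray(in_flight)
    refills += 1

    # Before (3), the refilled line is invalidated.
    del dcache[r5]

    # (3) lb now misses, so a second refill fetches the latest data.
    if r5 not in dcache:
        dcache[r5] = bytearray(memory[r5:r5 + LINE_SIZE])
        refills += 1
    return dcache[r5][5], refills
```

The lb instruction now observes the stored value, at the cost of a second refill of the same line and of the additional control needed to invalidate the line at the right time.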
An object of the present invention is to provide a load/store instruction control circuit and a load/store instruction control method capable of assuring consistency between data of a cache memory and data of an external memory when load/store instructions for a no-write allocate area conflict with each other in a non-blocking cache manner.
In order to achieve the foregoing object, there is provided a load/store instruction control circuit of a microprocessor which is able to access a cache memory storing a portion of data stored to an external memory or read out from the external memory, comprising:
load/store same line miss detecting means for detecting, when a non-blocking cache is implemented in which a pipeline process of the microprocessor does not stop even if a cache-miss of load/store instructions occurs, that a load instruction for a no-write allocate area, in which data is stored directly to a lower layer memory in a cache hierarchy, initiates a cache-miss, and that a subsequent store instruction initiates a cache-miss for the same cache line as the preceding load instruction;
temporary storing means for temporarily storing the store-data of said subsequent store instruction when the same cache line miss is detected by said load/store same line miss detecting means; and
load/store control means for storing, to the corresponding cache line, the store-data held in said temporary storing means, during a refill process for the cache line by the preceding load instruction or after the refill process.
According to the present invention, when a load instruction for the no-write allocate area initiates a cache-miss, and a subsequent store instruction initiates a cache-miss for the same cache line as the preceding load instruction, the store-data of the store instruction is stored to the corresponding cache line during the refill process for the cache line by the preceding load instruction or after the refill process has finished, even though the store instruction targets the no-write allocate area. Because of this, it is possible to reliably assure consistency of data between the cache memory and the external memory.
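The behavior summarized above can be illustrated by a minimal software sketch. It is not the claimed circuit: the class `StoreMergingLSU`, its method names, the 16-byte line size, and the list used as the temporary storing means are hypothetical. The sketch assumes that the store-data is still written to the external memory in addition to being held for the cache line, and that a store never crosses a line boundary.

```python
LINE_SIZE = 16  # bytes per cache line (assumed)


class StoreMergingLSU:
    """Toy model: a same-line store miss is held and merged into the refill."""

    def __init__(self, memory):
        self.memory = memory        # bytearray standing in for external memory
        self.dcache = {}            # line base -> cached line
        self.pending_refills = {}   # line base -> line data in flight
        self.store_buffer = []      # (addr, data): temporary storing means

    def _base(self, addr):
        return addr - (addr % LINE_SIZE)

    def load_miss(self, addr):
        # The bus read is issued; the pipeline does not stop.
        base = self._base(addr)
        self.pending_refills[base] = bytes(self.memory[base:base + LINE_SIZE])

    def store(self, addr, data):
        base = self._base(addr)
        if base in self.dcache:
            off = addr - base
            self.dcache[base][off:off + len(data)] = data  # store hit
            return
        # Store miss in the no-write allocate area: write external memory.
        self.memory[addr:addr + len(data)] = data
        if base in self.pending_refills:
            # Same-line miss detected: also hold the store-data temporarily.
            self.store_buffer.append((addr, bytes(data)))

    def complete_refill(self, base):
        line = bytearray(self.pending_refills.pop(base))
        remaining = []
        for addr, data in self.store_buffer:
            if self._base(addr) == base:
                off = addr - base
                line[off:off + len(data)] = data  # merge held store-data
            else:
                remaining.append((addr, data))
        self.store_buffer = remaining
        self.dcache[base] = line

    def load_byte_hit(self, addr):
        base = self._base(addr)
        return self.dcache[base][addr - base]
```

With the lw, sw, lb sequence of FIG. 2 (`load_miss(0)`, `store(4, ...)`, `complete_refill(0)`, `load_byte_hit(5)`), the final load hits and observes the stored byte, without the invalidation and second refill of FIG. 3.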
Furthermore, when a dirty write back conflicts with the process of writing the store-data to the external memory, the dirty write back is executed with higher priority than the write process to the external memory. This resolves the problem that new data written to the external memory is overwritten with old data by the dirty write back.
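The ordering rule can be sketched as a fixed-priority drain of two pending bus writes. The function `drain_conflicting_writes` and its `(address, data)` pairs are hypothetical; the sketch only illustrates why performing the dirty write back first leaves the new store-data in the external memory.

```python
def drain_conflicting_writes(memory, dirty_write_back, store_write):
    """Apply two pending writes, dirty write back first, then the store-data.

    Each pending write is a hypothetical (address, data) pair. Because the
    store-data is written last, it is not overwritten by the older line data.
    """
    for addr, data in (dirty_write_back, store_write):
        memory[addr:addr + len(data)] = data
    return memory
```

For example, if the write back covers a whole 16-byte line of old data and the store covers four bytes within it, the external memory ends up with the old line everywhere except those four bytes, which hold the new store-data.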