1. Field of the Invention
The present invention relates to a control circuit for controlling a cache system, and more particularly to a cache system control circuit having a store queue for temporarily storing a store instruction and being capable of re-ordering instructions.
2. Description of the Related Art
A semiconductor device may include a data cache or a data cache system and a store queue serving as a write buffer or a store buffer for data-write instructions or data store instructions. Data write operations to a main memory and data cache operations to a data memory may be performed, wherein a store instruction including a write address and data is first held in the store queue to improve the throughput of the processor. Such conventional techniques are disclosed in Japanese laid-open patent publication No. 9-114734 entitled “store buffer device”, and also in Japanese laid-open patent publication No. 2000-181780 entitled “store buffer device”. The term “data cache system” is defined herein as a data cache system which comprises a tag memory and a data memory.
The semiconductor device using the store queue may perform an instruction re-order which changes the original order or sequence of plural instructions. One example of the instruction re-order is that a tag-retrieved store instruction is held in the store queue so that a subsequent load instruction for reading data from the data memory or the main memory is executed before the store instruction is written to the memory, thereby improving the efficiency of accesses to the data memory and the main memory.
It is, however, essential for the instruction re-order to preserve the dependency relationship of the data which are accessed. Assume that, in the original instruction order, a store instruction to an address is executed before a load instruction from the same address. If the instructions are re-ordered so that the store instruction is executed after the load instruction has been executed, then the loaded data are not the data which should have been loaded.
FIG. 1 is a view illustrative of original instruction order and examples of available instruction re-ordering. An instruction (1) is a load instruction for loading data from an address “1000” represented in hexadecimal digits and the loaded data are then transferred to a register “r8”. An instruction (2) is a load instruction for loading data from address “1500” represented in hexadecimal digits and the loaded data are then transferred to a register “r9”. An instruction (3) is a store instruction for storing data into an address “1760” represented in hexadecimal digits, wherein the data have been stored in a register “r10”. An instruction (4) is a load instruction for loading data from an address “1840” represented in hexadecimal digits and the loaded data are then transferred to a register “r11”. An instruction (5) is a load instruction for loading data from the same address “1760” as the store instruction (3) and the loaded data are then transferred to a register “r12”.
There is no address dependency among the load instruction (1), the load instruction (2), and the load instruction (4) because those instructions have different addresses from each other. However, the store instruction (3) and the load instruction (5) have the same address, for which reason an address dependency exists and the original order of these two instructions should be ensured. Therefore, the instruction re-order should ensure that the store instruction (3) has been executed before the load instruction (5) is executed. Namely, any instruction re-order is available unless the store instruction (3) is executed after the load instruction (5) has been executed. One example of the available instruction re-order is the store instruction (3), the load instruction (1), the load instruction (2), the load instruction (4) and the load instruction (5). Another example is the load instruction (1), the load instruction (2), the load instruction (4), the store instruction (3) and the load instruction (5). A longer time interval between the store instruction (3) and the load instruction (5) may be preferable for shortening the total time necessary for executing all of the above five instructions.
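The dependency rule for the five instructions of FIG. 1 can be illustrated with a minimal sketch. The names `PROGRAM` and `is_valid_reorder` are hypothetical and are not taken from the disclosure; the check covers only the store-before-load dependency discussed above.

```python
# Each instruction: (label, kind, address). Addresses are hexadecimal, as in FIG. 1.
PROGRAM = [
    (1, "load", 0x1000),
    (2, "load", 0x1500),
    (3, "store", 0x1760),
    (4, "load", 0x1840),
    (5, "load", 0x1760),
]


def is_valid_reorder(program, order):
    """Return True if `order` (a list of labels) keeps every store ahead of
    each later load from the same address in the original program."""
    position = {label: i for i, label in enumerate(order)}
    for i, (s_label, s_kind, s_addr) in enumerate(program):
        if s_kind != "store":
            continue
        # Only loads that follow the store in the original order depend on it.
        for l_label, l_kind, l_addr in program[i + 1:]:
            if l_kind == "load" and l_addr == s_addr:
                if position[s_label] > position[l_label]:
                    return False
    return True


print(is_valid_reorder(PROGRAM, [3, 1, 2, 4, 5]))  # first available re-order
print(is_valid_reorder(PROGRAM, [1, 2, 4, 3, 5]))  # second available re-order
print(is_valid_reorder(PROGRAM, [1, 2, 4, 5, 3]))  # store (3) after load (5)
```

The first two orders are the available re-orders named above; the third places the store instruction (3) after the load instruction (5) and is therefore rejected.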
A conventional structure for controlling the instruction re-order so as to ensure the address dependency, and a conventional operation thereof, will subsequently be described with reference to the drawings. FIG. 2 is a block diagram illustrative of a conventional circuit configuration for detecting the presence of dependency. FIG. 3 is a diagram illustrative of an address data configuration for access to the main memory or the data memory. FIG. 4 is a block diagram illustrative of a fragmentary data cache structure including a tag memory and a data memory in one-way. FIG. 5 is a flow chart of sequential processes for instructions which need to retrieve tags in order to use the data cache, in connection with the structure of FIG. 2. The tag retrieval is needed for the load instruction, a prefetch instruction, and the store instruction to utilize the data cache. The tag retrieval checks whether the page frame numbers of the addresses for the load instruction, the prefetch instruction, and the store instruction are stored in the tag memory of the data cache.
As shown in FIG. 3, the address signal comprises a page frame number (tag) of predetermined higher significant bits, an index of predetermined intermediate significant bits and an offset of predetermined lower significant bits. As shown in FIG. 4, the data cache comprises a tag memory 104 and a data memory 105. The tag memory 104 has plural memory areas with indexes “0”, “1”, “2”, “3”, ... “M−1” for storing respective page frame numbers allocated to the indexes thereof, as well as plural bit data representing other states not illustrated. The data memory 105 is divided into plural data areas with indexes “0”, “1”, “2”, “3”, ... “M−1” which correspond to the memory areas of the tag memory 104. Each of the divided plural data areas is further divided into plural data sub-areas which may be designated by offset values.
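The address layout of FIG. 3 can be sketched as a simple bit-field split. The field widths below (a 5-bit offset and a 7-bit index, giving M = 128) are assumptions chosen only for illustration; the disclosure states only that the widths are predetermined. The name `split_address` is likewise hypothetical.

```python
OFFSET_BITS = 5  # assumed width of the offset field
INDEX_BITS = 7   # assumed width of the index field, so M = 2**7 = 128


def split_address(addr):
    """Split an address into (page frame number, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)  # remaining higher significant bits
    return tag, index, offset


tag, index, offset = split_address(0x1760)
print(tag, index, offset)
```

Under these assumed widths, the index selects one of the M areas of the tag memory 104 and the data memory 105, and the offset selects a sub-area within the selected data area.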
With reference back to FIG. 2, the presence of the address dependency is detected by comparing the indexes of the addresses shown in FIG. 3. A store queue 101 for temporarily storing the store instructions has four stages. It is assumed that an instruction requiring the tag-retrieval is to be executed, wherein this instruction has an index “B”. A comparator group 102 includes four comparators (0), (1), (2) and (3). The four comparators (0), (1), (2) and (3) respectively compare the four indexes “A0”, “A1”, “A2” and “A3” stored in the store queue 101 to the index “B” of the above instruction with the tag-retrieval. Respective results of the four comparators (0), (1), (2) and (3) are then subjected to a logical OR-operation by an OR-gate 103, whereby it is detected whether any one of the four indexes “A0”, “A1”, “A2” and “A3” corresponds to the index “B”.
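The comparator group 102 and the OR-gate 103 can be modelled in software as follows. This is a behavioural sketch only; the function name `dependency_detected` is hypothetical, and the use of `None` to mark an empty stage of the store queue is an assumption not stated in the disclosure.

```python
def dependency_detected(queue_indexes, b):
    """Model of FIG. 2: compare index B with the indexes A0..A3 held in the
    four-stage store queue and OR the four comparator outputs together."""
    # One comparator per stage; an empty stage (None, assumed) never matches.
    comparator_outputs = [a is not None and a == b for a in queue_indexes]
    # The OR-gate asserts when at least one comparator reports a match.
    return any(comparator_outputs)


store_queue_indexes = [0x12, 0x3B, None, 0x05]  # A0, A1, A2 (empty), A3
print(dependency_detected(store_queue_indexes, 0x3B))
print(dependency_detected(store_queue_indexes, 0x07))
```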
The sequential processes for the above instruction with the tag-retrieval will be described with reference to FIG. 5. In the step S101, comparisons are made between the retrieval-object index and all of the indexes of the store instructions stored in the store queue 101. If at least one of the indexes of the store instructions stored in the store queue 101 corresponds to the retrieval-object index, then the store instruction having the corresponding index is executed in the step S102, and the above comparisons are made again in the step S101. If none of the indexes of the store instructions stored in the store queue 101 corresponds to the retrieval-object index, then the tag retrieval is executed for the object instruction in the step S103, wherein it is verified whether or not the page frame number of the object instruction has been stored at the retrieval-object index of the tag memory 104. If the page frame number of the object instruction has been stored at the retrieval-object index, then the process enters into the subsequent processes in the step S105. If the page frame number of the object instruction has not yet been stored at the retrieval-object index, then a replace process is executed on the corresponding index of the tag memory 104 in the step S104, followed by the subsequent processes in the step S105.
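The flow of FIG. 5 can be traced with a minimal sketch. The function name `process_instruction`, the representation of the store queue as a list of indexes, and the representation of the tag memory as a dictionary are all assumptions made for illustration; executing a queued store is modelled simply as removing its entry from the queue.

```python
def process_instruction(index, tag, store_queue, tag_memory):
    """Trace steps S101 to S105 for one instruction with the tag-retrieval.

    store_queue: list of indexes of pending store instructions (assumed model).
    tag_memory:  dict mapping an index to the page frame number held there.
    """
    trace = []
    # S101: compare the retrieval-object index with every queued index.
    while index in store_queue:
        trace.append("S101:match")
        # S102: execute the matching store; the object instruction waits.
        store_queue.remove(index)
        trace.append("S102")
    trace.append("S101:no-match")
    # S103: tag retrieval at the retrieval-object index.
    if tag_memory.get(index) == tag:
        trace.append("S103:hit")
    else:
        trace.append("S103:miss")
        # S104: replace process, modelled here as a tag update only.
        tag_memory[index] = tag
        trace.append("S104")
    # S105: subsequent processes.
    trace.append("S105")
    return trace


queue = [3, 7]
tags = {7: 0x1}
print(process_instruction(7, 0x1, queue, tags))
```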
The above replace process updates the contents of the tag memory 104 and the data memory 105 of the data cache upon updating the page frame number. The updating process may be classified into two types depending on whether or not the contents of the data memory 105 should be written back to the main memory. If, for example, data loaded from the main memory to the data memory 105 have not been updated at the updating time, then it is unnecessary to write these data back to the main memory. It is merely necessary that data corresponding to the newly set page frame number are loaded from the main memory to the corresponding index area of the data memory 105. This simple data load process without the data write-back is referred to as a “refill operation”.
If the data are written back to the main memory before new data corresponding to the newly set page frame number are loaded from the main memory to the corresponding index of the data memory 105, then those sequential processes are referred to as a “write-back-and-refill operation”. The replace operation or the replace process is defined to include both operations; that is, the expression “replace operation” means either the “refill operation” or the “write-back-and-refill operation”.
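The two types of replace operation can be distinguished in a short sketch. The `dirty` flag, the dictionary-based cache line, and the keying of the main memory by (page frame number, index) are assumptions introduced for illustration; the disclosure states only that write-back is unnecessary when the loaded data have not been updated.

```python
def replace_line(line, new_tag, index, main_memory):
    """Replace one cache line and report which operation was performed.

    line:        dict with 'tag', 'data' and 'dirty' (assumed update flag).
    main_memory: dict keyed by (page frame number, index), an assumed model.
    """
    if line["dirty"]:
        # The data were updated in the cache: write them back first.
        main_memory[(line["tag"], index)] = line["data"]
        operation = "write-back-and-refill"
    else:
        # The data were not updated: no write-back is needed.
        operation = "refill"
    # Refill: load the data for the newly set page frame number.
    line["tag"] = new_tag
    line["data"] = main_memory.get((new_tag, index))
    line["dirty"] = False
    return operation


memory = {(2, 0): "new data"}
cache_line = {"tag": 1, "data": "updated data", "dirty": True}
print(replace_line(cache_line, 2, 0, memory))
```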
In the step S102, the object instruction is stalled until execution of the store instruction in the store queue has been completed. As described above, in accordance with the conventional technique, the comparison with reference to only the indexes is executed before the retrieval of the tag, for which reason the object instruction becomes stalled whenever correspondence of at least one index is confirmed. Even if the index correspondence is confirmed between the store instruction with the tag-retrieval and the store instruction in the store queue, it is possible that the offset differs between the two store instructions. If the offset is different between those store instructions, this means that the addresses for those store instructions are different, and accordingly no address dependency is present between those store instructions. Nevertheless, the conventional technique stalls the store instruction even if there is no address dependency. These unnecessary stalls increase the probability of generating the stall state, thereby making it difficult to realize an efficient instruction re-order operation.
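The unnecessary stall can be reproduced with a small sketch that contrasts the index-only comparison with a comparison of full addresses. The field widths (5-bit offset, 7-bit index) and the function names are assumptions for illustration, and the two addresses are chosen so that they share an index but differ in offset.

```python
OFFSET_BITS = 5  # assumed width of the offset field
INDEX_BITS = 7   # assumed width of the index field


def index_of(addr):
    """Extract the index field of an address under the assumed widths."""
    return (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)


def conventional_stall(queued_store_addrs, addr):
    """Index-only comparison, as in the conventional technique."""
    return any(index_of(s) == index_of(addr) for s in queued_store_addrs)


def same_address(queued_store_addrs, addr):
    """Comparison of the full addresses, which reflects the real dependency."""
    return any(s == addr for s in queued_store_addrs)


queued = [0x1760]   # store held in the store queue
incoming = 0x1764   # same index, different offset
print(conventional_stall(queued, incoming))  # index match causes a stall
print(same_address(queued, incoming))        # addresses actually differ
```

In this example the index-only comparison reports a match although the two addresses are different, which corresponds to the unnecessary stall described above.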
In the above circumstances, the development of a novel cache system control circuit free from the above problems is desirable.