1. Field of the Invention
The present invention relates to a microprocessor for a computer and, more particularly, to a cache controlling device, and a processor, that reduce the latency incurred on a cache miss by performing a replace control activated prior to the execution of a memory reference instruction.
2. Description of the Related Art
In a microprocessor, in-order control is implemented through a pipeline configuration in which instructions are processed in the order in which they are issued. Because in-order control guarantees, as seen from software, that instructions are executed and completed in that order, software is also written on the assumption of in-order control.
In recent years, superscalar and register renaming techniques have yielded microprocessors that perform out-of-order control, in which instructions are executed without following their order of issuance as long as no dependent relationship exists between them, while the instructions still appear to complete sequentially when their execution is observed from the software side.
The same applies to reading from and writing to a memory. However, some points must be considered separately. For example, a program may intend a dependent relationship between memory reads and writes that cannot be determined merely by decoding the instructions.
Here, such a case will be described through the following two examples each including two instructions.
(1) load (d0), d1
(2) load (d1), d3
(3) load (d0), d1
(4) load (d2), d3
With Instruction (1), data stored at the address d0 in the memory is read and is then stored in d1, and with Instruction (2), data stored at the address d1 in the memory is read and is then stored in d3. Since the result of Instruction (1) is used in Instruction (2), a dependent relationship exists between the two instructions. The hardware can therefore determine that the instructions must be issued in the order of Instruction (1) and then Instruction (2).
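The register dependence check described above can be illustrated with a minimal sketch. The `Load` type and the `depends_on` helper are hypothetical names introduced only for illustration; they model the `load (address register), destination register` form used in the listing and are not part of the described hardware.

```python
from typing import NamedTuple


class Load(NamedTuple):
    """Models 'load (addr_reg), dest_reg' (hypothetical representation)."""
    addr_reg: str   # register holding the memory address to read
    dest_reg: str   # register that receives the loaded data


def depends_on(earlier: Load, later: Load) -> bool:
    """True if `later` uses as its address the register that `earlier` writes."""
    return later.addr_reg == earlier.dest_reg


i1, i2 = Load("d0", "d1"), Load("d1", "d3")   # Instructions (1) and (2)
i3, i4 = Load("d0", "d1"), Load("d2", "d3")   # Instructions (3) and (4)

print(depends_on(i1, i2))  # True: (2) reads d1, which (1) writes
print(depends_on(i3, i4))  # False: nothing visible to the decoder links (3) and (4)
```

A check of this kind sees only register names, which is why it can order the first pair but says nothing about the second.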
With Instruction (3), data stored at the address d0 in the memory is read and is then stored in d1, and with Instruction (4), data stored at the address d2 in the memory is read and is then stored in d3. Here, however, the hardware cannot determine the required sequence between Instructions (3) and (4), so whether or not a sequence guarantee is needed depends upon the intention of the program. Two cases, Case A and Case B, can be considered as follows.
Case A:
The results remain the same even if the data are read in a different order, provided that neither of the two pieces of data has been changed by the time Instruction (4) is reached.
Case B:
If the data read by Instruction (3) is a flag indicating the validity of the data read by Instruction (4), the situation differs from Case A. Suppose that another processor writes the data of Instruction (4) and thereafter writes, into the data of Instruction (3), a value indicating that the data is valid. If the two instructions are executed in an order different from the program order, the data of Instruction (4) may be read before the other processor has written it, while the data of Instruction (3) is read after it has been written. As a result, a phenomenon arises in which data indicated as valid is in fact old.
Thus, while the order can be changed in Case A, it cannot be changed in Case B, and the hardware cannot tell which case applies. Only the programmer can determine the required order.
For this reason, systems have conventionally assumed Case B and enforced sequential control.
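The hazard of Case B can be reproduced with a small simulation. The following is only a sketch under stated assumptions: the memory locations `flag` and `data`, the values `"old"` and `"new"`, and the `outcomes` helper are hypothetical and stand in for the data of Instructions (3) and (4). The sketch enumerates every interleaving of the writing processor's two stores with the reading processor's two loads.

```python
from itertools import combinations

# The other processor writes the data first, then the flag that marks it valid.
WRITER = [("write", "data", "new"), ("write", "flag", 1)]


def outcomes(reader):
    """Return every (flag, data) pair the reader can observe over all interleavings."""
    results = set()
    total = len(WRITER) + len(reader)
    for writer_slots in combinations(range(total), len(WRITER)):
        memory = {"flag": 0, "data": "old"}
        seen = {}
        w, r = iter(WRITER), iter(reader)
        for slot in range(total):
            if slot in writer_slots:
                _, loc, value = next(w)
                memory[loc] = value          # store by the other processor
            else:
                _, loc = next(r)
                seen[loc] = memory[loc]      # load by this processor
        results.add((seen["flag"], seen["data"]))
    return results


program_order = [("read", "flag"), ("read", "data")]   # Instruction (3), then (4)
reordered = [("read", "data"), ("read", "flag")]       # Instruction (4) executed first

stale = (1, "old")   # flag says "valid" but the data is the old value
print(stale in outcomes(program_order))  # False
print(stale in outcomes(reordered))      # True
```

In program order the stale combination never appears, whereas the reordered loads can observe a valid flag together with old data.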
In recent years, however, Case A has come to be assumed and the sequence guarantee has been relaxed. For Case B, the programmer inserts a dedicated instruction for guaranteeing the sequence between the instructions that require the guarantee, and a system is adopted in which the hardware guarantees the sequence only when this dedicated instruction is given.
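The relaxed scheme can be sketched as follows. This is an illustration only: the mnemonic `barrier` is a hypothetical placeholder for the instruction that guarantees the sequence, and `ordering_groups` is an invented helper. Order is enforced between groups and relaxed within a group.

```python
def ordering_groups(program):
    """Split a program into groups separated by the sequence-guaranteeing instruction."""
    groups, current = [], []
    for instr in program:
        if instr == "barrier":      # hypothetical mnemonic, not taken from the text
            if current:
                groups.append(current)
            current = []
        else:
            current.append(instr)
    if current:
        groups.append(current)
    return groups


case_a = ["load (d0), d1", "load (d2), d3"]
case_b = ["load (d0), d1", "barrier", "load (d2), d3"]

print(ordering_groups(case_a))  # [['load (d0), d1', 'load (d2), d3']]
print(ordering_groups(case_b))  # [['load (d0), d1'], ['load (d2), d3']]
```

In Case A both loads fall into one group and may be reordered; in Case B the inserted instruction places them in separate groups, so the first load must complete before the second.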
The foregoing has described processing under the in-order control. Looking at Cases A and B above, under the in-order control the hardware operates in a serialized fashion in both cases, because it cannot discriminate between the two.
On the other hand, for the above instructions to be operated through the out-of-order control, the instructions are basically operated on the assumption of Case A. Where the instructions must be operated through the out-of-order control under Case B, the discrimination has to be made by inserting a memory barrier instruction. The in-order control is performed only when this memory barrier instruction is given.
In other words, a control is needed to switch over the operation mode. Although the out-of-order control is superior to the in-order control in performance, it is complicated in that, as seen with Case B, the two controls have to be switched over. Furthermore, requiring the insertion of the memory barrier instruction imposes a limitation on the program.
In contrast, the in-order control, although inferior in performance, is relatively simple to control and imposes no limitation on the program.
In addition, in an attempt to conceal cache miss latency, a replace control is carried out which includes the activation of a next line replacement. When a cache miss occurs in a cache line of a cache, it is predicted that a cache miss will also occur in the following cache line, and the cache is searched for that line. If the following cache line also misses, a replacement is activated for it. Because this activation of the next line replacement takes place without any regard to the characteristics of the program at the time the cache miss occurs, it has the disadvantage that it may have an adverse effect when accesses are made to addresses at random.
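The benefit and the adverse effect of the next line replacement can both be seen in a toy model. The sketch below assumes a small fully associative cache with least-recently-used replacement; the capacity, the access patterns, and the `count_misses` helper are hypothetical and chosen only to make the two behaviors visible.

```python
from collections import OrderedDict


def count_misses(lines, capacity, next_line):
    """Count demand misses for a sequence of cache-line numbers in a tiny LRU cache."""
    cache = OrderedDict()   # keys are line numbers, least recently used first

    def fill(line):
        if len(cache) >= capacity:
            cache.popitem(last=False)   # evict the least recently used line
        cache[line] = True

    misses = 0
    for line in lines:
        if line in cache:
            cache.move_to_end(line)     # hit: refresh the LRU position
            continue
        misses += 1
        fill(line)                      # replacement for the line that missed
        if next_line and line + 1 not in cache:
            fill(line + 1)              # next line replacement triggered by the miss
    return misses


sequential = list(range(8))     # lines 0, 1, 2, ... accessed in order
scattered = [0, 10, 0, 10, 0, 10]   # two unrelated lines accessed alternately

print(count_misses(sequential, 4, next_line=False))  # 8
print(count_misses(sequential, 4, next_line=True))   # 4
print(count_misses(scattered, 2, next_line=False))   # 2
print(count_misses(scattered, 2, next_line=True))    # 6
```

For the sequential pattern the next line replacement halves the misses, while for the scattered pattern the unneeded lines it brings in evict lines that are about to be reused.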
Consequently, it is an object of the present invention to provide an in-order control that imposes no limitation on the program, that retains the advantages of both the in-order control and the out-of-order control, and that can perform a replace control for improving the latency when a cache miss occurs in a cache.
With a view to attaining the above object, according to the present invention, there is provided a processor for a computer comprising a means for selecting and issuing an access instruction stored in an entry to a queue, a means for accessing a cache memory with the instruction so issued, a means for accessing the cache memory and issuing an instruction to access the next memory when a miss occurs to the cache, a data replace control means for registering data obtained by accessing the next memory in the cache memory, and a pre-access means for accessing the cache memory after the access instruction has been issued from the access instruction issuing means.
Furthermore, according to the present invention, the pre-access means is configured to pre-access the cache memory and issue an instruction to access the next memory when a miss occurs to the cache, to register data obtained by pre-accessing the next memory in the cache memory for carrying out a data replacement, and to end the pre-accessing when the pre-access to the next memory fails.
Moreover, according to the present invention, there is provided a controlling device for a load store unit in a computer comprising at least a first queue selection logical circuit, a second queue selection logical circuit and a mediating unit, wherein the first queue selection logical circuit sequentially selects from an instruction issuing unit access instructions to access the cache memory which are stored in queues, wherein the second queue selection logical circuit selects from the instruction issuing unit unissued access instructions of the access instructions to access the cache memory which are stored in the queues prior to the selections by the first queue selection logical circuit, and wherein the mediating unit mediates between the access instructions selected by the first queue selection logical circuit and the pre-access instructions selected by the second queue selection logical circuit for accessing the cache memory.
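The interplay of the two queue selection logical circuits and the mediating unit can be pictured with a small sketch. Everything in it is an assumption made for illustration: the `Entry` fields, the per-cycle `head_ready` signal, and the policy that a normal access wins the single cache port over a pre-access are hypothetical and are not stated in the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    address: int
    issued: bool = False         # already sent to the cache as a normal access
    pre_accessed: bool = False   # already sent to the cache as a pre-access


def select_in_order(queue: List[Entry], head_ready: bool) -> Optional[Entry]:
    """First selector: the oldest unissued entry, and only when it may issue."""
    for entry in queue:
        if not entry.issued:
            return entry if head_ready else None
    return None


def select_pre_access(queue: List[Entry]) -> Optional[Entry]:
    """Second selector: the oldest unissued entry that has not been pre-accessed."""
    for entry in queue:
        if not entry.issued and not entry.pre_accessed:
            return entry
    return None


def mediate(access: Optional[Entry], pre: Optional[Entry]):
    """Mediating unit: one cache port, normal access first (assumed policy)."""
    if access is not None:
        access.issued = True
        return ("access", access.address)
    if pre is not None:
        pre.pre_accessed = True
        return ("pre-access", pre.address)
    return None


queue = [Entry(0x100), Entry(0x200), Entry(0x300)]
# Hypothetical scenario: the in-order head cannot issue during the first two cycles.
trace = [mediate(select_in_order(queue, ready), select_pre_access(queue))
         for ready in (False, False, True, True, True)]
print(trace)
```

In this scenario the two stalled cycles are used for pre-accesses to 0x100 and 0x200, after which the three normal accesses issue in order, so any miss handling started by the pre-accesses overlaps with the stall.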