Because load and store instructions are issued frequently in typical computer programs, it is important for data processors to handle such instructions at high speed in order to achieve their best performance. A cache memory, to which a central processing unit (CPU) can write data and from which it can read data at high speed, offers a solution to this issue.
When a CPU issues a write request to a cache memory on the basis of a store instruction, it does not have to wait for that write request to be completed. When a CPU issues a read request on the basis of a load instruction, however, it needs the requested data from the cache memory immediately. Various techniques have been proposed to improve data processor performance. In one of these techniques, a load instruction is given priority and is processed ahead of a store instruction, which is kept in the wait state. One such technique is disclosed by Mike Johnson in "Superscalar Microprocessor Design", published by Prentice Hall, N.J., 1991, pp. 150-152.
Take, for example, a case in which a store instruction gives rise to a cache miss in an instruction sequence in which a load instruction immediately follows the store instruction. In such a situation, the cache memory fetches the data at the address of the store instruction from a main memory and replaces one of its own data entries with the data thus fetched. A main memory is very slow compared with a cache memory, so holding up the subsequent load instruction until completion of the store instruction causes a serious loss in performance.
In pipelined data processors, when a store instruction stores a result obtained by arithmetic operations that take a plurality of cycles, the address of the data to be stored becomes available at once. The data to be stored, however, may not become available at once because the arithmetic operations of a preceding instruction have not been completed. When a load instruction follows a store instruction, keeping the load instruction in the wait state until completion of the store instruction results in a loss in performance.
Therefore, in the above-described case where it is not possible to execute a store instruction immediately, a load instruction is executed preferentially by holding up the execution of the store instruction. It should be noted, however, that the load instruction is not given priority if the address of the data to be stored and the address of the data to be loaded are the same. This condition is detected by an address comparator. In the case that a load instruction is to be executed preferentially while plural store instructions are kept in the wait state, plural latches capable of holding plural data items to be stored and their addresses are provided, and the address of the data to be loaded is compared with the address of each of the plural data items to be stored. The number of comparators must be equal to the number of latches.
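The address check described above can be sketched in software as follows. This is only an illustrative model, not a description of any particular circuit: the names `PendingStore` and `load_may_bypass` and the example addresses are assumptions introduced here. Each held store corresponds to one latch, and each comparison in the loop stands for one comparator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingStore:
    """One latch: a store address plus its data, once the data is available."""
    address: int
    data: Optional[int] = None  # None while the arithmetic result is not ready


def load_may_bypass(load_address: int, pending: List[PendingStore]) -> bool:
    """Return True if the load address matches none of the held store addresses."""
    # One comparison per latch, mirroring one comparator per held store.
    return all(entry.address != load_address for entry in pending)


# Two stores are waiting for their data (hypothetical addresses).
pending = [PendingStore(0x100), PendingStore(0x240)]

print(load_may_bypass(0x180, pending))  # True: no address match, load goes first
print(load_may_bypass(0x240, pending))  # False: same address, load must wait
```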
FIG. 10 shows a conventional two-way set-associative cache memory. This conventional cache memory is now described. Banks 40a and 40b are provided. The first bank 40a has a plurality of DATA ENTRIES 2a for the storage of data, a plurality of TAG ENTRIES 3a for the storage of address information of data, and a plurality of VALID bits 4a indicative of whether specific data is valid. Likewise, the second bank 40b has a plurality of DATA ENTRIES 2b, a plurality of TAG ENTRIES 3b and a plurality of VALID bits 4b, all of which are identical in function with their counterparts in the first bank 40a. A write buffer 41 holds a WRITE ADDRESS 44 and WRITE DATA 45 in relation to a store instruction. The write buffer 41, used here, can hold two WRITE ADDRESSES 44 and two items of WRITE DATA 45. The write buffer 41 further includes a coincidence comparator 42 for making a comparison between the WRITE ADDRESS 44 held and an ADDRESS 10 given from a CPU. A selector 7 selects either the ADDRESS 10 from the CPU or a WRITE ADDRESS 46 output from the write buffer 41, and outputs whichever is selected as a SELECTED ADDRESS 12. Hit detectors 8a and 8b are provided. The hit detector 8a checks the first bank 40a for the occurrence of a cache hit, whereas the hit detector 8b checks the second bank 40b for the occurrence of a cache hit. CACHE HIT SIGNALS (CHITs) are represented by reference numerals 13a and 13b respectively. A replacement controller 43, at the time when a cache miss occurs, sends out a SELECTION signal 14 indicating which data entry to replace.
When a store instruction is executed, the selector 7 selects the ADDRESS 10 given from the CPU and outputs it. The TAG ENTRIES 3a and 3b are read out together with the VALID bits 4a and 4b. The hit detectors 8a and 8b determine whether a cache hit or a cache miss occurs. If a cache hit is detected, the ADDRESS 10 is held in the write buffer 41 as the WRITE ADDRESS 44 until the write data becomes available. Then, when a load instruction is executed, the hit detectors 8a and 8b likewise determine whether a cache hit or a cache miss occurs, and at the same time the coincidence comparator 42 compares the ADDRESS 10 with the WRITE ADDRESS 44 held in the write buffer 41. If the ADDRESS 10 and the WRITE ADDRESS 44 coincide, the data to be read out by the load instruction is the very data to be stored by the store instruction. In that case, the read operation by the load instruction is held up until completion of the write operation by the store instruction.
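The store and load sequence of the conventional cache memory can be illustrated with a small behavioural model. This is a sketch under stated assumptions rather than a description of the circuit of FIG. 10: the number of sets, the line size, the class `Bank`, and the functions `split`, `issue_store` and `issue_load` are all hypothetical, and data entries are omitted because only the hit check and the address comparison are modelled.

```python
NUM_SETS = 4     # assumed number of sets per bank
LINE_BYTES = 16  # assumed line size in bytes


def split(address):
    """Split an address into (set index, tag) under the assumed geometry."""
    index = (address // LINE_BYTES) % NUM_SETS
    tag = address // (LINE_BYTES * NUM_SETS)
    return index, tag


class Bank:
    """One bank: a tag entry and a valid bit per set."""

    def __init__(self):
        self.tags = [0] * NUM_SETS
        self.valid = [False] * NUM_SETS

    def hit(self, address):
        index, tag = split(address)
        return self.valid[index] and self.tags[index] == tag

    def fill(self, address):
        index, tag = split(address)
        self.tags[index] = tag
        self.valid[index] = True


banks = [Bank(), Bank()]  # two ways
write_buffer = []         # held write addresses waiting for their data


def issue_store(address):
    """Check both banks; on a hit, hold the address until write data arrives."""
    if any(bank.hit(address) for bank in banks):
        write_buffer.append(address)
        return "held"
    return "miss"


def issue_load(address):
    """Stall on an address match with a held store; otherwise report hit or miss."""
    if address in write_buffer:  # role of the coincidence comparator
        return "stall"
    return "hit" if any(bank.hit(address) for bank in banks) else "miss"


banks[0].fill(0x100)
banks[1].fill(0x200)
print(issue_store(0x100))  # held
print(issue_load(0x200))   # hit
print(issue_load(0x100))   # stall
print(issue_load(0x300))   # miss
```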
In the above-described set-associative cache memory, plural entries exist to hold data in relation to a single address. However, it is impossible to know which data entry is to be written by a store instruction kept in the wait state. As a result, even when a mismatch of the two addresses is detected, a load instruction that has caused a cache miss may replace the entry to which the data is to be stored. A cache miss then occurs for the store instruction, even though a cache hit was detected for it earlier. This causes the data processor to malfunction, leading to a drop in data processor performance. Even with a direct-mapped cache memory, when a load instruction is given priority by holding up the execution of a store instruction, a load instruction that has caused a cache miss may accidentally replace the entry to which the data is to be stored.
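The replacement hazard can be made concrete with a minimal model of a single set of a two-way cache. The tags, the function `run_hazard` and the choice of victim way are assumptions made for illustration only; the point is that the held store records an address but not the way it hit in, so the replacement triggered by the load may evict that way.

```python
def run_hazard(victim_way):
    """Model one set of a two-way cache with a store held in the wait state.

    Returns (load_hit, store_still_hits) after the load has been handled.
    """
    ways = [0xA, 0xB]        # tags currently held in way 0 and way 1
    pending_store_tag = 0xA  # the store hit earlier; only its address is held
    load_tag = 0xC           # the load goes to a different address

    comparator_match = load_tag == pending_store_tag
    load_hit = load_tag in ways

    # Addresses differ, so the load proceeds; it misses, so an entry is replaced.
    # The replacement logic has no record of which way the held store targets.
    if not comparator_match and not load_hit:
        ways[victim_way] = load_tag

    store_still_hits = pending_store_tag in ways
    return load_hit, store_still_hits


print(run_hazard(victim_way=0))  # (False, False): the store's entry was evicted
print(run_hazard(victim_way=1))  # (False, True): the store's entry survived
```

Whether the hazard occurs depends entirely on which way is chosen as the victim, which is why the address comparison alone does not prevent it.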
Additionally, even when an attempt is made to give priority to a load instruction by holding up the execution of a store instruction, if a write address held in the write buffer and a read address are the same, the read operation by the load instruction is kept in the wait state until the write operation by the store instruction is over. Therefore, the speed of instruction processing cannot be improved.
Further, when store instructions are executed in succession, if a cache hit is detected in a first cycle and the write operation is executed in the next cycle, each store instruction requires two cycles. Clearly, this presents a serious obstacle to the improvement of data processor performance.
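The two-cycle cost per store can be tabulated with a short sketch. The function `store_schedule` is hypothetical, and it assumes, as in the description above, that the hit check of one store does not overlap the write of the preceding store.

```python
def store_schedule(num_stores):
    """Return (cycle, store index, phase) tuples for back-to-back stores."""
    schedule = []
    cycle = 0
    for store in range(num_stores):
        schedule.append((cycle, store, "hit check"))  # first cycle
        schedule.append((cycle + 1, store, "write"))  # next cycle
        cycle += 2  # the next store cannot start until this one has written
    return schedule


schedule = store_schedule(3)
total_cycles = schedule[-1][0] + 1  # cycles are numbered from 0
print(total_cycles)  # 6
```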