1. Field of the Invention
This invention relates to a cache memory system provided in a microprocessor, and more particularly to a cache memory system having a plurality of ports.
2. Description of the Related Art
In recent years, the processing speed of electronic computers has dramatically increased. Particularly, the performance of microprocessors used as CPU's (Central Processing Units) of electronic computers has markedly improved because of the development of semiconductor technology.
In recent CPU's, in order to enhance the efficiency of the instruction execution process, instruction pipeline systems are commonly employed. The instruction pipeline system is a system for dividing execution of the instruction into a plurality of stages such as an instruction fetch cycle, decode cycle, execution cycle, and data write-in cycle, and executing a plurality of instructions in a stepwise manner with the instructions partly overlapped. Since, in this system, an instruction is fetched before execution of a preceding instruction is completed, an instruction prefetch process is effected. The instruction prefetch process consists of fetching an instruction to be executed in the future while simultaneously decoding and executing the preceding instruction.
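The overlapped execution described above can be illustrated by a minimal sketch. It assumes an idealized four-stage pipeline with no stalls, in which each instruction advances one stage per cycle; the stage names and the function names are illustrative only and are not taken from any particular microprocessor.

```python
# Idealized four-stage pipeline (assumed stage names, no stalls modeled).
STAGES = ["fetch", "decode", "execute", "write"]

def pipeline_schedule(num_instructions, stages=STAGES):
    """Map each cycle to the (instruction index, stage) pairs active in it."""
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(stages):
            # Instruction i enters the first stage at cycle i and
            # occupies stage s at cycle i + s.
            schedule.setdefault(i + s, []).append((i, stage))
    return schedule

def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed to finish all instructions when they fully overlap."""
    if num_instructions <= 0:
        return 0
    return num_stages + num_instructions - 1
```

In this sketch, instruction 1 is being fetched in the same cycle in which instruction 0 is being decoded, which corresponds to the instruction prefetch process.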
Thus, in order to cause the CPU to process a plurality of instructions in a pipeline parallel manner, high-speed readout of instructions and data readout/write-in operations with respect to the main memory are required.
Therefore, it is a common practice for the recent microprocessor to separately have an instruction cache and a data cache. The instruction cache is a cache memory exclusively used for instruction words and the data cache is a cache memory exclusively used for data fetched according to an operand of the instruction. It becomes possible to simultaneously access both instructions and data at high speeds by separately providing the instruction cache and data cache.
In the above microprocessor, access to the instruction cache is as follows.
Since the conventional instruction cache has only one port for instruction readout, the CPU sequentially updates an instruction fetch address and serially reads out a group of instructions successively stored in an address order one by one. Normally, since the order of the instructions executed by the CPU coincides with the order in which the instructions are stored in the instruction cache, the prefetch of an instruction from the instruction cache can be efficiently effected.
However, when a conditional branch instruction is executed by the CPU, the following problem occurs. In this case, since whether the branch condition is taken or untaken is not determined until execution of a condition code change instruction preceding the conditional branch instruction is completed, fetch of the branched instruction is delayed accordingly. This is because the instruction to be next executed depends on whether the branch condition is taken or untaken. Such delays inherent in the fetch of branched instructions significantly degrade the performance of the CPU.
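The size of this delay can be estimated with a simplified model. The sketch below rests on several assumptions that are not stated in the text: the condition code change instruction enters the fetch stage at cycle 0, each instruction advances one stage per cycle, the condition becomes known at the end of the execute stage, and the conditional branch enters the fetch stage `gap` cycles later. The function name and parameters are hypothetical.

```python
def branch_fetch_stall(execute_stage_index, gap=1):
    """Cycles by which the fetch after a conditional branch is delayed.

    execute_stage_index: zero-based position of the execute stage
        (2 for an assumed fetch/decode/execute/write pipeline).
    gap: cycles between the fetch of the condition code change
        instruction and the fetch of the conditional branch.
    """
    # Without a stall, the instruction after the branch would be
    # fetched one cycle after the branch itself.
    ideal_fetch_cycle = gap + 1
    # The condition is known only after the execute stage of the
    # condition code change instruction has finished.
    earliest_fetch_cycle = execute_stage_index + 1
    return max(0, earliest_fetch_cycle - ideal_fetch_cycle)
```

Under these assumptions the stall grows with the depth of the pipeline and shrinks as more instructions separate the condition code change instruction from the branch.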
To alleviate the degradation in the performance of the CPU caused by the conditional branch instruction, the following two methods are known.
One of them is a method using branch prediction. In this method, rather than waiting for the determination of a condition code on which the conditional branch depends, the branch address is decoded in advance and the branched instruction is read out rather than the next instruction in address order.
However, this method has a problem that the degree of degradation (penalty) in CPU performance caused when the prediction is wrong is large and it is not highly practicable.
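The effect of the misprediction penalty can be made concrete with a simple expected-cost calculation. This is a generic averaging sketch, not a model of any specific predictor; the accuracy and penalty values passed in are hypothetical inputs.

```python
def average_branch_cost(accuracy, mispredict_penalty, correct_cost=0.0):
    """Expected extra cycles per conditional branch.

    accuracy: fraction of branches predicted correctly (0.0 to 1.0).
    mispredict_penalty: cycles lost when the prediction is wrong.
    correct_cost: cycles lost when the prediction is right (assumed 0).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return accuracy * correct_cost + (1.0 - accuracy) * mispredict_penalty
```

For example, with a hypothetical penalty of 4 cycles and a prediction that is right only half of the time, the average cost is 2 cycles per conditional branch.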
The second method employs a branched instruction buffer. This method is based on the fact that most of the branches are used in a loop process and involves previously storing a pair consisting of the address of the conditional branch instruction and the branched instruction in an exclusive branched instruction buffer provided separately from the instruction cache. This saves the CPU from having to newly fetch the same branched instruction after every loop iteration.
However, this method requires large-scale and complicated hardware to ensure coincidence between the contents of the buffer and the instruction cache.
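The branched instruction buffer and its coincidence requirement can be sketched as follows. The class and method names are hypothetical, and storing the address of the branched instruction alongside the instruction itself is an assumption made here so that stale entries can be located; the text does not specify how coincidence is maintained.

```python
class BranchedInstructionBuffer:
    """Toy model: branch address -> (branched address, branched instruction)."""

    def __init__(self):
        self._entries = {}

    def record(self, branch_addr, target_addr, target_instr):
        # Store the pair the first time the branch is resolved.
        self._entries[branch_addr] = (target_addr, target_instr)

    def lookup(self, branch_addr):
        # On later loop iterations the branched instruction is
        # supplied from the buffer without a new fetch.
        return self._entries.get(branch_addr)

    def invalidate_target(self, target_addr):
        # Must be called whenever the instruction cache contents at
        # target_addr change; otherwise the buffer holds a stale copy.
        stale = [b for b, (t, _) in self._entries.items() if t == target_addr]
        for b in stale:
            del self._entries[b]
        return len(stale)
```

The `invalidate_target` step stands in for the additional hardware: every change to the instruction cache has to be checked against the buffer contents.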
As described above, the conventional instruction cache is effective when instructions which are successive in an address order are read out, but has a defect that it takes a long time to read out a branched instruction when a conditional branch instruction is provided, thereby lowering the performance of the CPU.
Further, microprocessors that execute two or more successive instructions in the same cycle have recently become more prevalent. This type of microprocessor is called a super scalar type.
The super scalar type microprocessor can enhance performance by several times without enhancing the operation clock frequency. Therefore, this type of microprocessor can be widely used in a range of systems from a small-sized computer such as a personal computer or work station to a large-scale computer used as a host computer.
The super scalar type microprocessor has hardware for dividing a plurality of instructions fetched to the microprocessor into operation instructions, branch instructions, memory access instructions and the like and determining whether the instruction pair can be simultaneously executed. Further, in the recent super scalar type microprocessor, use of a data cache of 2-port structure is considered to simultaneously execute two memory access instructions.
In a normal program, the frequency of occurrence of memory access instructions (load/store instructions) is as high as 20 to 30%. Therefore, simultaneous execution of two memory access instructions may contribute greatly to enhancement of the performance of the computer.
When a 2-port type data cache is used, execution of two memory access instructions is completed in one cycle if a cache hit occurs on both of the two memory access instructions to be simultaneously executed. When a cache miss occurs in at least one of the memory access instructions, cache refill is effected.
In this case, the cache refill means the operation of transferring data from the main memory to the cache memory at the time of cache miss so as to exchange the contents of the cache memory.
That is, when a cache miss occurs in at least one of the accesses, the operation of the CPU of the microprocessor is interrupted until data is fetched from the main memory into the data cache by the cache refill process.
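The hit and miss cases for a pair of simultaneous accesses can be summarized in a short sketch. The cache is modeled here simply as a set of addresses currently held, which is an assumption made for brevity; the function name and the returned keys are hypothetical.

```python
def dual_port_access(cached_addresses, addr_a, addr_b):
    """Classify a pair of simultaneous accesses to a 2-port data cache.

    cached_addresses: set of addresses currently held in the cache.
    Returns whether both accesses finish in one cycle and which
    addresses need a cache refill before the CPU can continue.
    """
    misses = [a for a in (addr_a, addr_b) if a not in cached_addresses]
    return {
        "completes_in_one_cycle": not misses,
        "refill_needed_for": misses,
    }
```

A single miss is enough to prevent completion in one cycle, since the CPU waits until every missing item has been fetched.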
Thus, since the operation of the CPU is interrupted in the cache refill process, it is necessary to reduce the time required for the cache refill to the minimum possible in order to prevent degradation of the performance of the CPU.
However, since only one memory bus is used to access the main memory, it becomes impossible to simultaneously effect the cache refill processes for the two memory access instructions. For this reason, if a cache miss occurs in both of the two memory access instructions to be simultaneously executed, the cache refill process for the memory access instructions must be effected as a bus transaction which is independent for each instruction.
Generally, in the main memory access via the memory bus, memory bus adjustment time for attaining the right of use of the bus is necessary. Therefore, the memory bus adjustment time is always inserted between two independent main memory accesses for the cache refill process. This causes an increase in time required for the cache refill, thereby lowering the performance of the CPU.
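The cost of the repeated memory bus adjustment time can be expressed as a simple sum. The sketch below assumes that every independent bus transaction pays the same adjustment time and the same transfer time; the function name and the cycle counts used with it are hypothetical.

```python
def refill_cycles(num_transactions, arbitration_cycles, transfer_cycles):
    """Total cycles spent on cache refill over one memory bus.

    Each independent bus transaction first acquires the right to
    use the bus (arbitration) and then transfers its data.
    """
    if num_transactions < 0:
        raise ValueError("num_transactions must not be negative")
    return num_transactions * (arbitration_cycles + transfer_cycles)
```

With hypothetical values of 3 cycles for bus adjustment and 8 cycles for the transfer, one miss costs 11 cycles while two misses cost 22 cycles, of which 6 are bus adjustment time.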
Further, when the CPU simultaneously executes two memory access instructions, different memory address values of the instructions sometimes compete on the same entry address of the data cache. In this case, the entry address indicates a lower address accessed by the CPU. The bit width of the lower address is determined according to the physical memory structure of the data cache.
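The competition on one entry address can be shown by extracting the lower address bits. The sketch assumes a direct-mapped organization with a hypothetical 16-byte line (4 offset bits) and 256 entries (8 entry bits); these widths are examples only, since the actual bit width depends on the physical memory structure of the data cache.

```python
OFFSET_BITS = 4   # assumed: 16-byte cache line
ENTRY_BITS = 8    # assumed: 256 entries

def entry_address(address, offset_bits=OFFSET_BITS, entry_bits=ENTRY_BITS):
    """Lower address bits that select the cache entry."""
    return (address >> offset_bits) & ((1 << entry_bits) - 1)

def tag(address, offset_bits=OFFSET_BITS, entry_bits=ENTRY_BITS):
    """Upper address bits that distinguish data sharing one entry."""
    return address >> (offset_bits + entry_bits)

def compete(addr_a, addr_b):
    """True if two different memory blocks map to the same entry."""
    return (entry_address(addr_a) == entry_address(addr_b)
            and tag(addr_a) != tag(addr_b))
```

With these widths, addresses 0x1040 and 0x2040 both select entry 0x04 while carrying different upper address bits, so they compete on the same entry.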
Thus, even when two memory addresses compete on the same entry address of the data cache, the cache refill process for the two memory addresses is effected as an independent bus transaction as described before.
In this case, the cache refill for an instruction of high priority execution order is first effected, and then, the cache refill for an instruction of low priority execution order is effected. That is, data on a specified entry which has been exchanged by the cache refill effected in a preceding cycle is exchanged for different data by the cache refill effected in the next cycle. Therefore, the cache memory access for the first cache refill becomes a useless access operation and extra time is used for the useless access operation.
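The wasted access can be traced with a small simulation. The cache is modeled as a dictionary from entry address to the upper address bits currently held, which is an assumption; the function name is hypothetical, and the misses are processed strictly in execution priority order as described above.

```python
def refill_in_order(cache, misses):
    """Apply refills one by one and report those that were overwritten.

    cache: dict mapping entry address -> upper address bits held.
    misses: list of (entry, upper_bits) in execution priority order.
    Returns the refills whose data was replaced by a later refill
    of the same entry within this group of misses.
    """
    wasted = []
    last_refill = {}  # entry -> upper bits written by the latest refill
    for entry, upper_bits in misses:
        if entry in last_refill and last_refill[entry] != upper_bits:
            # The earlier refill of this entry is exchanged again.
            wasted.append((entry, last_refill[entry]))
        cache[entry] = upper_bits
        last_refill[entry] = upper_bits
    return wasted
```

When both misses select entry 4, the data written by the first refill is replaced by the second, so the first cache memory access is reported as wasted.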
As described above, the 2-port structure data cache is suitable for enhancing the performance of the CPU by simultaneously inputting/outputting two data items, but has a defect that a long time is used for the cache refill.
The 2-port structure may be used in the above-described instruction cache. With this structure, two instructions can be simultaneously read out into the CPU so that the instruction readout efficiency can be enhanced.
However, even if the instruction cache is formed in a 2-port form, the instruction to be read out from the cache still differs depending on whether the branch condition is taken or not, as in the conventional case, and a next instruction cannot be read out until the branch condition is determined. Therefore, it is difficult to solve the problem that the performance of the CPU is lowered by the conditional branch instruction.