The present invention relates to a microprocessor having a reduced instruction set computer (hereinbelow referred to as RISC) architecture. More particularly, it relates to a microprocessor capable of reducing the penalties that occur when consecutive instructions operate on a built-in cache memory.
Microprocessors having a RISC architecture, which is designed to execute one instruction per cycle, generally include a built-in cache memory. The cache improves the performance of multiprocessor designs and of pipeline constructions that process instructions in parallel, and so raises the effective performance of the microprocessor.
The construction of a conventional RISC type microprocessor including a cache memory is shown in FIG. 10 (see MC68020 User's Manual, Chapter 7, On-chip cache memory, pp. 89-91; published by CQ Shuppan). As shown in FIG. 10, the microprocessor 101 comprises:
- a CPU 41;
- an address converter 42, which converts the logical address 72 generated by the CPU 41 into a physical address 711;
- a single port cache memory device 6.

The cache memory device 6 includes:
- a decoder 62, which decodes the second part of the physical address 711;
- memory arrays 611, 612, which are accessed according to the output of the decoder 62;
- a comparator 63, which determines whether the first part of the physical address 711 is the same as the physical address 713 read from the memory array 611;
- a tri-state buffer 64, which outputs the data read from the memory array 612 only when the comparator 63 outputs a cache hit signal 741. The comparator outputs this signal when it judges that the first part of the physical address 711 coincides with the physical address 713 read from the memory array 611.
The basic operation of the microprocessor 101 with this built-in single port cache memory device 6 is described below with reference to FIG. 11.

When a memory operation instruction is executed, the CPU 41 generates the logical address 72. The address converter 42 converts it into the physical address 711, which is used in address decoding. The decoder 62 uses the second part of the converted physical address 711 to select the set address of the memory arrays 611, 612. The physical address 713 stored in the single port memory array 611 and the data stored in the single port memory array 612 are then read simultaneously.

The comparator 63 compares the first part of the physical address 711 with the physical address 713 read from the memory array 611. If the addresses match, it outputs a cache hit signal 741. If they do not match, it outputs a cache miss signal 741.

A cache hit signal 741 opens the tri-state buffer 64. The data read from the memory array 612 is then output from the single port cache memory device 6 to an external device. A cache miss signal 741 puts the tri-state buffer 64 into a high-impedance state, which prevents the data from the memory array 612 from being output from the single port cache memory device 6.
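The lookup just described can be sketched as a small Python model. This is a minimal sketch under stated assumptions, not the circuit of FIG. 10. The page size used by the address converter, the number of sets, the dictionary page table, and the names `translate` and `SinglePortCache` are all hypothetical. Addresses are treated as word addresses with one word per set. The high-impedance state of the tri-state buffer 64 is represented by `None`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

PAGE_BITS = 12   # hypothetical page size (4096 words) for address converter 42
INDEX_BITS = 6   # hypothetical: 64 sets selected by decoder 62


def translate(logical: int, page_table: Dict[int, int]) -> int:
    """Hypothetical address converter 42: logical address 72 -> physical address 711."""
    page = logical >> PAGE_BITS
    offset = logical & ((1 << PAGE_BITS) - 1)
    return (page_table[page] << PAGE_BITS) | offset


@dataclass
class SinglePortCache:
    # tags models memory array 611 (stored physical addresses 713); data models array 612.
    tags: List[Optional[int]] = field(default_factory=lambda: [None] * (1 << INDEX_BITS))
    data: List[Optional[int]] = field(default_factory=lambda: [None] * (1 << INDEX_BITS))

    @staticmethod
    def split(physical: int) -> Tuple[int, int]:
        """Return (first part, second part) of physical address 711."""
        return physical >> INDEX_BITS, physical & ((1 << INDEX_BITS) - 1)

    def fill(self, physical: int, value: int) -> None:
        """Store a line so that later reads of this address hit."""
        first, second = self.split(physical)
        self.tags[second], self.data[second] = first, value

    def read(self, physical: int) -> Tuple[bool, Optional[int]]:
        first, second = self.split(physical)  # decoder 62 uses the second part
        stored_tag = self.tags[second]        # arrays 611 and 612 are read
        stored_data = self.data[second]       # in the same access
        hit = stored_tag == first             # comparator 63 -> signal 741
        # Tri-state buffer 64 drives data only on a hit; None stands for high impedance.
        return hit, (stored_data if hit else None)
```

A miss returns `(False, None)` even though array 612 still holds a word in that set. This mirrors the description above: the tri-state buffer blocks the data on a miss, but the array itself is still read.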
It is also common to provide a cache memory adapted to a multiprocessor architecture that has a bus snoop function. The bus snoop function prevents false data from being read from the cache memory of a microprocessor when that cache holds data that differs from the data in the common external memory. For this purpose, the on-chip cache memory commonly uses a dual port memory array. One port is assigned to the CPU and the other to the common bus, so that the memory array can be accessed through either port.
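A bus snoop can be illustrated with the same kind of model. The sketch below assumes a direct-mapped cache with a valid bit per set. It uses invalidate-on-snooped-write, which is one common snoop policy; the source does not say which policy is used. The class and method names are hypothetical. The two ports are modeled simply as two groups of methods on the same arrays, and truly simultaneous access through both ports is not modeled.

```python
from typing import List, Optional, Tuple


class DualPortCache:
    """Hypothetical direct-mapped cache whose arrays are reachable from two ports."""

    def __init__(self, index_bits: int = 6) -> None:
        self.index_bits = index_bits
        sets = 1 << index_bits
        self.valid: List[bool] = [False] * sets
        self.tags: List[int] = [0] * sets
        self.data: List[int] = [0] * sets

    def _split(self, physical: int) -> Tuple[int, int]:
        return physical >> self.index_bits, physical & ((1 << self.index_bits) - 1)

    # --- CPU port ---
    def cpu_fill(self, physical: int, value: int) -> None:
        tag, idx = self._split(physical)
        self.valid[idx], self.tags[idx], self.data[idx] = True, tag, value

    def cpu_read(self, physical: int) -> Optional[int]:
        tag, idx = self._split(physical)
        if self.valid[idx] and self.tags[idx] == tag:
            return self.data[idx]
        return None  # miss: data must come from the common external memory

    # --- Common-bus (snoop) port ---
    def snoop_write(self, physical: int) -> None:
        """Another bus master wrote this address; drop our now-false copy."""
        tag, idx = self._split(physical)
        if self.valid[idx] and self.tags[idx] == tag:
            self.valid[idx] = False
```

After `snoop_write`, a CPU-side read of that address misses. The processor must then fetch the current data from external memory instead of returning the out-of-date copy.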
In a conventional microprocessor including a cache memory device as described above, however, successive instructions that operate on the cache memory make it difficult to execute one instruction per cycle. A penalty occurs, and the performance of the overall system decreases. This is a major drawback for a RISC type microprocessor, which is designed around executing one instruction per cycle.
For example, suppose this conventional microprocessor executes several store instructions in succession. The CPU 41 must keep outputting the logical address 72 from the cycle in which that address is converted into the physical address 711 through the cycle in which the memory arrays 611, 612 are accessed. As shown in FIG. 11, the CPU 41 therefore cannot output the logical address 72 for the next store instruction until the access to the memory arrays 611, 612 is completed, and a penalty occurs. Consequently, the longer the succession of store instructions, the longer the penalty lasts and the lower the operating speed of the overall system.
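The growth of the penalty can be checked with a cycle-count sketch. It assumes, hypothetically, that address conversion and memory-array access each take one cycle. It compares the conventional behavior of FIG. 11 with an idealized overlapped pipeline, in which the next store's address may be issued while the previous store accesses the arrays. The overlapped case is only a reference point for counting stall cycles, not the mechanism of the invention. The function names are illustrative.

```python
from typing import List, Tuple


def schedule_stores(n: int, overlapped: bool) -> List[Tuple[int, int, int]]:
    """Return (store number, conversion cycle, array-access cycle) for n stores.

    Hypothetical timing: 1 cycle in address converter 42, then 1 cycle in
    memory arrays 611/612. With overlapped=False the CPU holds logical
    address 72 until the array access finishes (the conventional case).
    """
    schedule = []
    issue = 1  # cycle in which the CPU outputs the next logical address
    for store in range(n):
        convert = issue
        access = convert + 1
        schedule.append((store, convert, access))
        # Conventional: the next address is issued only after the access cycle.
        issue = convert + 1 if overlapped else access + 1
    return schedule


def total_cycles(n: int, overlapped: bool) -> int:
    """Cycle in which the last of n consecutive stores accesses the arrays."""
    return schedule_stores(n, overlapped)[-1][2] if n > 0 else 0


for n in (1, 2, 4, 8):
    stall = total_cycles(n, False) - total_cycles(n, True)
    print(f"{n} stores: conventional={total_cycles(n, False)}, "
          f"overlapped={total_cycles(n, True)}, penalty={stall}")
```

Under these assumptions, n consecutive stores take 2n cycles in the conventional design and n + 1 cycles when overlapped. The penalty is therefore n - 1 cycles, and it grows linearly with the length of the store sequence.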