The present invention relates to a data processor having a cache memory, and more particularly to a software prefetch for efficiently using two types of cache memories and set associative control for most favorably controlling the access of the set associative cache memories. Moreover, the present invention relates to a data processor having a controller for these operations.
In general, a computer having a cache memory stores frequently used data in a small-capacity, high-speed cache memory as a copy of part of the data stored in a large-capacity, low-speed main memory, so that an instruction unit, such as a CPU, can access frequently used data at high speed in the cache memory and accesses the main memory only when the desired data is not present in the cache memory.
However, because the machine cycle of the CPU is significantly shorter than that of the main memory, the penalty in the case of a cache miss (the time until requested data is obtained from the main memory) increases.
A method called software prefetch for solving the above problem is described in David Callahan et al., "Software Prefetching," Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 1991, pp. 40-52. In the method described in this first publication, an address is computed by a prefetch instruction before an instruction unit requires data, the address is checked to see whether the data indicated by the address is present in the cache memory, and if not, the data is transferred from the main memory to the cache memory. Therefore, it is possible to improve the hit ratio of the cache memory and minimize the penalty, because the prefetch instruction has already stored the data in the cache memory by the time the data is required.
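The prefetch behavior described above can be illustrated with a minimal sketch. It is not taken from the first publication: the class ToyCache, the function sum_array, the line size of four words, and the unbounded cache capacity are all hypothetical simplifications chosen only to show the check-then-transfer step.

```python
class ToyCache:
    """Hypothetical cache model: unbounded capacity, no eviction."""

    def __init__(self, main_memory, line_size=4):
        self.main_memory = main_memory
        self.line_size = line_size
        self.lines = {}   # line base address -> list of words
        self.misses = 0   # demand misses seen by the instruction unit

    def _base(self, addr):
        return addr - addr % self.line_size

    def _fill(self, base):
        # Transfer one line from main memory into the cache.
        self.lines[base] = self.main_memory[base:base + self.line_size]

    def prefetch(self, addr):
        # Prefetch instruction: check the cache, transfer only if absent.
        base = self._base(addr)
        if base not in self.lines:
            self._fill(base)

    def load(self, addr):
        # Memory referencing instruction: a miss pays the penalty.
        base = self._base(addr)
        if base not in self.lines:
            self.misses += 1
            self._fill(base)
        return self.lines[base][addr - base]


def sum_array(cache, n, prefetch_distance=None):
    """Sum words 0..n-1, optionally prefetching a fixed distance ahead."""
    total = 0
    for i in range(n):
        if prefetch_distance is not None and i + prefetch_distance < n:
            cache.prefetch(i + prefetch_distance)
        total += cache.load(i)
    return total


memory = list(range(16))
plain = ToyCache(memory)
ahead = ToyCache(memory)
sum_plain = sum_array(plain, 16)
sum_ahead = sum_array(ahead, 16, prefetch_distance=4)
```

In this toy run the loop without prefetching misses once per line, while the loop that prefetches one line ahead misses only on the very first line. The model ignores timing, so it shows where misses occur rather than how much latency is hidden.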
A cache memory comprising two buffers with different purposes, which are used selectively by hardware, is disclosed in Japanese Patent Laid-Open No. 303248/1992.
In this second publication, the cache memory has an S buffer and a P buffer. The S buffer stores data to be accessed frequently over time. The P buffer stores data whose addresses, to be referenced subsequently by the program, are close to the currently referenced address; that is, the P buffer stores the array data to be accessed in array computation. Either one of the two buffers may be used selectively depending on the addressing mode in effect and on the type of register being used for the address calculation.
In general, a computer stores instructions or data to be frequently called and processed by a processor in a high-speed small-capacity memory, called a cache memory, as a copy of part of the instructions or data stored in a comparatively low-speed large-capacity main memory. Thus, the computer operation speed is increased. Data access systems for such a cache memory include the direct mapping system and the set associative system.
The direct mapping system is used for accessing a cache memory by directly outputting data or an instruction stored in an address designated by a processor or the like and storing it in the designated address.
The set associative memory is used for accessing a plurality of sets of data values or a plurality of instructions (called a data set) in a cache memory having a plurality of sets, each of which comprises a plurality of memories common in allocation of addresses. A plurality of accessed sets of data values or a plurality of accessed instructions required are selected and processed in the processor.
FIG. 17 shows a schematic view of a data processor having a two-set associative cache memory according to a third conventional arrangement. In FIG. 17, symbol 9201 represents a CPU, 9202 to 9217 represent 8-bit output universal memories, 9218 represents an address bus, 9219 represents a 64-bit data bus of a first set, and 9220 represents a 64-bit data bus of a second set. The universal memories are used as data arrays of the two-set associative cache memory. The memories 9202 to 9209 are used as the data array of the first set and the memories 9210 to 9217 are used as the data array of the second set.
When an address designated by the CPU is sent to memories through the address bus, two sets of data values each having a width of 64 bits are outputted to the CPU through a respective data bus.
To constitute a set associative cache memory having m sets of data values with the width of n bits by using k-bit output memories, "n×m/k" memory chips are necessary in general. In the case of the above-described third conventional arrangement, 16 memories are necessary because n equals 64, m equals 2, and k equals 8.
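The chip count above can be checked with a short calculation. The function name chips_required is hypothetical, and the sketch assumes that the data width n is a whole multiple of the chip output width k.

```python
def chips_required(n_bits, m_sets, k_bits):
    """Memory chips needed for m sets of n-bit data using k-bit-output chips."""
    if n_bits % k_bits != 0:
        raise ValueError("n must be a multiple of k in this sketch")
    return n_bits * m_sets // k_bits


# Third conventional arrangement: n = 64, m = 2, k = 8.
conventional = chips_required(64, 2, 8)
```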
The method described in the first publication has the problem that an expensive two-port cache memory must be used in order to process, at the same time, the transfer of data from the main memory to the cache memory and a memory referencing instruction sent from the instruction unit. If simultaneous processing is not carried out, a commonly used one-port cache memory can be used. In this case, however, a lot of processing time is required and the benefit of software prefetch cannot be obtained effectively.
Moreover, the method described in the first publication has the additional problem that, when data, which is read from a cache memory only once and is immediately expelled from the cache memory, is held in the cache memory, the cache memory is filled with useless data and the hit ratio decreases.
These problems frequently occur in a program for handling large-scale data exceeding the capacity of a cache memory.
The arrangement described in the second publication has the problem that, because which of the two cache memories stores the data is determined by the addressing mode and the register used for address computation, the two cache memories must be used selectively with consideration of data characteristics, including data size.
It is the first object of the present invention to provide a data processor for solving the above problems, which is capable of quickly and efficiently processing small-capacity frequently accessed data stored in a cache memory and large-scale data exceeding the capacity of the cache memory, and which is also capable of lessening the contamination of the cache memory and improving the hit ratio.
The third conventional arrangement described with reference to FIG. 17 has a problem that, when the number of sets of set associative cache memories increases, or the data bit width increases and the number of memories for constituting the cache memories increases, the cache memory cost increases.
When the number of memories increases, problems occur in that the address bus fan-out, address bus length, and data bus length increase, the cache memory access time increases, and the machine cycle of the entire data processor cannot be shortened.
When the number of sets increases, problems occur in that a number of data buses equivalent to the number of sets is required and the number of pins of the CPU increases. That is, a problem occurs in that it is impossible to meet the restriction on the number of pins of a package in the case of one chip.
It is the second object of the present invention to provide a set associative cache memory comprising a smaller number of memories.
To achieve the above first object, the present invention involves the use of a first cache memory with a large capacity and one port and a second cache memory with a small capacity and two ports disposed between a main memory and an instruction processing section, and a control section controlled by a prefetch instruction to store data to be frequently accessed in the first cache memory and data to be less frequently accessed in the second cache memory.
Because data to be frequently accessed is stored in the first cache memory, the hit ratio is improved. Moreover, because data to be less frequently accessed is not stored in the first cache memory, the storing of useless data in the first cache memory can be lessened.
Because data to be less frequently used is stored in the second cache memory, the data can be removed from the second cache memory after it is processed. That is, because data to be always accessed is stored in the second cache memory, though the capacity of the second cache memory is small, the hit ratio can be improved.
Moreover, because the second cache memory has two ports, efficient processing is realized by simultaneously processing the transfer of large-scale data to be less frequently accessed from the main memory and the memory referencing instruction sent from the instruction unit.
Furthermore, because it is sufficient to provide only a small-capacity second cache memory with the function for simultaneously processing a data transfer from the main memory and the memory referencing instruction sent from the instruction unit, it is possible to decrease the hardware volume and the cost.
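The division of roles between the two cache memories can be sketched as follows. This is only an illustration under stated assumptions: the class TwoCacheToy, the boolean frequent hint carried by the prefetch, the removal of second-cache data on its first use, and the handling of a reference that was never prefetched are all hypothetical and are not specified in the text above. Port counts and capacities are not modeled.

```python
class TwoCacheToy:
    """Hypothetical model of a control section steering prefetched data."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.first = {}    # large capacity, one port: frequently accessed data
        self.second = {}   # small capacity, two ports: less frequently accessed data

    def prefetch(self, addr, frequent):
        # Assumed hint: the prefetch instruction names the target cache.
        if addr in self.first or addr in self.second:
            return
        target = self.first if frequent else self.second
        target[addr] = self.main_memory[addr]

    def load(self, addr):
        if addr in self.first:
            return self.first[addr]
        if addr in self.second:
            # Assumption: data used once is removed after it is processed.
            return self.second.pop(addr)
        # Assumption: a reference that was never prefetched reads main memory.
        return self.main_memory[addr]


memory = [10, 20, 30, 40]
caches = TwoCacheToy(memory)
caches.prefetch(0, frequent=True)    # small, reused data
caches.prefetch(3, frequent=False)   # large-scale, use-once data
reused = caches.load(0) + caches.load(0)
streamed = caches.load(3)
```

After this run the reused word stays in the first cache memory, while the streamed word has left the second cache memory and never displaced anything in the first.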
To achieve the above second object, the present invention provides a processor for processing instructions or data; a set associative cache memory comprising a plurality of memory chips each of which includes m (m is an integer equal to or larger than 2) sets of memory bank regions and an output section for sequentially accessing data sets one by one out of the above m sets of memory bank regions; a set judging section for generating a selection signal for selecting a memory bank region out of the above m sets of memory bank regions in accordance with an address sent from the processor; a set selecting section for outputting a data set selected by the selection signal out of the data sets to be sequentially accessed from the set associative cache memory to the processor; an address bus connected between the set associative cache memory and the processor to transfer an address for designating data from the processor; a first data bus connected between the set associative cache memory and the set selecting section to access the data sets; and a second data bus connected between the set selecting section and the processor to access the selected data set.
The above-described constitution makes it possible to decrease the number of memories to 1/m of the number conventionally required, because m sets of memory bank regions are present in one memory chip.
Because the number of memories decreases, it is possible to decrease the loads on the address bus and the data bus, to access the cache memory at a high speed, and to shorten the machine cycle.
Moreover, because data sets are sequentially outputted from one memory chip one by one, only one data bus is required. Therefore, it is possible to decrease the number of pins and the load of the CPU.
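The sequential output and set selection described above can be sketched as follows. The function select_data_set is hypothetical, and the tag comparison used to produce the selection signal is an assumption: the text states only that the set judging section generates the signal in accordance with the address sent from the processor.

```python
def select_data_set(bank_tags, bank_data, index, tag):
    """Return the data set chosen by the selection signal, or None on a miss.

    bank_tags[s][index] and bank_data[s][index] model memory bank region s.
    """
    selected = None
    # The m bank regions output their data sets one by one on the
    # single first data bus.
    for set_no in range(len(bank_data)):
        data_set = bank_data[set_no][index]
        # Set judging section (assumed here to be a tag comparison).
        hit = bank_tags[set_no][index] == tag
        # Set selecting section forwards only the selected data set
        # to the processor over the second data bus.
        if hit:
            selected = data_set
    return selected


# Two bank regions (m = 2), two entries each; values are illustrative only.
bank_tags = [[0x1, 0x2], [0x3, 0x4]]
bank_data = [["a0", "a1"], ["b0", "b1"]]
from_first_set = select_data_set(bank_tags, bank_data, 0, 0x1)
from_second_set = select_data_set(bank_tags, bank_data, 1, 0x4)
missed = select_data_set(bank_tags, bank_data, 0, 0x9)
```

Because the data sets pass over one bus in turn, the processor side needs only a single data path regardless of m, at the cost of m sequential transfers per access.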