In general, a computer having a cache memory stores data to be frequently used in a small-capacity high-speed cache memory as a copy of part of the data stored in a large-capacity low-speed main memory, so that an instruction unit, such as a CPU, may make a high-speed data access to the cache memory for frequently used data and accesses to the main memory only when the desired data is not present in the cache memory.
However, because the machine cycle of the CPU is significantly shorter compared with that of the main memory, the penalty in the case of a cache miss (the time until requested data is obtained from the main memory) increases.
A method called software prefetch for solving the above problem is described in David Callhan et al., “Software Prefetching” Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 1991, pp. 40-52. In the method described in this first publication, an address is computed by a prefetch instruction before an instruction unit requires data, the address is checked to see if data indicated by the address is present in the cache memory, and if not, the data is transferred from the main memory to the cache memory. Therefore, it is possible to improve the hit ratio of the cache memory and minimize the penalty because data is previously stored in the cache memory by the prefetch instruction whenever data is required.
A cache memory comprising two buffers with different purposes, which are properly used by hardware is disclosed in Japanese Patent Laid-Open No. 303248/1992
In this second publication, the cache memory has an S buffer and a P buffer. The S buffer stores data to be accessed frequently over time. The P buffer stores data of which the addresses to be referenced from now on by the program are close to the currently referenced address, i.e. the P buffer stores the array data to be accessed in the array computation. Either one of the two buffers may be used selectively depending on the addressing mode in effect and on the type of register being used for the address calculation.
In general, a computer stores instructions or data to be frequently called and processed by a processor in a high-speed small-capacity memory, called a cache memory, as a copy of part of the instructions or data stored in a comparatively low-speed large-capacity main memory. Thus, the computer operation speed is increased. A data access system for such a cache memory includes a direct-mapped memory and a set associative memory.
The direct mapping system is used for accessing a cache memory by directly outputting data or an instruction stored in an address designated by a processor or the like and storing it in the designated address.
The set associative memory is used for accessing a plurality of sets of data values or a plurality of instructions (called a data set) in a cache memory having a plurality of sets, each of which comprises a plurality of memories common in allocation of addresses. A plurality of accessed sets of data values or a plurality of accessed instructions required are selected and processed in the processor.
FIG. 17 shows a schematic view of a data processor having a two-set associative cache memory according to a third conventional arrangement. In FIG. 17, symbol 9201 represents a CPU, 9202 to 9217 represent 8-bit output universal memories, 9218 represents an address bus, 9219 represents a 64-bit data bus of a first set, and 9220 represents a 64-bit data bus of a second set. The universal memories are used as data arrays of the two-set associative cache memory. The memories 9202 to 9209 are used as the data array of the first set and the memories 9210 to 9217 are used as the data array of the second set.
When an address designated by the CPU is sent to memories through the address bus, two sets of data values each having a width of 64 bits are outputted to the CPU through a respective data bus.
To constitute a set associative cache memory having m sets of data values with the width of n bits by using k-bit output memories, “n×m/k” memory chips are necessary in general. In the case of the above-described third conventional arrangement, 16 memories are necessary because n equals 64, m equals 2, and k equals 8.
The method described in first publication has the problem that an expensive two-port cache memory must be used in order to process transfer of data from the main memory to the cache memory and a memory referencing instruction sent from the instruction unit at the same time. Unless simultaneous processing is carried out, it is possible to use a generally-used one-port cache memory. In this case, however, a lot of processing time is required and the feature of software prefetch cannot effectively be used.
Moreover, the method described in the first publication has the additional problem that, when data, which is read from a cache memory only once and is immediately expelled from the cache memory, is held in the cache memory, the cache memory is filled with useless data and the hit ratio decreases.
These problems frequently occur in a program for handling large-scale data exceeding the capacity of a cache memory.
The arrangement described in the second publication has the problem that, because a cache memory for storing data between two cache memories is determined by an address designation system and a register used for address computation, two cache memories must properly be used for considering data characteristics including data size.
It is the first object of the present invention to provide a data processor for solving the above problems, which is capable of quickly and efficiently processing small-capacity frequently accessed data stored in a cache memory and large-scale data exceeding the capacity of the cache memory, and which is also capable of lessening the contamination of the cache memory and improving the hit ratio.
The third conventional arrangement described with reference to FIG. 17 has a problem that, when the number of sets of set associative cache memories increases, or the data bit width increases and the number of memories for constituting the cache memories increases, the cache memory cost increases.
When the number of memories increases, problems occur in that the address bus fan-out, address bus length, and data bus length increase, the cache memory access time increases, and the machine cycle of the entire data processor cannot be shortened.
When the number of sets increases, problems occur in that a number of data buses equivalent to the number of sets is required and the number of pins of the CPU increases. That is, a problem occurs in that it is impossible to meet the restriction on the number of pins of a package in the case of one chip.
It is the second object of the present invention to provide a set associative cache memory comprising a smaller number of memories.