The present invention relates generally to the field of electronic data processing devices. More particularly, the present invention relates to microprocessor on-chip cache memories.
Many computer systems today use cache memories to improve the speed of access to frequently used data and instructions. A small cache memory may be integrated on the microprocessor chip itself, greatly improving the speed of access by eliminating the need to go outside the microprocessor chip to access data or instructions from an external memory.
During a normal data access, the microprocessor first looks to an on-chip cache memory to see whether the desired data or instructions are resident there. If they are not, the microprocessor then looks to one or more off-chip memories. On-chip memory, or cache memory, is smaller than main memory, so multiple main memory locations may be mapped into the cache memory. The main memory locations, or addresses, which represent the most frequently used data and instructions get mapped into the cache memory. Cache memory entries must contain not only data, but also enough information ("tag address and status" bits) about the address associated with the data to identify which external, or main memory, addresses have been mapped into the cache memory. To improve the likelihood of finding a memory address in the cache (the cache "hit ratio"), it is desirable for cache memories to be set associative, i.e., a particular location in memory may be stored in any of multiple ways in the cache memory.
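The lookup described above can be illustrated with a minimal software sketch. It is not taken from the specification: the line size, number of sets, number of ways, the replacement choice, and all names (`Entry`, `split_address`, `SetAssociativeCache`) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass

# Assumed geometry for illustration only; not specified in the text.
LINE_BYTES = 16   # bytes per cache line
NUM_SETS = 4      # number of sets
NUM_WAYS = 2      # ways per set (2-way set associative)


@dataclass
class Entry:
    valid: bool = False   # a single "status" bit
    tag: int = 0          # "tag address" bits
    data: bytes = b""     # the cached line


def split_address(addr):
    """Split a main memory address into (tag, set_index, byte_offset)."""
    offset = addr % LINE_BYTES
    line = addr // LINE_BYTES
    return line // NUM_SETS, line % NUM_SETS, offset


class SetAssociativeCache:
    def __init__(self):
        self.sets = [[Entry() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]

    def lookup(self, addr):
        """Return the cached line on a hit, or None on a miss."""
        tag, index, _ = split_address(addr)
        for entry in self.sets[index]:          # compare the tag in every way
            if entry.valid and entry.tag == tag:
                return entry.data
        return None

    def fill(self, addr, data):
        """Install a line; uses an invalid way if any, else evicts way 0
        (a placeholder policy, assumed here for brevity)."""
        tag, index, _ = split_address(addr)
        ways = self.sets[index]
        victim = next((e for e in ways if not e.valid), ways[0])
        victim.valid, victim.tag, victim.data = True, tag, data


cache = SetAssociativeCache()
cache.fill(0x000, b"A")   # set 0, tag 0
cache.fill(0x040, b"B")   # also set 0, tag 1: held in the other way
```

In this sketch, addresses 0x000 and 0x040 map to the same set but coexist in different ways, which is the property that raises the hit ratio relative to a direct-mapped arrangement.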
Most previous cache designs, because of their low operating frequency, could afford a relatively large cache, e.g., a cache which contains both integer data and larger floating point data. In lower frequency microprocessors, a relatively large cache could still have an access latency of a single clock cycle. However, as microprocessor frequencies and instruction issue widths increase, the cache access latency can become greater than two clock cycles.
One approach to improving the performance of an on-chip cache includes dual porting and pipelining the cache. Previous cache designs which are dual-ported and pipelined have complex, and costly, self-timed circuits to correctly align memory and tag array access. The addition of self-timed circuits expends valuable processor space which could otherwise be used for a larger cache capacity. Moreover, complex control schemes are used in these designs since distinct clock cycles are not allocated to the separate cache functions of "cache lookup" and "data manipulation."
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, it is desirable to develop improved performance for cache memory.
The present invention includes a novel cache design that allows two cache requests to be processed simultaneously (dual-ported) and multiple cache requests to be in flight concurrently (pipelined). The cache design includes a first cache memory stage adapted for cache data access. At least two address ports are coupled to the first cache memory stage. Each address port is adapted to provide an input for a cache address on a first clock cycle of a processor clock signal. The cache design includes a second cache memory stage adapted for cache data manipulation. The second cache memory stage is adapted to receive, in a second clock cycle of the processor clock signal, cache data corresponding to the cache address found in the first cache memory stage. Thus, the design of the cache allocates the first clock cycle to cache tag and data access and the second clock cycle to data manipulation.
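The two-stage, two-port arrangement can be sketched as a small cycle-by-cycle software model. This is an illustrative assumption, not the circuit of the invention: the class name `TwoStageDualPortCache`, the dictionary standing in for the tag and data arrays, and the form of the "manipulation" result are all hypothetical.

```python
class TwoStageDualPortCache:
    """Toy model: stage 1 does tag and data access, stage 2 does data
    manipulation. Each call to clock() represents one processor clock cycle."""

    NUM_PORTS = 2  # two address ports (dual-ported)

    def __init__(self, contents):
        # Assumed stand-in for the tag and data arrays: address -> data.
        self.contents = contents
        # Latch between the stages: what stage 1 accessed on the last cycle.
        self.stage2 = [None] * self.NUM_PORTS

    def clock(self, addresses):
        """Advance one cycle. `addresses` holds one address (or None) per port.
        Returns the per-port results leaving stage 2 on this cycle."""
        if len(addresses) != self.NUM_PORTS:
            raise ValueError("one address (or None) is required per port")
        # Stage 2: manipulate the data accessed during the previous cycle.
        results = [None if item is None else self._manipulate(*item)
                   for item in self.stage2]
        # Stage 1: tag and data access for the addresses presented this cycle.
        self.stage2 = [None if addr is None else (addr, self.contents.get(addr))
                       for addr in addresses]
        return results

    @staticmethod
    def _manipulate(addr, data):
        # Placeholder for data manipulation; here it only reports hit or miss.
        return (addr, data is not None, data)


cache = TwoStageDualPortCache({0x10: "a", 0x20: "b", 0x30: "c"})
out1 = cache.clock([0x10, 0x20])   # cycle 1: two requests enter stage 1
out2 = cache.clock([0x30, 0x40])   # cycle 2: first pair in stage 2, new pair in stage 1
out3 = cache.clock([None, None])   # cycle 3: second pair leaves stage 2
```

In this model, two requests are accepted per cycle and each request spends exactly one cycle in each stage, so a new pair can enter stage 1 while the previous pair is in stage 2.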
In an alternative embodiment, a method for accessing a cache memory is provided. The method includes receiving a first cache address into a first cache memory stage at a first address port in a first clock cycle. A second cache address is received into the first cache memory stage at a second address port in the first clock cycle. A first data set corresponding to the first cache address is provided to a second cache memory stage in a second clock cycle. The method further includes providing a second data set corresponding to the second cache address to the second cache memory stage in the second clock cycle.