The present invention is generally directed to cache memories and, in particular, to a virtual victim cache for use in a processor core.
It is essential that a microprocessor execute instructions in the minimum amount of time. Many technologies, quite often relying on radically different approaches, have been developed to increase microprocessor speeds. One approach is to increase the speed of the clock that drives the processor. As the clock rate increases, however, the power consumption and temperature of the processor also increase. Moreover, physical limits prevent processor clock speeds from increasing beyond a certain threshold. As a result, there is a practical maximum to the clock speed of conventional processors.
An alternate approach to improving processor speeds is to reduce the number of clock cycles required to perform a given instruction. Under this approach, instructions execute faster and overall processor throughput increases even if the clock speed remains the same. One technique for increasing processor throughput is pipelining, which calls for the processor to be divided into separate processing stages. Instructions are processed in an assembly line fashion in the processing stages. Each processing stage is optimized to perform a particular processing function, thereby causing the processor as a whole to become faster.
Another technique for increasing processor throughput is the use of a cache memory. A cache is a small, fast memory that holds a small group of instructions and data for use by a processor. The processor retrieves data and instructions from the cache memory, rather than from the slower main memory, as long as the required data and instructions are in the cache memory. If a needed instruction or data value is not in the cache memory, a cache miss occurs and the instruction or data value is fetched from main memory. Processor throughput can be maximized by minimizing cache misses (or maximizing cache hits) and minimizing cache access time.
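By way of illustration only, the trade-off between miss rate and access time can be expressed with the standard textbook relation for average memory access time (hit time plus miss rate times miss penalty). This relation is not taken from the present document, and the function name and the cycle counts below are hypothetical values chosen for the example.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures: 1-cycle cache hit, 50-cycle main memory penalty.
baseline = average_access_time(hit_time=1, miss_rate=0.10, miss_penalty=50)
fewer_misses = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=50)
faster_hits = average_access_time(hit_time=0.5, miss_rate=0.10, miss_penalty=50)
```

Under these assumed figures, lowering either the miss rate or the hit time lowers the average access time, which is the sense in which both quantities affect throughput.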
The design of a cache memory often is a compromise between access time, hit rate, and power consumption. Direct-mapped cache memories, in which instructions or data are stored in a single storage block, have the fastest access times and relatively low power consumption. Power consumption is particularly important in system-on-a-chip devices and in battery-powered systems, especially mobile communication devices. Unfortunately, direct-mapped caches also have the lowest hit rates (i.e., highest miss rates). N-way set associative caches, in which instructions or data are stored in one of N storage blocks (or ways), have higher hit rates, but also suffer from slower access times and higher power consumption. It is desirable to reduce the amount of compromising that is necessary in cache memory design.
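The difference in hit rate described above can be seen in a minimal simulation sketch. The function name, the modulo address mapping, the LRU replacement policy, and the access trace are all assumptions made for illustration; the sketch models hit counts only and says nothing about access time or power consumption.

```python
from collections import OrderedDict


def count_hits(trace, num_sets, num_ways):
    """Count hits for a trace of line addresses, using LRU replacement.

    num_ways == 1 models a direct-mapped cache.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for line_addr in trace:
        index = line_addr % num_sets      # set the line maps to (assumed mapping)
        tag = line_addr // num_sets       # identifies the line within that set
        ways = sets[index]
        if tag in ways:
            hits += 1
            ways.move_to_end(tag)         # mark as most recently used
        else:
            if len(ways) == num_ways:
                ways.popitem(last=False)  # evict the least recently used line
            ways[tag] = True
    return hits


# Line addresses 0 and 4 collide in a 4-line direct-mapped cache.
trace = [0, 4] * 4
direct_mapped_hits = count_hits(trace, num_sets=4, num_ways=1)
two_way_hits = count_hits(trace, num_sets=2, num_ways=2)
```

Both configurations hold four lines in total. On this trace the direct-mapped cache misses on every access because the two addresses keep evicting each other, while the two-way cache misses only on the first two accesses.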
Therefore, there is a need in the art for cache memories that maximize processor throughput. In particular, there is a need in the art for cache memories that have a reduced access time and low power consumption, with comparatively high hit rates. More particularly, there is a need for an improved cache memory that has the speed and low power consumption of a direct-mapped cache and the high hit rate of an N-way set associative cache.
To address the above-discussed deficiencies of the prior art, it is a primary object of the present invention to provide an N-way set associative virtual victim cache in which cache accesses are automatically directed only to the data array in the most recently used way. According to an advantageous embodiment of the present invention, the cache memory comprises: 1) N ways, each of the N ways comprising a data array capable of storing L cache lines and a tag array capable of storing L address tags, each of the L address tags associated with one of the L cache lines; and 2) address decoding circuitry capable of receiving an incoming memory address and accessing a target cache line corresponding to the incoming memory address only in a most recently used one of the N ways.
According to one embodiment of the present invention, the cache memory further comprises hit determination circuitry capable of receiving the incoming memory address and accessing an address tag corresponding to the incoming memory address in each tag array in each of the N ways and determining if a cache hit occurred in one of the N ways.
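The arrangement described above, in which the data array is read only in the most recently used way while the tag arrays of all N ways are checked, can be modeled in software as follows. This is a behavioral sketch under stated assumptions, not the disclosed circuitry: the `Way` class, the `lookup` function, the modulo split of the address into index and tag, and the use of a single MRU value for the whole cache are hypothetical choices made for the example.

```python
class Way:
    """One way: a data array of L cache lines and a tag array of L address tags."""

    def __init__(self, num_lines):
        self.tags = [None] * num_lines
        self.data = [None] * num_lines


def lookup(ways, mru, line_addr):
    """Return (hit_way, data) for one access.

    Only the data array of the MRU way is read. The tag arrays of all ways
    are compared, so hit_way may name a way other than the MRU way.
    hit_way is None on a cache miss.
    """
    num_lines = len(ways[0].tags)
    index = line_addr % num_lines    # assumed address split
    tag = line_addr // num_lines
    data = ways[mru].data[index]     # the single data array read
    hit_way = None
    for n, way in enumerate(ways):   # tag comparison in every way
        if way.tags[index] == tag:
            hit_way = n
    if hit_way != mru:
        data = None                  # the line read from the MRU way is not the target
    return hit_way, data


# Two ways of four lines each; line address 22 (tag 5, index 2) resides in way 1.
ways = [Way(4), Way(4)]
ways[1].tags[2] = 5
ways[1].data[2] = "line 22"
hit_in_mru = lookup(ways, mru=1, line_addr=22)
hit_elsewhere = lookup(ways, mru=0, line_addr=22)
miss = lookup(ways, mru=0, line_addr=3)
```

When the hit falls in the MRU way, the one data array read supplies the line. When the hit falls in another way, the sketch only reports which way hit; how the line is subsequently retrieved from that way is not specified in this section and is left out.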
According to another embodiment of the present invention, the cache memory further comprises a register capable of storing a most recently used (MRU) value identifying the most recently used way.
According to still another embodiment of the present invention, the address decoding circuitry uses the MRU value to access the target cache line in the most recently used way.
According to yet another embodiment of the present invention, the register is disposed in the address decoding circuitry.
According to a further embodiment of the present invention, the cache memory further comprises update circuitry capable of modifying the MRU value in the register in response to a determination by the hit determination circuitry that the cache hit occurred in a target one of the N ways other than the most recently used way.
According to a still further embodiment of the present invention, the update circuitry modifies the MRU value such that the MRU value identifies the target way as a new most recently used way.
According to a yet further embodiment of the present invention, the update circuitry is further capable of modifying the MRU value in response to a determination by the hit determination circuitry that a cache miss occurred.
In one embodiment of the present invention, the update circuitry modifies the MRU value after the cache miss occurred to identify a least recently used one of the N ways as a new most recently used way.
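The behavior of the update circuitry in the embodiments above can be summarized in a short sketch. The `access` function and the use of an ordered list to track how recently each way was used are assumptions made for illustration; this section does not state how the least recently used way is tracked.

```python
def access(recency, hit_way):
    """Return the updated recency order after one cache access.

    recency lists the ways from most to least recently used, so recency[0]
    plays the role of the MRU value. hit_way is the way in which the hit
    occurred, or None on a cache miss.
    """
    if hit_way is None:
        target = recency[-1]   # miss: the least recently used way becomes the new MRU way
    else:
        target = hit_way       # hit: the way that hit is (or becomes) the MRU way
    return [target] + [w for w in recency if w != target]


order = [0, 1, 2, 3]                        # way 0 is the MRU way, way 3 the LRU way
after_mru_hit = access(order, hit_way=0)    # hit in the MRU way: no change
after_other_hit = access(order, hit_way=2)  # hit in another way: way 2 becomes MRU
after_miss = access(order, hit_way=None)    # miss: LRU way 3 becomes MRU
```

A hit in the current MRU way leaves the MRU value unchanged, a hit in any other way makes that target way the new MRU way, and a miss promotes the least recently used way.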
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the DETAILED DESCRIPTION OF THE INVENTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation; the term "or" is inclusive, meaning and/or; the phrases "associated with" and "associated therewith," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term "controller" means any device, system or part thereof that controls at least one operation. Such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document, and those of ordinary skill in the art should understand that in many, if not most, instances, such definitions apply to prior, as well as future, uses of such defined words and phrases.