This application relies for priority upon Korean Patent Application No. 2001-35425, filed on Jun. 21, 2001, the contents of which are herein incorporated by reference in their entirety.
The present invention generally relates to cache memory systems and, more specifically, to cache memory systems having a block replacement function.
Cache memories, which are common random access memories (RAMs) operable at high frequency, are used in computer systems to enhance data processing efficiency; they can be accessed by central processing units (CPUs) faster than general memories (e.g., dynamic RAMs). Data frequently accessed by CPUs are stored in the cache memories in order to save data (or signal) transmission time.
A CPU directly accesses a cache memory without using a data-conveying device such as an external memory management unit. As a result, the cache memory is usually positioned physically adjacent to the CPU in order to shorten signal transmission times. An operating feature of a data RAM, including cache memories, in a microprocessor (e.g., ‘Alpha AXP 21064 Microprocessor’) is disclosed at pp. 380-383 of “Computer Architecture: A Quantitative Approach”, written by David A. Patterson and John L. Hennessy and published by Morgan Kaufmann Publishers Inc. in 1996.
Referring to FIG. 1, which illustrates a hierarchical memory architecture in a computer system, a processor (i.e., CPU) 20 is connected to a main memory 10 through a system bus 50, and to a secondary cache memory (or L2 cache) 40 through a processor bus 60. The architecture shown in FIG. 1 is advantageous for increasing the number of bits transferred, because the CPU can access the main memory 10 and the secondary cache memory 40 simultaneously, and for improving latency, because the secondary cache memory 40 is utilized as a backup memory for a primary cache memory (or L1 cache) 30. However, as data stored in the primary cache memory 30 do not always exist in the secondary cache memory 40, the system must check whether or not data assigned to the primary cache memory 30 are present in the secondary cache memory 40 before removing the data from the primary cache memory 30. Therefore, the logic for data replacement becomes complicated.
It has been proposed to implement a set-associative cache memory in which memory locations are segmented into a plurality of groups or banks in order to increase a hit rate therein. The groups or the banks are also called sets or ways.
A recent trend in semiconductor manufacturing is to integrate a secondary cache memory on a microprocessor chip together with a primary cache memory. When a secondary cache memory is embedded in a microprocessor chip together with a primary cache memory, overall chip performance can be enhanced because of a marked increase in the number of data bits accessible between the secondary cache memory and the microprocessor. Nevertheless, the hit rate may be low because the size of the embedded secondary cache memory is limited by the available chip area, unlike that of an external one. Although the number of sets in the set-associative cache memory can be increased to compensate for the hit rate degraded by the reduced size of the embedded secondary cache memory, increasing the number of sets makes the block replacement logic more intricate and enlarges the circuit area.
There are several ways to replace blocks in a cache memory, such as LRU (Least-Recently-Used), FIFO (First-In-First-Out), and random replacement. LRU achieves the lowest miss rate, but it needs an additional RAM array to mark the least-recently-used block at every cache index. In particular, the RAM array occupies more circuit area in proportion to an increase in the number of sets in the set-associative cache memory, causing the overall circuit size to be larger and the control function to be more complex. FIFO also needs an additional RAM array representing the order of block fetches at every index, and it has a higher miss rate than the other approaches. Random replacement needs a random number, but it can be constructed with simpler hardware, without an additional RAM array, and has a miss rate comparable to that of LRU even when the number of sets of the set-associative cache memory increases with increasing cache size.
It is, therefore, an object of the present invention to provide a cache memory system employing an advanced random replacement approach.
It is another object of the present invention to provide a cache memory system capable of achieving faster operation with simpler block replacement logic, even as the number of sets increases.
The invention is directed to a cache memory system which includes a tag memory segmented into a plurality of tag sets storing tags and a data memory segmented into a plurality of data sets storing data bits, corresponding to the tag memory. A storage circuit stores information about replacement of the data stored in the data sets. A selection block generates set selection signals in response to a counting operation to designate an alternative one of the data sets which has replaceable data, with reference to the information in the storage circuit.
In one embodiment, each of the data and tag memories is segmented into a plurality of N sets.
The selection block can include a counter for conducting the counting operation from 1 to N in an iterative sequence in response to a reset signal and a clock signal. A selection circuit can generate the set selection signals corresponding to a first-ordered set among the data sets which is the alternative one having the replaceable data, with reference to the information about the replacement. In one embodiment, the counter generates counting signals of N bits in which, at any specific time point, one bit of the N bits is logical 1 and the other bits are logical 0, in response to the reset signal and the clock signal.
The counter can include N flipflops having input terminals and first and second output terminals and responding to the reset signal, in which signals applied through the input terminals are output through the first output terminals and inverted at the second output terminals. The input terminal of the first flipflop is connected to the first output terminal of the (N−1)th flipflop, the input terminal of the second flipflop is connected to the second output terminal of the first flipflop, and the input terminals of the third through (N−1)th flipflops are connected to the first output terminals of their preceding flipflops.
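The one-hot counting signals described above can be illustrated with a short behavioral sketch. It is not a model of the flipflop wiring; the function name `one_hot_states`, the reset state (first bit set), and the shift direction are assumptions made only for illustration.

```python
def one_hot_states(n, ticks):
    """Return the first `ticks` states of an n-bit one-hot counter.

    Assumption: reset sets the first bit, and each clock edge moves the
    single logical 1 to the next position, wrapping after position n.
    """
    state = [1] + [0] * (n - 1)  # assumed state right after reset
    states = []
    for _ in range(ticks):
        states.append(tuple(state))
        # Rotate by one position: exactly one bit stays at logical 1.
        state = [state[-1]] + state[:-1]
    return states


for s in one_hot_states(4, 5):
    print(s)
```

With N = 4, the sequence returns to its starting state after four clock edges, which corresponds to counting from 1 to N in an iterative sequence.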
The storage circuit contains information data of the N bits about replacement with the data stored in the data sets.
In one embodiment, the selection circuit comprises N detectors for generating detection signals of the N bits involved in the first-ordered set having the replaceable data, with reference to the N-bit information data about replacement, corresponding to the data sets. The selection circuit also includes N selectors for generating set selection signals corresponding to the detection signals, in response to the counting signals.
Each of the detectors finds the first-ordered set from a data set, and then generates the N-bit detection signals corresponding to the first-ordered set.
The information data about replacement includes a logical 1 bit when a data set is replaceable with new data and includes a logical 0 bit when a data set is not replaceable with new data.
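As a behavioral illustration of how a detector might use these bits, the following sketch finds the first replaceable set and returns an N-bit one-hot detection signal. The function name `first_replaceable` and the `start` parameter (the position at which a given detector begins its search) are hypothetical; the text does not state how the search order of each detector is defined.

```python
def first_replaceable(info_bits, start=0):
    """Return a one-hot list marking the first set whose info bit is 1.

    info_bits: N-bit information data (1 = replaceable, 0 = not replaceable).
    start: assumed position at which this detector begins searching.
    """
    n = len(info_bits)
    detect = [0] * n
    for offset in range(n):
        i = (start + offset) % n  # search wraps around the N sets
        if info_bits[i] == 1:
            detect[i] = 1
            break  # only the first replaceable set is marked
    return detect


print(first_replaceable([0, 1, 0, 1]))
print(first_replaceable([0, 1, 0, 1], start=2))
```

If no set is replaceable, the sketch returns all zeros; how the actual circuit handles that case is not described here.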
Each of the detectors includes N detection cells corresponding to bits of the N-bit information data.
Each of the detection cells comprises a first inverter for inverting a corresponding bit of the N-bit information data; a first transistor connected between first and second nodes, a gate of the first transistor being coupled to the corresponding bit of the N-bit information data; a second transistor connected between a power supply voltage and a third node, a gate of the second transistor being coupled to the second node; a third transistor connected between the power supply voltage and the third node, a gate of the third transistor being coupled to the corresponding bit of the N-bit information data; a fourth transistor connected between the power supply voltage and a fourth node, a gate of the fourth transistor being coupled to the corresponding bit of the N-bit information data; a fifth transistor connected between the first and fourth nodes, a gate of the fifth transistor being coupled to an output signal of the first inverter; a sixth transistor connected between the second node and a fifth node, a gate of the sixth transistor being coupled to the output signal of the first inverter; a seventh transistor connected between the fifth node and a ground voltage, a gate of the seventh transistor being coupled to the corresponding bit of the N-bit information data; and a second inverter for converting a signal at the third node into a bit of the set selection signals.
Each of the detectors is divided into K M-bit detection units, where M is less than N. Each of the M-bit detection units finds a logical 1 bit among its M bits of the N-bit information data and then generates a detection signal of logical 1 corresponding to the logical 1 bit among the M bits.
Each of the M-bit detection units generates a detection signal of a low level when at least one of the bits of the N-bit information data preceding its corresponding set of M bits is logical 1.
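The division of a detector into M-bit detection units can be sketched behaviorally as follows. The function name `detect_in_units` and the assumption that each unit marks the first logical 1 within its own M bits are illustrative only; the sketch also assumes N is a multiple of M.

```python
def detect_in_units(info_bits, m):
    """Model K = N/M detection units scanning the N-bit information data.

    A unit asserts a detection bit only if no earlier bit (in any
    preceding unit) is logical 1; otherwise its outputs stay low.
    """
    detect = [0] * len(info_bits)
    earlier_hit = False  # True once any preceding unit has seen a logical 1
    for base in range(0, len(info_bits), m):
        unit = info_bits[base:base + m]
        if not earlier_hit and 1 in unit:
            # Assumed: the unit marks the first logical 1 among its M bits.
            detect[base + unit.index(1)] = 1
        earlier_hit = earlier_hit or (1 in unit)
    return detect


print(detect_in_units([0, 0, 0, 0, 0, 1, 1, 0], 4))
```

Splitting the search this way keeps each unit small, while the inhibit condition from preceding bits ensures that at most one detection bit is high across all K units.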
Each of the selectors comprises N transistors for connecting the detection signals to a common node in response to the counting signals, the common node being connected to a corresponding output of the set selection signals.
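The role of the selectors can be sketched as a gating operation: with one-hot counting signals, each selector passes exactly one detection signal to its common node. The function name `select_sets` and the layout of `detections` (one N-bit vector per detector) are assumptions for illustration, not details taken from the circuit.

```python
def select_sets(count, detections):
    """Combine detection signals into N set selection signals.

    count: N-bit one-hot counting signals.
    detections: assumed list of N detection vectors, one per detector.
    Selector j drives output j with the j-th bit of whichever detector
    is enabled by the counting signals (modeled as AND, then OR).
    """
    n = len(count)
    return [
        int(any(count[i] and detections[i][j] for i in range(n)))
        for j in range(n)
    ]


detections = [
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
print(select_sets([0, 0, 1, 0], detections))
```

Because only one counting bit is logical 1 at a time, the result equals the detection vector of the single enabled detector.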