This application relies for priority upon Korean Patent Application No. 2000-30879, filed on Jun. 5, 2000, the contents of which are herein incorporated by reference in their entirety.
1. Field of the Invention
The present invention relates to digital data processing systems such as computer systems. More particularly, the invention relates to cache memories in digital data processing systems and methods of operating the cache memories.
2. Description of the Related Art
A computer system generally comprises a central processing unit (CPU), a system bus, a memory subsystem, and other peripherals. The CPU executes instructions stored in the memory subsystem, and the bus serves as a communication pathway between the CPU and other devices in the computer system. The memory subsystem typically includes a slow and inexpensive primary, or "main", memory, such as Dynamic Random Access Memory (DRAM), and fast and expensive cache memories, such as Static Random Access Memories (SRAMs).
Cache subsystems of a computer system are the result of a discrepancy in speed capability and price between SRAMs and DRAMs. This discrepancy led to an architectural split of main memory into a hierarchy in which a small, relatively fast SRAM cache is inserted in the computer system between the CPU and a larger, relatively slow, but less expensive DRAM main memory.
A cache memory holds instructions and data which have a high probability of being desired for imminent processing by the CPU. By retaining the most-frequently accessed instructions and data in the high speed cache memory, average memory-access time will approach the access time of the cache. Therefore, use of caches can significantly improve the performance of computer systems.
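As a rough illustration of why a high hit ratio pulls the average access time toward the cache access time, the following Python sketch uses a common two-level model in which a miss pays for the cache lookup plus one main memory access. The function name and the 10 ns and 60 ns timings are illustrative assumptions, not figures from this application.

```python
def average_access_time(hit_ratio: float, t_cache: float, t_main: float) -> float:
    """Two-level model: a hit costs t_cache; a miss costs the failed
    cache lookup (t_cache) plus a main memory access (t_main)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie in [0, 1]")
    return hit_ratio * t_cache + (1.0 - hit_ratio) * (t_cache + t_main)


# Illustrative timings only: 10 ns cache, 60 ns main memory.
for h in (0.80, 0.95, 0.99):
    print(f"hit ratio {h:.2f}: {average_access_time(h, 10.0, 60.0):.1f} ns")
```

As the hit ratio approaches 1, the result approaches `t_cache`, which is the effect the paragraph above describes.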
Active program instructions and data may be kept in a cache by exploiting a phenomenon known as "locality of reference". This phenomenon reflects the fact that most program instruction processing proceeds sequentially, with multiple loops, so that the CPU repeatedly references a set of instructions in a particular localized area of memory. Thus, loops and subroutines tend to localize the memory references made when fetching instructions. Similarly, memory references to data also tend to be localized, because table lookup routines and other iterative routines repeatedly refer to a relatively small portion of memory.
In a computer system, the CPU examines the cache before main memory when a memory access instruction is processed. If the desired word (data or program instruction) is found in the cache, the CPU reads it from the cache. If the word is not found in the cache, main memory is accessed to read that word, and a block of words containing it is transferred from main memory into the cache, with the cache line to be replaced selected by an appropriate replacement algorithm. If the cache holds the word wanted by the CPU, the access is called a "hit"; if not, it is called a "miss."
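The read flow just described can be sketched in Python as a toy cache that serves hits directly and, on a miss, copies the whole block containing the requested word from main memory. `TinyCache`, its block size and capacity, and the FIFO choice of victim block are all hypothetical; the text leaves the replacement algorithm open.

```python
from collections import OrderedDict


class TinyCache:
    """Toy read-only cache holding whole blocks; evicts the oldest block
    (FIFO, chosen here only for illustration) when full."""

    def __init__(self, main_memory, block_size=4, capacity_blocks=2):
        self.mem = main_memory          # list of words standing in for DRAM
        self.block_size = block_size
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()     # block number -> list of words
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        block_no, offset = divmod(addr, self.block_size)
        if block_no in self.blocks:     # hit: serve the word from the cache
            self.hits += 1
        else:                           # miss: bring in the whole block
            self.misses += 1
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict the oldest block
            start = block_no * self.block_size
            self.blocks[block_no] = self.mem[start:start + self.block_size]
        return self.blocks[block_no][offset]


memory = list(range(100, 132))          # 32 words of fake main memory
cache = TinyCache(memory)
for a in [0, 1, 2, 3, 4, 0]:
    cache.read(a)
print(cache.hits, cache.misses, cache.hits / (cache.hits + cache.misses))
```

The first read of each block misses and fills the block, so the neighbouring words that follow are hits; this trace produces 4 hits and 2 misses.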
A line of a simple cache memory usually consists of an address and one or more data words corresponding to that address. A line is also a minimum unit of information that can be moved between a main memory and a cache memory.
Data from a location in main memory is stored in one line of the cache, so the cache location must be identified; this is done using a portion of the main memory address. Also, because there are fewer cache lines than main memory blocks, an algorithm is needed for determining which main memory blocks are read into which cache lines.
Various techniques are known for mapping blocks of a main memory into a cache memory. Typical forms of mapping include direct mapping, fully associative mapping, and set associative mapping.
The direct mapping technique maps each block of main memory into only one possible cache line. This technique is simple and inexpensive to implement, but its primary disadvantage is that there is a fixed cache location for any given block. Thus, if a program happens to repeatedly reference words from two different blocks that map into the same line, the blocks will be continually swapped in the cache, and the hit ratio will be low.
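A minimal sketch of direct mapping, assuming a hypothetical 8-line cache in which block b can occupy only line b mod 8, shows the swapping problem: two blocks that share a line evict each other on every access.

```python
NUM_LINES = 8  # hypothetical cache size in lines


def direct_mapped_line(block_number: int) -> int:
    # Each main memory block has exactly one candidate line.
    return block_number % NUM_LINES


def count_hits(block_trace):
    # Toy model: each line remembers which block it holds (a real cache stores a tag).
    lines = [None] * NUM_LINES
    hits = 0
    for block in block_trace:
        idx = direct_mapped_line(block)
        if lines[idx] == block:
            hits += 1
        else:
            lines[idx] = block  # the only candidate line is overwritten
    return hits


print(count_hits([3, 11] * 5))  # blocks 3 and 11 both map to line 3
print(count_hits([3, 4] * 5))   # blocks 3 and 4 map to different lines
```

Alternating between blocks 3 and 11 yields no hits at all, while alternating between blocks 3 and 4 hits on every access after the first two.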
Fully associative mapping overcomes the drawbacks of direct mapping by permitting each main memory block to be loaded into any line of a cache. With this technique, there is flexibility as to which block to replace when a new block is read into the cache. Its principal disadvantage is the complex circuitry required to examine the tags of all cache lines in parallel.
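For contrast, the following sketch models a fully associative lookup, using the block number itself as the tag. In hardware every stored tag is compared with the incoming tag at once, which means one comparator per line; the `comparisons` counter makes that cost visible. The class name and the round-robin replacement policy are illustrative assumptions.

```python
class FullyAssociativeCache:
    """Toy fully associative cache; round-robin replacement is an illustrative choice."""

    def __init__(self, num_lines: int = 8):
        self.tags = [None] * num_lines  # one tag per line; None marks an empty line
        self.next_victim = 0
        self.comparisons = 0            # tag comparisons charged so far

    def access(self, tag) -> bool:
        # Hardware compares the incoming tag with every stored tag in parallel;
        # here the stored tags are scanned, and one comparison per line is charged.
        hit = any(stored == tag for stored in self.tags)
        self.comparisons += len(self.tags)
        if hit:
            return True
        # Miss: the block may go into ANY line; choose one round-robin.
        self.tags[self.next_victim] = tag
        self.next_victim = (self.next_victim + 1) % len(self.tags)
        return False


cache = FullyAssociativeCache()
hits = sum(cache.access(block) for block in [3, 11] * 5)
print(hits, cache.comparisons)
```

With the alternating trace of blocks 3 and 11, the two blocks occupy different lines, so every access after the first two hits, at the price of eight tag comparisons per access in this 8-line model.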
Set associative mapping (usually referred to as "N-way set associative mapping") is a compromise that exhibits the strengths of both the direct and fully associative approaches. In this technique, a cache is divided into plural sets, each of which consists of several lines. A block of main memory maps into one particular set but may be loaded into any of the lines of that set, which permits two or more blocks to be stored in the cache at the same set address (i.e., in different lines of the same set). In this approach, cache control logic interprets a main memory address simply as three fields: a tag, a set, and a word. With set associative mapping, the tag field of a main memory address is relatively small and is compared only with the tags within a single set, unlike fully associative mapping, in which the tag field is quite large and must be compared with the tag of every line in the cache.
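The three-field interpretation of an address can be sketched as follows, assuming word-addressed memory with 2^word_bits words per line and 2^set_bits sets; `split_address` and `lookup` are hypothetical helper names. Only the tags held in the selected set are compared.

```python
def split_address(addr: int, word_bits: int, set_bits: int):
    """Return the (tag, set, word) fields of a main memory address."""
    word = addr & ((1 << word_bits) - 1)
    set_index = (addr >> word_bits) & ((1 << set_bits) - 1)
    tag = addr >> (word_bits + set_bits)
    return tag, set_index, word


def lookup(cache_sets, addr: int, word_bits: int, set_bits: int):
    """Return the way holding addr, or None on a miss."""
    tag, set_index, _ = split_address(addr, word_bits, set_bits)
    # Only the N tags of the selected set are compared.
    for way, stored_tag in enumerate(cache_sets[set_index]):
        if stored_tag == tag:
            return way
    return None


# 2-way cache with 16 sets and 4 words per line (illustrative parameters).
cache_sets = [[None, None] for _ in range(16)]
cache_sets[5][1] = 11              # pretend tag 11 was loaded into way 1 of set 5
print(split_address(727, 2, 4))
print(lookup(cache_sets, 727, 2, 4))
```

With 4 words per line (word_bits=2) and 16 sets (set_bits=4), address 727 splits into tag 11, set 5 and word 3, so the lookup inspects only the two tags of set 5.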
Performance of cache memories is frequently measured in terms of a "hit ratio." When a CPU references a cache memory and finds a desired instruction or data word in the cache, the CPU produces a hit. If the word is not found in the cache, then the word is in main memory and the cache access counts as a miss. The number of hits divided by the total number of CPU references to memory (i.e., hits plus misses) is the hit ratio.
To maximize the hit ratio, many computer system organizations and architectures allow system control over the use of caches. For example, a cache may be used to store instructions only, data only, or both instructions and data. The design and operating principles of cache memories are described in detail in several handbooks, for example, "Advanced Microprocessors," by Daniel Tabak, McGraw-Hill Book Co., Second Edition (1995), Chap. 4, pp. 43-65; "Computer Organization and Architecture," by William Stallings, Prentice-Hall, Inc., Fifth Edition (1996), Chap. 4, pp. 117-151; and "High Performance Memories," by Betty Prince, John Wiley and Sons, Inc. (1996), Chap. 4, pp. 65-94, which are hereby incorporated herein by reference.
To identify whether a cache hit or a cache miss occurs, that is, to know whether a desired word is present in a cache, the tags stored in the cache must always be accessed. Owing to the current trend toward larger caches to meet high performance requirements (the hit ratio of a simple cache tends to rise as the cache size grows), the number of repetitive tag accesses in memory reference cycles increases. This increases the power consumption of caches and thus hampers their use in low-power applications.
An object of the present invention is accordingly to provide methods and apparatuses for reducing the power consumption and improving the performance of cache integrated circuit memory devices.
To attain this object, the present invention recognizes that a cache hit always occurs when the current access is applied to instructions and/or data on the same cache line that was accessed and hit in the most recent access, and that, if a miss occurred during the preceding access, whether the current access to the same line hits or misses depends on whether a "cache line fill" (in which a complete cache line is read from main memory into the cache memory) has been performed for that line.
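The observation above can be expressed as a small predicate. This is a sketch under stated assumptions: byte addresses, a line size of 2^offset_bits bytes, and a hypothetical function name. Returning False means only that the outcome is not known in advance and the tag RAM must be consulted, not that the access misses.

```python
def hit_known_without_tag_lookup(prev_addr: int, prev_hit: bool, prev_line_filled: bool,
                                 cur_addr: int, offset_bits: int) -> bool:
    """True when the current access is certain to hit without reading the tag RAM."""
    same_line = (prev_addr >> offset_bits) == (cur_addr >> offset_bits)
    if not same_line:
        return False  # different line: outcome unknown, the tags must be checked
    # Same line: a hit if the preceding access hit, or if it missed but the
    # cache line fill for that line has already been performed.
    return prev_hit or prev_line_filled


# 16-byte lines (offset_bits=4): 0x100 and 0x104 share a line, 0x110 does not.
print(hit_known_without_tag_lookup(0x100, True, False, 0x104, 4))
print(hit_known_without_tag_lookup(0x100, True, False, 0x110, 4))
```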
According to an aspect of the present invention, a digital data processing system is provided which includes a digital data processor, a cache memory having a tag RAM and a data RAM, and a controller for controlling accesses to the cache memory. The controller stores state information on access type, operation mode and cache hit/miss associated with a first access to the tag RAM, and controls a second access to the tag RAM, just after the first access, based on the state information and a portion of a set field of a main memory address for the second access. In particular, the controller determines whether the second access is applied to the same cache line that was accessed in the first access, based on the state information and a portion of a set field of the main memory address for the second access, and allows the second access to be skipped when the second access is applied to the same cache line that was accessed in the first access.
The cache memory may comprise a level-one (L1) cache or a level-two (L2) cache. In certain embodiments, the controller may be integrated on the same chip as the processor, along with an L1 cache. In other implementations, the controller may be integrated on a stand-alone chip, on a memory controller chip, or on each cache memory chip.
According to a preferred aspect of the present invention, the controller determines whether the first and second accesses are performed in a sequential fashion using the portion of the set field of the main memory address for the second access. The portion of the set field includes a least significant bit of the set field.
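Why a single set-field bit can suffice: a sequential access advances the address by at most one word, so it either stays in the same line or moves to the next one, and moving to the next line increments the line address and therefore flips the least significant bit of the set field. The Python sketch below checks this by brute force over every sequential step in a small address range. It assumes word addresses, 4 words per line and 16 sets, and `skip_tag_access` is a hypothetical name.

```python
WORD_BITS = 2  # assumed: 4 words per line, word-addressed
SET_BITS = 4   # assumed: 16 sets


def set_lsb(addr: int) -> int:
    # Least significant bit of the set field, just above the word field.
    return (addr >> WORD_BITS) & 1


def skip_tag_access(sequential: bool, prev_addr: int, cur_addr: int) -> bool:
    # Skip only for a sequential access whose set-field LSB is unchanged.
    return sequential and set_lsb(prev_addr) == set_lsb(cur_addr)


# Brute-force check against a full line-address comparison over several
# complete sweeps of the set field (so the wrap from set 15 to set 0 is covered).
for a in range(1 << (WORD_BITS + SET_BITS + 3)):
    same_line = (a >> WORD_BITS) == ((a + 1) >> WORD_BITS)
    assert skip_tag_access(True, a, a + 1) == same_line
```

A non-sequential access (for example, after a branch) never qualifies, even if the bit happens to match, because a jump can land on a different line whose set-field LSB is the same.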
According to another aspect of the present invention, a cache integrated circuit memory device coupled between a processor and a main memory in a digital data processing system is provided, which comprises a data RAM circuit, a tag RAM circuit, a skip flag generator, a first RAM access control logic, a hit discriminator, and a second RAM access control logic.
The data RAM circuit is responsive to a portion of a main memory address from the processor and temporarily stores instructions and data processed by the processor. The tag RAM circuit stores tags for accesses to the data RAM circuit and generates a plurality of tag hit signals by comparing a tag field of the main memory address with the stored tags. The skip flag generator generates a skip flag signal in response to an access type signal and an address signal from the processor. The first RAM access control logic controls accesses to the tag RAM circuit in response to the skip flag signal. The hit discriminator generates a plurality of data hit signals in response to an operation mode signal from the processor, the skip flag signal, and the tag hit signals. The second RAM access control logic controls accesses to the data RAM circuit in response to the data hit signals. The skip flag generator includes circuitry that determines, by checking the access type signal and the address signal from the processor, whether a current access to the tag RAM circuit is applied to the same cache line that was accessed in a preceding access to the tag RAM circuit, and activates the skip flag signal if the current access is applied to the same cache line that was accessed in the preceding access. In particular, the first RAM access control logic cuts off the supply of a clock signal to the tag RAM circuit when the skip flag signal is activated, so that the current access to the tag RAM circuit is skipped. The second RAM access control logic, on the other hand, transfers a clock signal to the data RAM circuit while the skip flag signal is active, so that the data RAM circuit can still be accessed.
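The interaction of the skip flag, the two RAM access control logics and the hit discriminator in a single access cycle might be modeled as below. This is a behavioral sketch, not the circuit of the invention: the remembered way `last_hit_way`, the 4-way organization and the names `CycleControl` and `access_cycle` are assumptions, and the operation mode signal is omitted because its exact role is not specified here.

```python
from dataclasses import dataclass
from typing import List

NUM_WAYS = 4  # assumed associativity for this sketch


@dataclass
class CycleControl:
    tag_ram_clock: bool      # output of the first RAM access control logic
    data_ram_clock: bool     # output of the second RAM access control logic
    data_hit: List[bool]     # per-way data hit signals from the hit discriminator


def access_cycle(skip_flag: bool, tag_hits: List[bool], last_hit_way: int) -> CycleControl:
    if skip_flag:
        # Same line as the preceding access: gate off the tag RAM clock and
        # select the way remembered from that access (tag_hits is ignored,
        # since the tag RAM is not read in this cycle).
        data_hit = [way == last_hit_way for way in range(NUM_WAYS)]
        return CycleControl(tag_ram_clock=False, data_ram_clock=True, data_hit=data_hit)
    # Normal access: clock the tag RAM and pass its comparison results through.
    data_hit = list(tag_hits)
    return CycleControl(tag_ram_clock=True, data_ram_clock=any(data_hit), data_hit=data_hit)
```

In a non-skipped cycle that misses, this model leaves the data RAM clock off; the line fill that would follow is outside its scope.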
According to another aspect of the present invention, a method for operating the cache memory is provided which comprises determining whether a current access to a tag RAM circuit is applied to the same cache line that was accessed in a preceding access, and allowing the current access to the tag RAM to be skipped when the current access is applied to the same cache line that was accessed in the preceding access.
In an embodiment, the determining step includes generating a sequential access signal and a first main memory address for a current access to the cache memory, the sequential access signal being indicative of a sequential access from a preceding access to the current access; detecting activation of the sequential access signal and determining whether a one-bit address signal in the current access is identical with that in the preceding access; and activating a skip flag signal when the sequential access signal is activated and the one-bit address signal in the current access is identical with that in the preceding access. The allowing step includes cutting off the supply of a clock signal to the tag RAM circuit when the skip flag signal is activated.
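To see how much tag RAM activity the method can avoid, the following sketch replays a hypothetical instruction fetch trace of (word address, sequential flag) pairs and applies the skip rule: skip when the sequential access signal is active and the one-bit address signal, taken here as the set-field LSB, matches the preceding access. It assumes 8 words per line and that every preceding access either hit or completed its line fill, so the hit/miss state check is omitted; `count_tag_accesses` is a hypothetical name.

```python
WORD_BITS = 3  # assumed: 8 words per line, word-addressed


def count_tag_accesses(trace):
    """trace: (word_address, sequential) pairs in access order.
    Returns (tag RAM accesses performed, tag RAM accesses skipped)."""
    performed = skipped = 0
    prev_bit = None
    for addr, sequential in trace:
        bit = (addr >> WORD_BITS) & 1  # one-bit address signal: LSB of the set field
        skip_flag = sequential and prev_bit is not None and bit == prev_bit
        if skip_flag:
            skipped += 1    # tag RAM clock cut off for this access
        else:
            performed += 1  # tag RAM clocked and tags compared
        prev_bit = bit
    return performed, skipped


# Hypothetical fetch stream: 16 sequential words, a branch to 40, then 7 more.
trace = [(0, False)] + [(a, True) for a in range(1, 16)]
trace += [(40, False)] + [(a, True) for a in range(41, 48)]
print(count_tag_accesses(trace))
```

For this trace (a 16-word straight-line run, a branch, and an 8-word run), only 3 of the 24 accesses clock the tag RAM: the first access, the crossing into the second line, and the branch target. The other 21 are skipped.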