The present invention relates generally to the field of very large scale integrated circuits fabricated on a single semiconductor die or chip. More particularly, the invention relates to the field of high-performance cache memories.
Cache memories have been used to maximize processor performance, while maintaining reasonable system costs, for many years. A cache memory is a very fast buffer comprising an array of local storage cells that is used by a processor to hold frequently requested copies of data. A typical cache memory system comprises a hierarchy of memory structures, which usually includes a local (L1), on-chip cache that represents the first level in the hierarchy. A secondary (L2) cache is often associated with the processor for providing an intermediate level of cache memory between the processor and main memory. Main memory, also commonly referred to as system or bulk memory, lies at the bottom (i.e., slowest, largest) level of the memory hierarchy.
In a conventional computer system, a processor is coupled to a system bus that provides access to main memory. An additional backside bus may be utilized to couple the processor to a L2 cache memory. Other system architectures may couple the L2 cache memory to the system bus via its own dedicated bus. Most often, L2 cache memory comprises a static random access memory (SRAM) that includes a data array, a cache directory, and cache management logic. The cache directory usually includes a tag array, tag status bits, and least recently used (LRU) bits. (Each directory entry is called a xe2x80x9ctagxe2x80x9d.) The tag RAM contains the main memory addresses of code and data stored in the data cache RAM plus additional status bits used by the cache management logic. By way of background, U.S. Pat. No. 6,115,795 discloses a computer system comprising a processor that includes second level cache controller logic for use in conjunction with an external second level cache memory.
Recent advances in semiconductor processing technology have made possible the fabrication of large L2 cache memories on the same die as the processor core. As device and circuit features continue to shrink as the technology improves, researchers have begun proposing designs that integrate a very large (e.g., multiple megabytes) third level (L3) cache memory on the same die as the processor core for improved data processing performance. While such a high level of integration is desirable from the standpoint of achieving high-speed performance, there are still difficulties that must be overcome.
Large on-die cache memories are typically subdivided into multiple cache memory banks, which are then coupled to a wide (e.g., 32 bytes, 256 bits wide) data bus. For example, U.S. Pat. Nos. 5,752,260 and 5,818,785 teach interleaved cache memory devices having a plurality of banks consisting of memory cell subarrays. In a very large cache memory comprising multiple banks, one problem that arises is the large RC signal delay associated with the long bus lines when driven at a high clock rate. Thus, there is a need for some sort of repeater device to connect each bank of cache memory to the data bus without loss of signal integrity.
One traditional method for sharing a bus is to have each circuit utilize a tri-state driver in order to connect to the bus. Tri-state driver devices are well known in the prior art. A conventional tri-state driver comprises two transistor devices coupled in series to pull the output to either a high or low logic level. The third output state is a high impedance (i.e., inactive) state.
When a tri-state driver is utilized to connect to a bus, the two series-connected output devices of the driver need to be large so as to provide adequate drive strength to the long bus wire. This requirement, however, makes it difficult to use tri-state drivers as repeaters in a multi-megabyte on-die cache memory because the large source/drain diode of the output devices adds considerable load to the bus. The additional load attributable to the tri-state drivers increases bus power and causes significant RC signal delay. Another drawback of using tri-state drivers as repeaters is the need for decoding circuitry for the drivers. This decoding circuitry is in addition to the decoding circuitry already required for the cache memory banks.
Therefore, what is needed is repeater circuit for a very large on-die cache memory which overcomes the problems and drawbacks associated with the use of conventional tri-state bus drivers.