1. Field of the Invention
The present invention relates to a method and structure for implementing a memory system. More specifically, the invention relates to a second level cache memory.
2. Description of the Prior Art
High-speed computer systems frequently use fast, small-capacity cache (buffer) memory to transfer data between a fast processor and a slow (and low-cost), large-capacity main memory. Cache memory is typically used to temporarily store data that has a high probability of being selected next by the processor. By storing this high-probability data in a fast cache memory, the average speed of data access for the computer system is increased. Thus, cache memory is a cost-effective way to boost system performance (as compared to using only high-speed, expensive memories). In more advanced computer systems, there are multiple levels (usually two levels) of cache memory. The first level cache memory, typically having a storage capacity of 4 Kbytes to 32 Kbytes, is ultra-fast and is usually integrated on the same chip as the processor. The first level cache is faster because it is integrated with the processor and therefore avoids any delay associated with transmitting signals to and receiving signals from an external chip. The second level cache is usually located on a different chip than the processor and has a larger capacity, usually from 64 Kbytes to 1024 Kbytes.
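The benefit of a cache hierarchy can be illustrated with the usual average access time calculation. The following is a minimal sketch under stated assumptions: each access pays the access time of every level it reaches (a serial lookup model), and the access times and miss ratios used are hypothetical figures chosen for illustration, not values taken from this description. The function name `average_access_time_ns` is likewise hypothetical.

```python
def average_access_time_ns(levels):
    """Average access time of a memory hierarchy (simple serial lookup model).

    `levels` lists (access_time_ns, miss_ratio) pairs from the fastest level
    (first level cache) to the slowest (main memory). Each access pays the
    time of every level it reaches; the fraction of accesses that reaches the
    next level is multiplied by the current level's miss ratio.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that reach the current level
    for access_time_ns, miss_ratio in levels:
        total += reach * access_time_ns
        reach *= miss_ratio
    return total


# Hypothetical figures: 10 ns first level cache with a 10% miss ratio,
# 30 ns second level cache with a 20% local miss ratio, and a 120 ns DRAM
# main memory that always hits (miss ratio 0).
with_l2 = average_access_time_ns([(10, 0.10), (30, 0.20), (120, 0.0)])
without_l2 = average_access_time_ns([(10, 0.10), (120, 0.0)])
print(f"with L2: {with_l2:.1f} ns, without L2: {without_l2:.1f} ns")
```

With these assumed figures, adding the second level cache lowers the average access time from 22.0 ns to 15.4 ns, even though only 10% of accesses ever leave the first level cache.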
FIG. 1 is a block diagram of a prior art computer system 100 using an SRAM second level cache configuration. The CPU or microprocessor 101 incorporates an on-chip SRAM first level cache 102 to support very fast internal CPU operations (typically from 33 MHz to 150 MHz).
First level cache 102 typically has a capacity of 4 Kbytes to 32 Kbytes and performs very high speed data and instruction accesses (typically within 5 to 15 ns). For first level cache misses or other non-cacheable memory accesses, memory read and write operations must go off-chip through the much slower external CPU bus 104 (typically from 25 MHz to 60 MHz) to the SRAM second level (L2) cache 106 (typically with 128 Kbytes to 1024 Kbytes capacity), incurring the additional latency (access time) penalty of a round-trip off-chip delay.
The need for CPU 101 to manage the delay penalty of off-chip operation dictates that, in almost all modern microprocessors, the fastest access cycle (read or write) through the CPU bus 104 is 2-1-1-1. That is, the first external access consumes at least 2 clock cycles, and each subsequent external access consumes a single clock cycle. At higher CPU bus frequencies, the fastest first external access may take 3 or more clock cycles. A burst cycle having 4 accesses is described here for purposes of illustration only; some processors allow shorter (e.g., 2) or longer (e.g., 8 or more) burst cycles. Pipelined operation, in which the parameters of the first external access of the second burst cycle are latched into CPU bus devices while the first burst cycle is still in progress, may hide the longer access latency of the first external access of the second burst cycle. Thus, the first and second burst cycles may be 2-1-1-1 and 1-1-1-1, respectively.
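The clock counts implied by the burst notation can be tallied directly. This is a minimal sketch, assuming each burst is written as a hyphen-separated string of per-access clock counts as in the text; the function names are hypothetical.

```python
def burst_clocks(pattern):
    """Total CPU bus clocks for one burst written as, e.g., "2-1-1-1"."""
    return sum(int(beat) for beat in pattern.split("-"))


def total_clocks(bursts):
    """Total clocks for back-to-back bursts, each given as a pattern string."""
    return sum(burst_clocks(pattern) for pattern in bursts)


# Two 4-access bursts without pipelining: each pays the 2-clock first access.
unpipelined = total_clocks(["2-1-1-1", "2-1-1-1"])   # 10 clocks
# With pipelining, the second burst's first-access latency is hidden: 1-1-1-1.
pipelined = total_clocks(["2-1-1-1", "1-1-1-1"])     # 9 clocks
# At a higher bus frequency the first access may need 3 clocks.
slower_first = burst_clocks("3-1-1-1")               # 6 clocks
```

Pipelining saves one clock per back-to-back burst here, which is why any device on CPU bus 104 must be able to deliver a new word every clock once a burst is under way.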
The cache tag memory 108 is usually relatively small (from 8 Kbytes to 32 Kbytes) and fast (typically from 10 to 15 ns) and is implemented using SRAM cells. Cache tag memory 108 stores the addresses of the cache lines of second level cache 106 and compares these addresses with an access address on CPU bus 104 to determine whether a cache hit has occurred. This small cache tag memory 108 can be integrated with the system logic controller chip 110 for better speed and lower cost. An integrated cache tag memory operates in the same manner as an external cache tag memory. Intel's 82430 PCI set for the Pentium processor is one example of a logic controller chip 110 that utilizes an integrated SRAM cache tag memory.
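The hit determination performed by cache tag memory 108 can be sketched for a direct-mapped organization. The parameters below (32-bit addresses, 32-byte lines, a 256 Kbyte second level cache) and the class and method names are hypothetical assumptions for illustration; the description does not specify the cache organization.

```python
ADDRESS_BITS = 32              # assumed CPU address width
LINE_SIZE = 32                 # assumed bytes per cache line
CACHE_SIZE = 256 * 1024        # assumed second level cache capacity in bytes
NUM_LINES = CACHE_SIZE // LINE_SIZE                 # 8192 lines
OFFSET_BITS = LINE_SIZE.bit_length() - 1            # 5
INDEX_BITS = NUM_LINES.bit_length() - 1             # 13
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS  # 14


class TagMemory:
    """Direct-mapped cache tag array: one stored tag per second level cache line."""

    def __init__(self):
        self.tags = [None] * NUM_LINES  # None marks an invalid (empty) line

    @staticmethod
    def split(address):
        """Split an access address into (tag, index, byte offset)."""
        offset = address & (LINE_SIZE - 1)
        index = (address >> OFFSET_BITS) & (NUM_LINES - 1)
        tag = address >> (OFFSET_BITS + INDEX_BITS)
        return tag, index, offset

    def is_hit(self, address):
        """Compare the tag stored for the indexed line with the address tag."""
        tag, index, _ = self.split(address)
        return self.tags[index] == tag

    def fill(self, address):
        """Record that the line containing `address` now resides in the cache."""
        tag, index, _ = self.split(address)
        self.tags[index] = tag


tags = TagMemory()
tags.fill(0x12345678)
print(tags.is_hit(0x12345660))               # same 32-byte line: True
print(tags.is_hit(0x12345678 + CACHE_SIZE))  # same index, different tag: False
```

Under these assumptions the tag array holds 8192 tags of 14 bits each, about 14 Kbytes (excluding valid bits), which falls within the 8 to 32 Kbyte range cited above.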
One reason for the slower operating frequency of CPU bus 104 is the significant loading caused by the devices attached to CPU bus 104. Second level (L2) SRAM cache memory 106 provides loading on the data and address buses (through latch 112) of CPU bus 104. Cache tag memory 108 provides loading on the address bus, system logic controller chip 110 provides loading on the control, data and address buses, and main memory DRAM 114 provides loading on the data bus (through latch 116).
In prior art computer system 100, the system logic chip 110 provides an interface to a system (local) bus 118 having a typical operating frequency of 25 MHz to 33 MHz. System bus 118 may be attached to a variety of relatively fast devices 120 (such as graphics, video, communication, or fast disk drive subsystems). System bus 118 can also be connected to a bridge or buffer device 122 for connecting to a general purpose (slower) extension bus 124 (at 4 MHz to 16 MHz operating frequency) that may have many peripheral devices (not shown) attached to it.
Traditional high speed cache systems, whether first level or second level, are implemented using static random access memories (SRAMs) because SRAMs are fast (with access times ranging from 7 to 25 nanoseconds (ns) and cycle times equal to access times). SRAMs are suitable for storing and retrieving data for high-speed microprocessors having bus speeds of 25 to 100 megahertz. Traditional dynamic random access memories (DRAMs) are less expensive than SRAMs on a per-bit basis because DRAM has a much smaller cell size. For example, a DRAM cell is typically one quarter of the size of an SRAM cell using comparable lithography rules. DRAMs are generally not considered suitable for high speed operation because DRAM accesses inherently require a two-step process (a row access followed by a column access), resulting in access times ranging from 50 to 120 ns and cycle times ranging from 90 to 200 ns.
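The gap between these cycle times and a bus clock can be quantified by converting nanoseconds to whole clocks. This is a minimal sketch; the 66 MHz bus frequency is a hypothetical value chosen from within the 25 to 100 MHz range cited above, and `clocks_needed` is a hypothetical helper that assumes integer inputs.

```python
def clocks_needed(time_ns, bus_mhz):
    """Whole bus clocks needed to cover `time_ns` at `bus_mhz` (integer inputs)."""
    # One clock lasts 1000 / bus_mhz ns, so time_ns spans time_ns * bus_mhz / 1000
    # clocks; round up because an operation cannot finish in part of a clock.
    return (time_ns * bus_mhz + 999) // 1000


BUS_MHZ = 66  # hypothetical bus frequency within the 25-100 MHz range
for label, ns in [("fast SRAM access", 15),
                  ("fastest DRAM cycle", 90),
                  ("slowest DRAM cycle", 200)]:
    print(f"{label}: {ns} ns -> {clocks_needed(ns, BUS_MHZ)} clock(s)")
```

At this assumed frequency a 15 ns SRAM completes within one clock, whereas a conventional DRAM random cycle spans 6 to 14 clocks, far from the single clock per subsequent access required by a 2-1-1-1 burst.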
Access speed is a relative measurement. That is, while DRAMs are slower than SRAMs, they are much faster than earlier-era memory devices such as ferrite core and charge-coupled devices (CCDs). As a result, DRAM could theoretically be used as a “cache” memory in systems that use these slower memory devices as a “main memory”. Such systems, however, use operation modes and access methods that differ from those disclosed herein.
In most computer systems, the second level cache operates in a fixed and rigid mode. That is, any read or write access to the second level cache is of a few constant sizes (the line sizes of the first and second level caches) and is usually either a burst sequence of 4 or 8 words (i.e., consecutive reads or writes of 4 or 8 words) or a single access (i.e., one word). These restricted access patterns have allowed standard SRAMs to be modified to meet the timing requirements of very high speed processor buses. One such example is the burst or synchronous SRAM, which incorporates an internal counter and a memory clock to increment an initial access address. External addresses are not required after the first access, thereby allowing the SRAM to operate faster after the first access is performed. The synchronous SRAM may also have special logic to provide preset address sequences, such as Intel's interleaved address sequence. Such performance enhancements, however, do not reduce the cost of using SRAM cells to store memory bits.
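The address sequence produced by the internal burst counter can be sketched as follows. This assumes word-granular addresses and a power-of-two burst length; Intel's interleaved order is modeled as the counter value XORed into the low address bits, and a linear wrap-around order (a hypothetical comparison case not described in the text) adds the counter modulo the burst length. The function names are hypothetical.

```python
def _split(start, length):
    # `length` must be a power of two; the low bits select the word within the burst.
    mask = length - 1
    return start & ~mask, start & mask


def interleaved_burst(start, length=4):
    """Intel-style interleaved order: the internal counter is XORed into the low bits."""
    base, first = _split(start, length)
    return [base | (first ^ count) for count in range(length)]


def linear_burst(start, length=4):
    """Linear wrap-around order: the internal counter is added modulo the burst length."""
    base, first = _split(start, length)
    return [base | ((first + count) & (length - 1)) for count in range(length)]


for start in range(4):
    print(start, interleaved_burst(start), linear_burst(start))
```

Only the first address is supplied externally; each subsequent address is derived from it and the counter, which is what allows the burst SRAM to deliver one word per clock after the first access.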
Synchronous DRAMs (SDRAM) have adopted similar burst-mode operation. Video RAMs (VRAM) have adopted the serial port operation of dual-port DRAMs. These new DRAMs are still not suitable for second level cache operation, however, because their initial access time and random access cycle time remain much slower than necessary.
It would therefore be desirable to have a structure and method which enables DRAM memory to be used as a second level cache memory.
Prior art computer systems have also included multiple levels of SRAM cache memory integrated on the same chip as the CPU. For example, DEC's Alpha 21164 processor integrates 16 Kbytes of first level SRAM cache memory and 96 Kbytes of second level SRAM memory on the same chip. In such cases, a third level SRAM cache is typically used between the processor and a DRAM main memory. In such a computer system, it would be desirable to use a DRAM memory to replace the third level SRAM cache memory.
Prior art high-performance second level SRAM cache memory devices generally conform to a set of pin and function specifications to ensure that system logic controller 110 can operate compatibly with a variety of different SRAM cache memories from multiple suppliers. Several examples of such pin and function specifications are set forth in the following references: “Pentium™ Processor 3.3V Pipelined BSRAM Specification”, Version 1.2, Intel Corporation, Oct. 5, 1994; “32K×32 CacheRAM™ Pipelined/Flow Through Outputs Burst Counter, & Self-Timed Write—For Pentium™/PowerPC™ Processors”, Advance Information IDT71V432, Integrated Device Technology, Inc., May 1994; and “32K×32 CacheRAM™ Burst Counter & Self-Timed Write—For the Pentium™ Processor”, Preliminary IDT71420, Integrated Device Technology, Inc., May 1994.
It is therefore desirable to have a method and structure that enable DRAM memory to be used as a second level cache memory that can be interfaced to a conventional logic controller which normally controls a second level SRAM cache memory. It is further desirable that such a method and structure require minimal modification to the conventional logic controller.