Contemporary high-speed computer system designs make use of a series of different memories, arranged in a hierarchy, to provide the system's central processing unit with needed memory read (retrieval) and write (update) capabilities. As shown in FIG. 1, a typical memory hierarchy 100 comprises four levels of memory: level 1, internal cache memory 110; level 2, external cache memory 115; level 3, main memory 120; and level 4, long term memory 125.
Level 1's internal cache memory 110 provides the fastest performance but is also the most severely space-restricted, because it is placed internal to (i.e., on the same chip as) the computer system's central processing unit 105. Because of this space limitation, internal caches are used to store only those instructions and data most commonly used by the processor. Typical internal cache sizes range from 8 KB (kilobytes) to 32 KB.
Secondary, or level 2, cache memory 115 is provided to increase computer system performance when the processor 105 requires information not stored in the internal cache 110. Sizes for level 2 caches vary greatly depending upon system requirements such as operational speed and architecture (e.g., pipelined versus non-pipelined).
Level 3 memory is the largest portion of RAM (random access memory) and is typically implemented in DRAM (dynamic random access memory) technology. A key characteristic of level 3 memory is that it is significantly slower than level 2 cache memory. Main memory is characterized by the need to insert wait states between a memory read or write command and the actual reading (retrieval) or writing (update) of the specified memory locations. Typical main memory sizes range from 64 KB to 512 MB (megabytes).
Level 4 memory, also referred to as long term or permanent storage, is typically implemented using magnetic or optical hard disks, or magnetic floppy disks.
2.1 Burst SRAM Devices
New medium and high end personal computers, as well as most if not all workstation and mainframe computers, utilize level 2 external cache memory to obtain a reasonable level of performance. A current trend in computer memory system design is to implement level 2 cache memory in synchronous burst SRAM technology. As would be known to one of ordinary skill in the field of memory system design, the goal of burst SRAM devices is to provide a large array of fast static random access memory with on-chip control circuitry that automatically retrieves a limited sequence of consecutive memory read addresses. For example, a four-word burst SRAM, given read address X, would not only retrieve the data at memory location X, but would also automatically retrieve the data at memory locations (X+1), (X+2), and (X+3) without the host processor having to generate any of the read addresses (X+1), (X+2), or (X+3). (One of ordinary skill in the art will realize that different processors can generate different burst sequences.) Similarly, a four-word burst SRAM is capable of receiving and storing four data words from the processor 105 in rapid succession.
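The burst behavior just described can be illustrated with a minimal Python sketch. It is a behavioral toy, not a model of any particular device: the class name BurstSRAM and its methods are hypothetical, and the device-generated addresses follow the simplified X, X+1, X+2, X+3 description above (wrapping at the end of the array only to stay in bounds) rather than the block-aligned order a real burst counter produces.

```python
BURST_LENGTH = 4  # four-word burst, as in the example above


class BurstSRAM:
    """Hypothetical behavioral model of a four-word burst SRAM."""

    def __init__(self, depth):
        self.depth = depth
        self.array = [0] * depth

    def _burst_addresses(self, start):
        # The host supplies only `start`; the device generates the other
        # three addresses itself (simplified X, X+1, X+2, X+3 ordering).
        return [(start + i) % self.depth for i in range(BURST_LENGTH)]

    def burst_read(self, start):
        # One host address in, four data words out.
        return [self.array[a] for a in self._burst_addresses(start)]

    def burst_write(self, start, words):
        # Four data words accepted in rapid succession for one host address.
        if len(words) != BURST_LENGTH:
            raise ValueError("a burst write takes exactly four words")
        for addr, word in zip(self._burst_addresses(start), words):
            self.array[addr] = word


sram = BurstSRAM(32 * 1024)
sram.burst_write(0x100, [10, 11, 12, 13])
print(sram.burst_read(0x100))
```

Here the processor issues a single address for each burst, and the model supplies or stores all four words, mirroring the division of labor between host and on-chip control circuitry.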
By way of illustration, the Micron MT58LC32K36B4 synchronous SRAM (a 32K×36 SRAM having a two-bit burst controller) will be described. As shown in FIG. 2, the basic elements of the burst SRAM device 200 consist of an input address data latch 205, a burst counter 210, the 32K×36 SRAM array 215, SRAM array output register 220, output buffer 225, and appropriate read/write control logic 230.
During a read operation, external address inputs A0-A14 are captured by address latch 205. Low-order address bits A0 and A1 are then split off (indicated as LA0 and LA1 in FIG. 2) and supplied as input to the two-bit burst counter 210. The function of the burst counter is to generate an appropriate sequence of address bit values on lines IA0 and IA1 to step through four consecutive memory locations. The precise sequence of values on lines IA0 and IA1 depends on the memory's mode of operation and is controlled by an externally applied signal (not shown). Typically, the burst counter can generate a sequence of IA0 and IA1 bit patterns to implement either an interleaved or a linear burst address scheme. As would be known to those of ordinary skill, the former mode is suitable for use in a computer system using an Intel "PENTIUM" processor, while the latter is suitable for use with non-Intel processors.
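The two burst orders can be sketched as follows, assuming the usual definitions: a linear burst counts up from the starting value and wraps within the aligned four-word block, while the interleaved order used with "PENTIUM"-class processors is the starting value XORed with the burst step. The function name and the mode strings are hypothetical stand-ins for the externally applied mode signal.

```python
def burst_sequence(la, mode):
    """Return the four two-bit values (IA1:IA0) a burst counter would emit.

    `la` is the starting value taken from LA1:LA0; `mode` is a hypothetical
    stand-in for the external mode signal ("linear" or "interleaved").
    """
    if not 0 <= la <= 3:
        raise ValueError("la must be a two-bit value")
    if mode == "linear":
        # Count upward, wrapping modulo 4 within the aligned block.
        return [(la + step) & 0b11 for step in range(4)]
    if mode == "interleaved":
        # Intel-style interleaved order: start value XOR burst step.
        return [la ^ step for step in range(4)]
    raise ValueError("mode must be 'linear' or 'interleaved'")


for start in range(4):
    print(start, burst_sequence(start, "linear"), burst_sequence(start, "interleaved"))
```

Both orders visit each of the four locations exactly once; they differ only in the order, which is why a single two-bit counter can support either scheme.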
Address latch 205 output (IA2-IA14, representing input address bits A2 through A14, respectively) is applied, in combination with burst counter 210 outputs IA0 and IA1, to the SRAM array 215. Output from the SRAM array is supplied to an output register 220, whose output is in turn applied to an output buffer 225 that drives the external data bus.
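The way the array address is assembled can be sketched as follows. This is an illustration assuming a linear burst: IA2-IA14 pass straight through from address latch 205, and only the two low-order bits come from the burst counter 210 output, so all four accesses stay within one aligned four-word block. The function names are hypothetical.

```python
ADDRESS_BITS = 15                      # A0-A14 address the 32K-word array
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def array_address(latched, ia_low):
    """Combine latch outputs IA2-IA14 with burst-counter outputs IA1:IA0."""
    upper = latched & ADDRESS_MASK & ~0b11  # IA2-IA14, unchanged during the burst
    return upper | (ia_low & 0b11)          # IA0/IA1 supplied by the counter


def burst_array_addresses(latched):
    """Array addresses visited by a four-word burst (linear mode assumed)."""
    la = latched & 0b11                     # LA0/LA1 seed the counter
    return [array_address(latched, (la + step) & 0b11) for step in range(4)]


print([hex(a) for a in burst_array_addresses(0x1235)])
```

Starting at 0x1235, the burst touches 0x1235, 0x1236, 0x1237 and then wraps to 0x1234, never leaving the block whose upper bits were latched.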
Data write operations are performed in a similar manner. An external address is stored into address latch 205 whose output, in combination with burst counter 210 output, is applied to the SRAM array 215. Externally supplied data is then stored into the SRAM array under control of read/write control logic 230.
2.2 Associative Memory Caches
It is widely known that implementing a cache using associative access techniques improves memory system performance. In fact, many systems employ a two-way set associative memory scheme in their level 2 external cache 115. It is also known that increasing the associativity (e.g., from two-way to four-way) of the external cache memory can result in significantly higher system performance. This is especially true in systems with small internal caches (e.g., 110) that execute multi-threaded operating systems such as Unix or Microsoft "WINDOWS", because of the paging activity generated as the processor switches from task to task.
The reason for this increased performance, when using a set associative cache memory 115, is the reduction in cache line replacements. For example, in a computer system employing an external direct-mapped cache, a cache line may have to be replaced whenever two memory accesses are a multiple of the cache size apart. In general, for an n-way set associative cache of size m, a cache line may have to be replaced on the (n+1)th access to distinct memory locations that are multiples of (m/n) memory locations (e.g., bytes) apart. This reduction in cache line replacements results in an increase in system performance. Furthermore, if an LRU (least recently used) replacement algorithm is used to determine which line should be replaced, the data most likely to be accessed next will probably still be in the cache.
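The effect of associativity on replacements can be checked with a small simulation. This is a minimal sketch, assuming LRU replacement and measuring addresses in cache lines rather than bytes; the class name and the 1024-line cache size are illustrative choices, not taken from any particular system.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Counts misses for an n-way set associative cache with LRU replacement."""

    def __init__(self, num_lines, ways):
        if num_lines % ways:
            raise ValueError("cache size must be a multiple of the associativity")
        self.ways = ways
        self.num_sets = num_lines // ways   # stride between conflicting lines is m/n
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.misses = 0

    def access(self, line_addr):
        """Access one line address; return True on a hit."""
        cache_set = self.sets[line_addr % self.num_sets]
        tag = line_addr // self.num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)      # mark as most recently used
            return True
        self.misses += 1
        if len(cache_set) == self.ways:
            cache_set.popitem(last=False)   # evict the least recently used line
        cache_set[tag] = None
        return False


direct = SetAssociativeCache(1024, ways=1)
two_way = SetAssociativeCache(1024, ways=2)
for line in [0, 1024] * 10:                 # two lines a multiple of m apart
    direct.access(line)
    two_way.access(line)
print(direct.misses, two_way.misses)
```

With a 1024-line cache, alternating between lines 0 and 1024 misses on every access in the direct-mapped cache but only on the first two accesses in the two-way cache. Cycling through three lines spaced 512 lines (m/n) apart still misses every time in the two-way cache, the case of one more conflicting line than there are ways.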
Current pipelined and burst SRAMs are generally designed to allow bursts of four words into, and out of, the memory. (See FIG. 2 and discussion above.) Using these devices, a two-way set associative cache memory can be implemented by doubling the number of memory devices: one physical device implements one set of the two-way set associative memory, and another physical device implements the second set. If Micron MT58LC32K36B4 memory devices are used in such a scheme, each set would have a 32K×36 depth, or 128 KB per set counting four data bytes in each 36-bit word, for a total of 256 KB across the two devices. Implementation of a four-way set associative memory would require four such memory devices. While the increased memory size can improve system performance, the printed circuit board space needed to implement such a memory, as well as the increased power necessary to drive such a system, are drawbacks to the current method of implementing set associative memory caches.
The memory device described herein modifies existing SRAM devices to allow multi-way set associative memory operations within a single memory device. For example, a Micron MT58LC32K36B4 modified in accordance with the invention can implement a two-way set associative memory with each set having a depth of 16 K-words (KW). Alternatively, a four-way set associative memory could be implemented in a modified MT58LC32K36B4 with each set having a depth of 8 KW.
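One way to picture the single-device arrangement is to carve the 32K-word array into equal slices, one per way, which reproduces the 16 KW and 8 KW set depths stated above. The document does not say how the modified device maps ways onto the array; the sketch below simply assumes each way occupies a contiguous slice selected by the high-order address bits, and its names are hypothetical.

```python
DEVICE_DEPTH = 32 * 1024  # words in a 32K x 36 array


def set_depth(ways):
    """Depth of each set when one device is split into `ways` sets."""
    if ways <= 0 or DEVICE_DEPTH % ways:
        raise ValueError("ways must evenly divide the device depth")
    return DEVICE_DEPTH // ways


def physical_address(way, index, ways):
    """Hypothetical mapping: way number in the high-order address bits."""
    depth = set_depth(ways)
    if not 0 <= way < ways:
        raise ValueError("way out of range")
    if not 0 <= index < depth:
        raise ValueError("set index out of range")
    return way * depth + index  # contiguous slice per way


print(set_depth(2), set_depth(4))
```

Because the total depth is fixed, doubling the associativity halves the depth of each set; the trade-off is fewer sets in exchange for more ways, all within one physical device.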