This invention relates to the field of cache memories for microprocessors, and particularly to an associative cache requiring a single bank of SRAM devices.
A significant barrier to improving the performance of a microprocessor system is the access time of system memory. Although the speed of semiconductor memories has improved over time, the speed of DRAM devices has not kept pace with the speed of the processors. Consequently, when executing most applications, a processor will experience numerous wait states while system memory is accessed. A frequently employed solution to this problem is to incorporate into the microprocessor system a high-speed cache memory comprising SRAM devices. In general, a cached system will experience significantly fewer wait states than a non-cached system.
The simplest form of cache is generally referred to as a direct-mapped cache, wherein contents of the system memory are retrieved and stored in cache locations having the same low-order address. For example, if an 8K cache is provided, the thirteen lowest-order address bits of the system memory location to be retrieved define the cache storage location. A significant disadvantage of a direct-mapped cache is that the contents of a cache location will be overwritten whenever there is an access request to a system memory location having the same low-order address but a different high-order address.
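By way of illustration only, the direct-mapped addressing described above may be sketched as follows. The sketch models only the tag store of an 8K cache; the names `split_address` and `DirectMappedCache` are hypothetical and are not taken from any particular implementation.

```python
# Illustrative model of a direct-mapped cache (tag store only).
# All names here are hypothetical, chosen for this sketch.

CACHE_SIZE = 8 * 1024                      # 8K cache locations
INDEX_BITS = CACHE_SIZE.bit_length() - 1   # 13 low-order address bits
INDEX_MASK = CACHE_SIZE - 1


def split_address(addr):
    """Split a system memory address into (high-order tag, low-order index)."""
    return addr >> INDEX_BITS, addr & INDEX_MASK


class DirectMappedCache:
    def __init__(self):
        # One tag per cache location; None marks an empty location.
        self.tags = [None] * CACHE_SIZE

    def access(self, addr):
        """Return True on a hit; on a miss, overwrite the location and return False."""
        tag, index = split_address(addr)
        hit = self.tags[index] == tag
        if not hit:
            self.tags[index] = tag   # previous contents are lost
        return hit
```

In this model, addresses 0x0123 and 0x2123 share the same thirteen low-order bits, so alternating accesses to the two addresses miss every time, which is the overwriting behavior noted above.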
To overcome this disadvantage, a set associative cache structure is sometimes used. For example, with a two-way set associative cache, the cache memory is physically divided into two banks of SRAMs. Thus, a two-way set associative 8K cache would comprise two 4K banks of SRAM. Data retrieved from system memory may be mapped into either one of the two banks since the two banks have identical low order addresses. A cache hit in one bank causes a least recently used (LRU) flag to be set for the corresponding address in the other bank. Thus, cache writes may be directed to the cache bank whose contents were least recently used, thereby preserving the more recently used data for subsequent accesses by the CPU. An associative cache significantly improves the cache hit rate and thus improves overall system performance.
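The two-bank arrangement and its LRU flag may be modeled, purely as an illustrative sketch, as shown below. The sketch assumes an 8K cache split into two 4K banks and tracks tags only; the class name `TwoWayCache` and the choice to fill bank 0 first are assumptions made for this example.

```python
# Illustrative model of a two-way set associative cache with an LRU flag.
# Names and the initial LRU state are assumptions made for this sketch.

BANK_SIZE = 4 * 1024                          # each of the two banks holds 4K locations
BANK_INDEX_BITS = BANK_SIZE.bit_length() - 1  # 12 low-order address bits


class TwoWayCache:
    def __init__(self):
        # tags[bank][index]; None marks an empty location.
        self.tags = [[None] * BANK_SIZE for _ in range(2)]
        # lru[index] names the bank that was least recently used at that index.
        self.lru = [0] * BANK_SIZE

    def access(self, addr):
        """Return (hit, bank) for the access; a miss writes to the LRU bank."""
        index = addr & (BANK_SIZE - 1)
        tag = addr >> BANK_INDEX_BITS
        for bank in (0, 1):
            if self.tags[bank][index] == tag:
                self.lru[index] = 1 - bank    # the other bank is now LRU
                return True, bank
        victim = self.lru[index]              # direct the write to the LRU bank
        self.tags[victim][index] = tag
        self.lru[index] = 1 - victim
        return False, victim
```

Under this model, two addresses with the same low-order bits can reside in the cache together, and a third conflicting address displaces only the one that was used less recently.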
Additional banks of SRAM may be added to create a four-way, eight-way, etc., associative cache. However, the increase in system performance with increased associativity is non-linear and it is generally felt that four-way associativity provides an optimal performance/cost tradeoff. Prior art cached systems incur significantly higher power consumption as the cache associativity is increased. Although total cache memory remains constant, a four-way associative cache consumes significantly more power than a direct-mapped cache since the power consumption of each SRAM device is not proportional to the size of the SRAM array. Furthermore, a four-way associative cache will require four times as many SRAM packages as a direct-mapped cache, thereby occupying more area on the processor circuit board.
One of the objects of the present invention is to implement an associative cache using a single bank of SRAM, thereby achieving the superior hit rate performance of an associative cache without incurring the component cost, power consumption and real estate penalties of prior art associative cache subsystems.
In the present invention, a cache controller is intimately associated with a microprocessor CPU on a single chip. The physical address bus is routed directly from the CPU to the cache controller where it is sent to the cache tag directory table. For a cache hit, the cache address is remapped to the proper cache set address. For a cache miss, the cache address is remapped in accordance with the LRU logic to direct the cache write to the least recently used set. The cache is thereby functionally divided into associative sets, but without the need to physically divide the cache into independent banks of SRAM.
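The remapping described above may be illustrated by the following sketch, which is offered under stated assumptions and is not a definitive implementation. It assumes a two-way organization of a single 8K bank and assumes that the selected set number supplies the high-order bit of the cache address. Neither the bit assignment nor the names `SingleBankCache` and `remap` is specified in the description above.

```python
# Illustrative model of address remapping into a single SRAM bank.
# Assumptions for this sketch: two sets, and the set number forms the
# high-order bit of the cache address. All names are hypothetical.

TOTAL_SIZE = 8 * 1024                       # one physical bank of 8K locations
NUM_SETS = 2                                # assumed associativity
SET_SIZE = TOTAL_SIZE // NUM_SETS           # 4K locations per functional set
SET_INDEX_BITS = SET_SIZE.bit_length() - 1  # 12 low-order address bits


class SingleBankCache:
    def __init__(self):
        # Tag directory table: one tag per set at each index.
        self.tag_dir = [[None] * NUM_SETS for _ in range(SET_SIZE)]
        # lru[index] names the least recently used set at that index.
        self.lru = [0] * SET_SIZE

    def remap(self, addr):
        """Return (hit, cache_address) for a physical address."""
        index = addr & (SET_SIZE - 1)
        tag = addr >> SET_INDEX_BITS
        entry = self.tag_dir[index]
        if tag in entry:
            hit, chosen = True, entry.index(tag)    # hit: use the matching set
        else:
            hit, chosen = False, self.lru[index]    # miss: use the LRU set
            entry[chosen] = tag
        self.lru[index] = 1 - chosen                # valid for two sets only
        # Remapped address into the single bank: set number, then index.
        return hit, (chosen << SET_INDEX_BITS) | index
```

Every address returned by `remap` falls within the one 8K bank, so the two functional sets occupy the lower and upper halves of the same SRAM rather than separate banks.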
Prior art associative caches cannot be implemented in a single bank of SRAM since there is no practical way to decode the cache tags prior to accessing data in the cache. While it would be possible to decode the tags first and then remap the cache address to the proper cache set, this would require at least one additional clock cycle, thereby defeating the very purpose of caching, or would require prohibitively fast SRAMs. In the present invention, however, the cache controller is located on the same chip as the CPU, permitting access to the unbuffered address lines, so that remapping of the cache address following interrogation of the cache tag directory table is transparent to the CPU.