1. Field of the Invention
This invention relates to the field of superscalar microprocessors and, more particularly, to storage devices within superscalar microprocessors.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by simultaneously executing multiple instructions in a clock cycle and by specifying the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time during which the pipeline stages of a microprocessor perform their intended functions. At the end of a clock cycle, the resulting values are moved to the next pipeline stage.
Since superscalar microprocessors execute multiple instructions per clock cycle and the clock cycle is short, a high bandwidth memory system (i.e. a memory system that can provide a large number of bytes in a short period of time) is required to provide instructions and data to the superscalar microprocessor. Without a high bandwidth memory system, the microprocessor would spend a large number of clock cycles waiting for instructions or data to be provided, then would execute the received instructions and/or the instructions dependent upon the received data in a relatively small number of clock cycles. Overall performance would be degraded by the large number of idle clock cycles. However, superscalar microprocessors are ordinarily configured into computer systems with a large main memory composed of dynamic random access memory (DRAM) cells. DRAM cells are characterized by access times which are significantly longer than the clock cycle of modern superscalar microprocessors. Also, DRAM cells typically provide a relatively narrow output bus to convey the stored bytes to the superscalar microprocessor. Therefore, a main memory composed of DRAM cells provides a relatively small number of bytes in a relatively long period of time, and does not form a high bandwidth memory system.
Because superscalar microprocessors are typically not configured into a computer system with a memory system having sufficient bandwidth to continuously provide instructions and data, superscalar microprocessors are often configured with caches. Caches are storage devices containing multiple blocks of storage locations, configured on the same silicon substrate as the microprocessor or coupled nearby. The blocks of storage locations are used to hold previously fetched instruction or data bytes. The bytes can be transferred from the cache to the destination (a register or an instruction processing pipeline) quickly; commonly one or two clock cycles are required as opposed to a large number of clock cycles to transfer bytes from a DRAM main memory.
Caches may be organized into an "associative" structure. In an associative structure, the blocks of storage locations are accessed as a two-dimensional array having rows and columns. When a cache is searched for bytes residing at an address, a number of bits from the address are used as an "index" into the cache. The index selects a particular row within the two-dimensional array, and therefore the number of address bits required for the index is determined by the number of rows configured into the cache. The addresses associated with bytes stored in the multiple blocks of a row are examined to determine if any of the addresses stored in the row match the requested address. If a match is found, the access is said to be a "hit", and the cache provides the associated bytes. If a match is not found, the access is said to be a "miss". When a miss is detected, the bytes are transferred from the memory system into the cache. The addresses associated with bytes stored in the cache are also stored. These stored addresses are referred to as "tags" or "tag addresses".
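The lookup described above can be illustrated with a minimal software sketch. It is not the circuitry of any particular cache: the block size, row count, way count, first-in-first-out replacement, and the names split_address, AssociativeCache, and memory_read are all assumptions introduced here for illustration.

```python
BLOCK_BYTES = 16   # assumed bytes per block (not specified in the text)
NUM_ROWS = 64      # assumed number of rows; fixes the number of index bits
NUM_WAYS = 4       # assumed number of blocks per row

OFFSET_BITS = BLOCK_BYTES.bit_length() - 1   # 4 bits select a byte in a block
INDEX_BITS = NUM_ROWS.bit_length() - 1       # 6 bits select a row


def split_address(addr):
    """Split an address into (tag, index, offset)."""
    offset = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_ROWS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class AssociativeCache:
    def __init__(self):
        # Each row holds up to NUM_WAYS (tag, block) pairs.
        self.rows = [[] for _ in range(NUM_ROWS)]

    def access(self, addr, memory_read):
        """Return ("hit" or "miss", byte). memory_read(base, n) is a
        hypothetical callback standing in for the memory system."""
        tag, index, offset = split_address(addr)
        row = self.rows[index]
        # Examine the tags stored in the selected row for a match.
        for stored_tag, block in row:
            if stored_tag == tag:
                return "hit", block[offset]
        # Miss: transfer the block from the memory system into the cache,
        # storing its tag alongside it.
        block = memory_read(addr - offset, BLOCK_BYTES)
        if len(row) == NUM_WAYS:
            row.pop(0)  # assumed FIFO replacement; the text names no policy
        row.append((tag, block))
        return "miss", block[offset]
```

With these assumed sizes, address 0x1234 splits into tag 4, index 35, and offset 4. A first access to that address misses and fills the row; a second access to any byte of the same block hits.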
The blocks of memory configured into a row form the columns of the row. Each block of memory is referred to as a "way"; multiple ways comprise a row. The way is selected by providing a way value to the cache. The way value is determined by examining the tags for a row and finding a match between one of the tags and the requested address. A cache designed with one way per row is referred to as a "direct-mapped cache". In a direct-mapped cache, the tag must be examined to determine if an access is a hit, but the tag examination is not required to select which bytes are transferred to the outputs of the cache. Since only an index is required to select bytes from a direct-mapped cache, the direct-mapped cache is a "linear array" requiring only a single value to select a storage location within it.
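The distinction between selecting bytes and validating them can be sketched as follows. This is an illustrative model only; the sizes and the names DirectMappedCache, read, and fill are assumptions, and the sequential statements stand in for operations that hardware would perform in parallel.

```python
BLOCK_BYTES = 16   # assumed bytes per block
NUM_ROWS = 64      # assumed number of rows, one way per row


class DirectMappedCache:
    def __init__(self):
        # A linear array: one tag and one block per index.
        self.tags = [None] * NUM_ROWS
        self.blocks = [bytes(BLOCK_BYTES)] * NUM_ROWS

    def read(self, addr):
        """Return (hit, byte). The byte is chosen by the index alone."""
        offset = addr % BLOCK_BYTES
        index = (addr // BLOCK_BYTES) % NUM_ROWS
        tag = addr // (BLOCK_BYTES * NUM_ROWS)
        data = self.blocks[index][offset]   # selection needs no tag compare
        hit = self.tags[index] == tag       # tag compare only validates data
        return hit, data

    def fill(self, addr, block):
        """Store a block fetched from memory, along with its tag."""
        index = (addr // BLOCK_BYTES) % NUM_ROWS
        self.tags[index] = addr // (BLOCK_BYTES * NUM_ROWS)
        self.blocks[index] = bytes(block)
```

In this sketch, reading an address whose index matches a filled row but whose tag differs still returns the stored byte, with hit reported as false. That mirrors the text: the tag determines whether the access is a hit, not which bytes reach the output.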
Both direct-mapped and associative caches are employed in high frequency (i.e. short clock cycle) superscalar microprocessors. In high frequency applications, set associative caches become a clock cycle limiter because the comparison of tags to the request address and the subsequent selection of data bytes to convey to the output require more time than the desired clock cycle time allows. Direct-mapped caches, which compare the selected tag to the request address in parallel with conveying data bytes to the output, operate in less time than the associative cache. Unfortunately, direct-mapped caches are associated with lower hit rates (i.e. the percentage of accesses that are hits) than associative caches with a similar storage capacity. Furthermore, direct-mapped caches are more susceptible to "thrashing". Thrashing is a phenomenon that occurs when the pattern of address requests presented to the cache contains several dissimilar addresses with the same index. Dissimilar addresses are addresses that are stored in the cache with different tags. As an illustrative example, addresses A and B may access the cache alternately and repeatedly. Address A and address B have the same index, and access a direct-mapped cache. First, address A accesses the cache and misses. The indexed cache storage location is filled with bytes associated with address A. Next, address B accesses the cache and misses. The indexed cache storage location discards the bytes associated with address A and is filled with bytes associated with address B. Address A accesses the cache again, and misses. The cache storage location discards the bytes associated with address B and is filled with bytes associated with address A. An associative cache would be able to store bytes associated with both address A and address B simultaneously. A storage device having the access time of a direct-mapped cache with the hit rate and insensitivity to thrashing of an associative cache is desired.
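The thrashing example with addresses A and B can be reproduced with a small miss-counting sketch. The function count_misses, the block size, the row counts, the concrete values of A and B, and the least-recently-used replacement within a row are assumptions chosen for illustration; both configurations below have the same total capacity.

```python
def count_misses(addresses, num_rows, num_ways, block_bytes=16):
    """Count misses for a sequence of addresses in a cache model with
    num_rows rows of num_ways blocks (assumed LRU replacement per row)."""
    rows = [[] for _ in range(num_rows)]   # tags per row, oldest first
    misses = 0
    for addr in addresses:
        block_number = addr // block_bytes
        index = block_number % num_rows
        tag = block_number // num_rows
        row = rows[index]
        if tag in row:
            row.remove(tag)      # hit: re-appended below as most recent
        else:
            misses += 1
            if len(row) == num_ways:
                row.pop(0)       # discard the least recently used block
        row.append(tag)
    return misses


A = 0x0000   # hypothetical address A
B = 0x1000   # hypothetical address B: same index as A, different tag
pattern = [A, B] * 4             # A and B access the cache alternately

direct_mapped = count_misses(pattern, num_rows=64, num_ways=1)
two_way = count_misses(pattern, num_rows=32, num_ways=2)
print(direct_mapped, two_way)    # prints "8 2"
```

Under these assumptions the direct-mapped configuration misses on all eight accesses, because each fill discards the other address's block. The two-way configuration misses only on the first access to each address and then holds both blocks simultaneously.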