The present invention is related to the field of cache memories; more particularly, to architectures of local cache memory embedded on the same silicon chip as a microprocessor.
Many processors manufactured today include one or more embedded first level caches. “Cache” is the name generally given to the first level of memory storage in a memory hierarchy of a computer system. Caches operate on the principle of locality, by providing the processor with access to data that is frequently referenced. To put it another way, a cache reduces average memory access time when it is organized so that the code and data the microprocessor needs most often are resident within the cache. The cache accomplishes this by storing code and data that the microprocessor has requested, and also by storing code and data that the microprocessor is predicted to request.
In its simplest form, a cache has three basic components: a data cache array, a tag cache array, and cache management logic. Most often, the data and tag cache arrays are implemented with random access memory (RAM). The data cache RAM is a block of fast memory that stores copies of data or instructions frequently requested by the processor. Since the cache holds copies of data or instructions that are in the main system memory, it is necessary to know when a copy is available in the cache.
As information is copied into the data array, its main system memory addresses are also stored in the tag array. The tag array contains the original main system memory addresses of code or data stored in the data array, plus additional bits used by the cache management logic. As is well known, each directory entry in the tag array is called a “tag”. A “block” refers to the minimal unit of information that can be present in the cache (i.e., a cache “hit”) or not (i.e., a cache “miss”).
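The relationship between the tag array and the data array can be illustrated with a minimal sketch. The names `BLOCK_SIZE`, `NUM_BLOCKS`, `fill`, and `lookup` are hypothetical and the sizes are illustrative only. The sketch simply searches every tag entry and does not model the additional bits used by the cache management logic.

```python
# Toy model of a tag array paired with a data array.
# BLOCK_SIZE and NUM_BLOCKS are illustrative assumptions, not values from the text.
BLOCK_SIZE = 32
NUM_BLOCKS = 8

tag_array = [None] * NUM_BLOCKS   # main-memory block address held by each entry
data_array = [None] * NUM_BLOCKS  # copy of the corresponding block's contents


def fill(slot, address, block):
    """Copy a block into the data array and record its block address as the tag."""
    tag_array[slot] = address // BLOCK_SIZE
    data_array[slot] = block


def lookup(address):
    """Return (hit, block); a hit means some tag equals the block address."""
    block_address = address // BLOCK_SIZE
    for slot, tag in enumerate(tag_array):
        if tag == block_address:
            return True, data_array[slot]
    return False, None
```

Any address that falls inside a cached block produces a hit, while an address in the next block produces a miss.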
There are three basic categories of cache organization. In a direct mapped cache, each block has only one place that it can appear within the cache. In a fully associative cache, a block can go anywhere within the cache. A set associative cache is one in which the block can be placed in a restricted set of places in the cache. A group of blocks in the cache is referred to as a set. If there are N blocks in a set, the cache is called N-way set associative.
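The three organizations differ only in how many slots a given block may occupy. The following sketch makes that concrete, assuming the common convention that the set is chosen by the block address modulo the number of sets. The function name `candidate_slots` and the slot numbering are hypothetical.

```python
def candidate_slots(block_address, num_blocks, ways):
    """Return the cache slots in which a block may be placed.

    ways == 1           : direct mapped (exactly one slot)
    ways == num_blocks  : fully associative (any slot)
    otherwise           : N-way set associative (the N slots of one set)
    """
    if num_blocks % ways != 0:
        raise ValueError("num_blocks must be a multiple of ways")
    num_sets = num_blocks // ways
    set_index = block_address % num_sets  # assumed modulo placement
    first = set_index * ways
    return list(range(first, first + ways))
```

With eight slots, block address 13 maps to one slot when direct mapped, to two slots in a two-way organization, and to all eight slots when fully associative.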
A majority of processor caches today are organized as direct mapped, two-way set associative, or four-way set associative caches. By way of example, the Intel Pentium® processors, including the Pentium®, Pentium® Pro and Pentium® II processors, include N-way set associative embedded first level caches.
One way to possibly improve performance in a new processor version is to increase the size of the first level caches. A cache can be enlarged by increasing its associativity (i.e., the number of ways), by increasing the number of sets, by increasing the cache line size, or by a combination of any of the above. There are well-known trade-offs that favor one option or another, dependent upon purpose, usage, and other considerations of the processor and memory hierarchy.
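The options for enlarging a cache follow from the fact that total capacity is the product of the number of ways, the number of sets, and the line size. A small sketch, using a hypothetical 4-way, 128-set, 32-byte-line baseline that is not taken from the text:

```python
def cache_size_bytes(ways, sets, line_size):
    """Total data capacity: ways x sets x bytes per line."""
    return ways * sets * line_size


def way_size_bytes(sets, line_size):
    """Capacity of a single way: sets x bytes per line."""
    return sets * line_size


# Hypothetical baseline: 4 ways, 128 sets, 32-byte lines.
baseline = cache_size_bytes(4, 128, 32)
```

Doubling any one of the three factors doubles the capacity, but only doubling the number of sets or the line size increases the size of each way.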
Generally speaking, however, the microarchitectures of many commercial processors, such as those that utilize the Intel Architecture (IA), impose an upper limit of 4 Kbytes on the size of each way, apparently prohibiting any viable implementation exceeding that size. The value of 4 Kbytes may be derived from the paging architecture of the associated processors. Other problems arise when the number of ways is increased beyond four. In other words, set associative caches having eight or sixteen ways create additional problems that adversely affect overall processor performance. Therefore, increasing the number of ways beyond four is not always considered to be a viable alternative.
Likewise, for architectural reasons it is often undesirable to increase the cache line size.
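One plausible reading of the 4 Kbyte way-size limit noted above is sketched below. It assumes 4 Kbyte pages, so that the low 12 address bits are unchanged by address translation, and it assumes that the set index and line offset must be taken entirely from those untranslated bits. Both assumptions, and the names `PAGE_OFFSET_BITS` and `index_bits_needed`, are illustrative and are not stated in the text.

```python
PAGE_OFFSET_BITS = 12  # assumed 4 Kbyte pages: bits 11..0 are untranslated


def index_bits_needed(sets, line_size):
    """Address bits needed for set index plus line offset (powers of two assumed)."""
    return (sets * line_size - 1).bit_length()


def way_fits_in_page_offset(sets, line_size):
    """True if a way can be indexed using only untranslated address bits."""
    return index_bits_needed(sets, line_size) <= PAGE_OFFSET_BITS
```

Under these assumptions a way of 128 sets with 32-byte lines (4 Kbytes) fits, whereas doubling the sets to 256 requires a thirteenth bit that is only known after translation.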
Thus, there exists a need in the microprocessor field for a novel implementation of an embedded cache with an increased way size to improve processor performance.
A cache memory is provided, which, in one embodiment, comprises a tag array that is split into first and second halves. Each of the first and second halves has N ways. The first half of the tag array is used to store the upper M sets, and the second half of the tag array is used to store the lower M sets. Lower order address bits are utilized to read both the first and second halves of the tag array in a first phase of a clock cycle. Comparison circuitry is coupled to the first and second halves of the tag array. The comparison circuitry compares each of the N ways read out of both the first and second halves of the tag array with higher order physical address bits. The output of the comparison circuitry is coupled to select circuitry to select a set of way select signals. This selection is based on at least one bit of the higher order address bits.
The cache memory also includes a data array having N ways and 2M sets. The lower order address bits, in combination with the at least one bit, are used to access the data array. The set of way select signals selects the data of the correct way for output.
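The lookup flow summarized above can be sketched as a behavioral model. This is a minimal sketch under stated assumptions, not the claimed circuit: the sizes (N = 4, M = 128, 32-byte lines), the choice of bit 12 as the single higher order select bit, the stored tag being the bits above that select bit, and the names `SplitTagCache`, `split_address`, `fill`, and `lookup` are all hypothetical. Clock phases are not modeled.

```python
N_WAYS = 4       # N ways in each tag half and in the data array (assumed)
M_SETS = 128     # M sets per tag half; the data array has 2 * M sets (assumed)
LINE_BITS = 5    # 32-byte lines (assumed)
INDEX_BITS = 7   # log2(M_SETS): the lower order index bits
SELECT_BIT = LINE_BITS + INDEX_BITS  # bit 12: the one higher order bit used


def split_address(addr):
    """Split an address into (lower order index, select bit, tag)."""
    index = (addr >> LINE_BITS) & (M_SETS - 1)
    select = (addr >> SELECT_BIT) & 1
    tag = addr >> (SELECT_BIT + 1)
    return index, select, tag


class SplitTagCache:
    def __init__(self):
        # halves[1] models the half holding the upper M sets,
        # halves[0] the half holding the lower M sets.
        self.halves = [[[None] * N_WAYS for _ in range(M_SETS)]
                       for _ in range(2)]
        self.data = [[None] * N_WAYS for _ in range(2 * M_SETS)]

    def fill(self, addr, way, line):
        """Install a line in the given way and record its tag."""
        index, select, tag = split_address(addr)
        self.halves[select][index][way] = tag
        self.data[(select << INDEX_BITS) | index][way] = line

    def lookup(self, addr):
        """Return the cached line for addr, or None on a miss."""
        index, select, tag = split_address(addr)
        # Read both halves with the same lower order index; compare all ways.
        matches = [[stored == tag for stored in half[index]]
                   for half in self.halves]
        # The select bit picks which half's results become the way selects.
        way_select = matches[select]
        # The data array is indexed by the select bit plus the lower order bits.
        data_set = self.data[(select << INDEX_BITS) | index]
        for way, hit in enumerate(way_select):
            if hit:
                return data_set[way]
        return None
```

Two addresses that differ only in the select bit share the same index into both tag halves yet resolve to different sets of the data array, which is how the model doubles the number of sets without widening the index used to read the tags.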