Throughout the development of computer systems, a primary emphasis has been on increasing the speed of such systems and their ability to handle larger and more complicated programs while reducing their cost. To increase the capability of a computer system, it is necessary both to increase the size of the random access memory (RAM), so that larger programs may be utilized by the computer system, and to increase the speed at which access to that RAM is afforded. The straightforward method of increasing access speed is to use components which operate more quickly. However, such rapidly operating components are more expensive than slower memory components.
Because of the cost involved in providing high-speed RAM, advanced computer systems have used high-speed cache memory arrangements to increase the operational speed of the memory system. A cache memory arrangement provides a small portion of especially fast memory in addition to the regular RAM. As commands are issued and data is utilized, the information is called from the RAM and stored in this cache memory. As each new read and write command is issued, the system looks to the fast cache memory to determine if the information is stored in the cache. If the information is available in the cache memory, access to the RAM is not required and the command may be processed or the data accessed much more readily. If the information is not available in the cache memory, the new data can be copied from the main memory and stored in the cache memory, where it can be accessed and remains for later use by the system. In well-designed memory systems, the information sought lies in the cache memory over 90% of the time, on average. Consequently, use of the cache memory substantially speeds the overall operation of the memory utilized in the computer system.
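The effect of a high hit rate can be illustrated with a simple average access time calculation. The following is only a sketch: the function name `average_access_time` and the latencies of 10 ns for the cache and 100 ns for the RAM are assumed values chosen for illustration, and a miss is assumed to cost one cache probe plus one RAM access. Only the 90% hit rate is taken from the description above.

```python
def average_access_time(hit_rate, cache_ns, ram_ns):
    """Average time per access in nanoseconds.

    Assumes a hit costs one cache probe and a miss costs one cache
    probe followed by one RAM access.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return hit_rate * cache_ns + miss_rate * (cache_ns + ram_ns)


# Hypothetical latencies: 10 ns for the cache, 100 ns for the RAM.
with_cache = average_access_time(0.90, 10.0, 100.0)
without_cache = 100.0  # every access goes to the RAM
print(f"with cache: {with_cache:.1f} ns, without cache: {without_cache:.1f} ns")
```

Under these assumed latencies the average access takes about 20 ns instead of 100 ns, which is why a small cache with a high hit rate can stand in for a much larger fast memory.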
In order to further enhance the speed of operation of the computer system, it has been found desirable to place a small portion of extremely rapid cache memory directly on a processor chip. For example, it may be useful to provide such a small fast cache memory consisting of 8 kilobytes of memory directly on the chip with the other elements of a CPU. Such an arrangement is capable of greatly increasing the speed of operation of the system for information which is used repeatedly by various processes.
Today, cache memories are commonly designed at two levels: a first level (L1) cache and a second level (L2) cache. An L1 cache is a single layer of high-speed memory between a microprocessor and main system dynamic RAM (DRAM) memory. L1 caches hold copies of the code and data most frequently requested by the microprocessor and are typically small, ranging from 4 kilobytes to 64 kilobytes in size. The L2 cache, on the other hand, is a second layer of high-speed memory between the L1 cache and the main system DRAM memory. L2 caches also hold copies of code and data frequently requested by the microprocessor. The L2 cache handles the more random memory requests that the L1 cache misses. In order to simplify the handling of requests that the L1 cache misses, the L2 cache typically includes all the data of the L1 cache and more. As a result, an L2 cache is almost always larger than an L1 cache, typically ranging in size from 64 kilobytes to 512 kilobytes.
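The order in which the two levels are consulted can be sketched as a toy model. Everything in it is an assumption made for illustration: the class name `TwoLevelCache`, the use of dictionaries in place of hardware arrays, an L1 that evicts its oldest entry when full, and an L2 with no capacity limit. It is intended only to show an L2 that always contains whatever the L1 holds.

```python
class TwoLevelCache:
    """Toy model of an L1/L2 hierarchy in which L2 includes all L1 data."""

    def __init__(self, memory, l1_capacity=2):
        self.memory = memory            # dict: address -> value, stands in for DRAM
        self.l1_capacity = l1_capacity  # assumed, deliberately tiny
        self.l1 = {}
        self.l2 = {}                    # unbounded in this sketch

    def _fill_l1(self, address, value):
        # Evict the oldest L1 entry when full (dicts keep insertion order).
        if len(self.l1) >= self.l1_capacity:
            oldest = next(iter(self.l1))
            del self.l1[oldest]
        self.l1[address] = value

    def read(self, address):
        """Return (value, source), where source names the level that answered."""
        if address in self.l1:
            return self.l1[address], "L1"
        if address in self.l2:
            value = self.l2[address]
            self._fill_l1(address, value)   # refill L1 from L2
            return value, "L2"
        value = self.memory[address]
        self.l2[address] = value            # fill L2 so it keeps everything L1 holds
        self._fill_l1(address, value)
        return value, "DRAM"


cache = TwoLevelCache({0: "a", 1: "b", 2: "c"}, l1_capacity=2)
for address in (0, 0, 1, 2, 0):
    print(address, cache.read(address))
```

In this run the final read of address 0 misses the small L1, because it was evicted, but is satisfied by the L2 without going to the DRAM.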
The performance of a cache is affected by the organization of the cache. Three types of organization are most commonly used: fully associative, set associative and direct mapped (one-way set associative). In a fully associative cache memory, each item of information from the main system memory is stored as a unique cache entry. There is no relationship between the location of the information in the data cache RAM and its original location in the main system memory. If there are x storage locations in the cache, the cache will remember the last x main system memory locations accessed by the microprocessor. With a fully associative cache, each storage location can hold information from any location in the main system memory. As a result, the cache requires complex tag entries (to map the complete main system memory space), resulting in very complex and expensive cache comparison logic.
Set associative cache organizations divide the data cache RAM into banks of memory, or "ways". A 2-way set associative cache divides the data cache RAM into two ways, a 4-way set associative cache into four ways, and so on. The set associative cache separates main system memory into pages, where each page is equal in size to the size of a way. For example, a 64 kilobyte 2-way set associative cache would logically see main memory as a collection of 32 kilobyte pages, each equal in size to one way. Each location in a memory page can map only to the same location in a cache way. For example, in a 2-way set associative cache memory, each location in a main system memory page can map to the same location in either of the two cache ways. When the microprocessor makes a memory request, the set associative cache compares the memory request with the tag entry at the page location in each of its ways to determine if the information is in the cache (i.e., a hit). This means the cache has to do one comparison for each way, for a total number of comparisons equal to the number of ways. For example, in a 2-way set associative cache memory, the cache would only have to make two parallel comparisons to determine if the information requested is stored in the cache.
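The set associative lookup described above can be sketched as follows. This is an illustrative model only: the class name `SetAssociativeCache`, the storage of one item per location, and the round-robin choice of which way to overwrite are assumptions, since no replacement policy is specified here. The page number serves as the tag, and the loop over the ways stands in for comparisons that hardware would perform in parallel.

```python
class SetAssociativeCache:
    """Toy N-way set associative cache holding one item per location."""

    def __init__(self, way_size, num_ways):
        self.way_size = way_size
        self.num_ways = num_ways
        # ways[w][i] is either None or a (tag, value) pair.
        self.ways = [[None] * way_size for _ in range(num_ways)]
        # Assumed round-robin replacement, tracked per location.
        self.next_victim = [0] * way_size

    def split(self, address):
        """Split an address into (page number used as tag, location in page)."""
        return divmod(address, self.way_size)

    def lookup(self, address):
        """Return (hit, value); one tag comparison is made per way."""
        tag, index = self.split(address)
        for way in self.ways:
            entry = way[index]
            if entry is not None and entry[0] == tag:
                return True, entry[1]
        return False, None

    def fill(self, address, value):
        """Store value, overwriting one way at that location if needed."""
        tag, index = self.split(address)
        for way in self.ways:
            entry = way[index]
            if entry is not None and entry[0] == tag:
                way[index] = (tag, value)   # already cached: update in place
                return
        victim = self.next_victim[index]
        self.ways[victim][index] = (tag, value)
        self.next_victim[index] = (victim + 1) % self.num_ways


cache = SetAssociativeCache(way_size=4, num_ways=2)
cache.fill(1, "a")   # page 0, location 1
cache.fill(5, "b")   # page 1, location 1: goes into the other way
print(cache.lookup(1), cache.lookup(5))
```

With two ways, addresses 1 and 5 can be cached at the same time even though they share a location within their pages; a third address at that location, such as 9, would displace one of them.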
A direct mapped (1-way set associative) cache organization uses the entire data cache RAM as one bank of memory, or way. The main system memory is logically separated into pages, where each page is the size of the data cache RAM. Each location in any main system memory page maps directly only into the same location in the data cache RAM.
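The direct mapping can be sketched in the same spirit. The class name `DirectMappedCache`, the list used as main memory, and the one-item-per-location granularity are assumptions made for illustration. Because each location in a page has exactly one possible home in the cache, only a single tag comparison is needed, but two addresses at the same location in different pages displace each other.

```python
class DirectMappedCache:
    """Toy direct mapped cache holding one item per location."""

    def __init__(self, cache_size):
        self.cache_size = cache_size
        self.tags = [None] * cache_size   # page number stored per location
        self.data = [None] * cache_size

    def read(self, address, memory):
        """Return (value, hit). On a miss the location is refilled from memory."""
        page, index = divmod(address, self.cache_size)
        if self.tags[index] == page:      # the single comparison
            return self.data[index], True
        self.tags[index] = page
        self.data[index] = memory[address]
        return self.data[index], False


memory = list(range(100, 116))            # 16 locations of stand-in main memory
cache = DirectMappedCache(cache_size=4)   # memory is seen as 4 pages of 4 locations
for address in (2, 2, 6, 2):
    print(address, cache.read(address, memory))
```

Addresses 2 and 6 both map to location 2 of the cache, so reading one after the other produces a miss each time in this sketch.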
In the prior art, a separate cache controller is used to provide access to the L2 cache. The cache controller is separate from the processor in the computer system, usually as a separate computer chip. The cache controller contains very complicated logic. Most processor systems contain two such controllers, one to control the L1 cache within the processor and the other to control the L2 cache in the system. The design of these two controllers is a compromise between performance and the complexity of the state that must be shared between them. A system of such hierarchical caches would provide the highest overall performance if the two cache controllers had access to information about both caches as well as the processor and bus accesses. This is clearly not possible when the cache controller for the L2 cache lies in a separate package.
Another problem with the prior art is that the L2 cache is on the system bus, so access to the L2 cache is limited to the speed of the system bus. For instance, if the system bus is running at 10 MHz, an access to the L2 cache cannot be performed faster than 10 MHz. It would be advantageous for the processor to be able to access the L2 cache at a rate faster than that of the system bus in order to increase the overall speed of the system.
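The bus limit can be put in terms of time per access. The sketch below assumes, for illustration, that each access takes at least one bus clock cycle; the function name `min_access_time_ns` and the 40 MHz comparison figure are hypothetical, and only the 10 MHz figure comes from the example above.

```python
def min_access_time_ns(bus_mhz, cycles_per_access=1):
    """Shortest possible access time in nanoseconds over a bus of the given speed.

    Assumes an access needs at least cycles_per_access bus clock cycles.
    """
    if bus_mhz <= 0 or cycles_per_access < 1:
        raise ValueError("bus_mhz must be positive and cycles_per_access at least 1")
    return cycles_per_access * 1000.0 / bus_mhz


print(min_access_time_ns(10))   # system bus from the example above
print(min_access_time_ns(40))   # hypothetical faster path to the L2 cache
```

Under this assumption, an L2 cache reached over a 10 MHz system bus cannot respond in less than 100 ns, regardless of how fast the cache memory itself is.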
Moreover, the use of different cache memory arrangements may be advantageous for different applications. Therefore, it is desirable to have a processor that may operate with multiple types of cache organizations, including the option of operating without a cache (if so desired). In that way, as the different organizations are upgraded in the future, the microprocessor itself need not undergo any changes.