1. Technical Field
This invention generally relates to cache circuits, and more specifically relates to a performance based system and method for dynamic allocation of a unified multiport cache.
2. Background Art
The major driving force behind computer system innovation has been the demand by consumers for faster and more powerful computers. One of the major hurdles to increasing the speed of the computer has historically been the speed with which data can be accessed from memory, often referred to as the memory access time. The microprocessor, with its relatively fast cycle times, has generally had to wait during main memory accesses to accommodate the relatively slow main memory access times. Accordingly, improvement in memory access times has been one of the major areas of research for increasing the speed of the computer.
One such development that has resulted from this research is the use of cache memory. A cache is a small amount of very fast and expensive memory that is used to store a copy of frequently accessed information. By combining fast but expensive cache memory with slower but cheaper main memory, the overall memory access time can be significantly reduced while the cost remains relatively low. When the processor requests data from main memory and the data resides in the cache, a cache read hit takes place, and the data can be returned to the processor from the cache with minimal wait states. If the data does not reside in the cache, a cache read miss occurs. In a cache read miss, the memory request is forwarded to the system, and the data is retrieved from main memory, as would normally occur in a system not having a cache. On a cache miss, the data retrieved from main memory is provided to the processor and is also written into the cache, on the statistical likelihood that this data will be requested again by the processor.
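The read path described above can be modeled in a few lines. The following Python sketch is illustrative only: the class name `SimpleCache` and the dictionary-based storage are assumptions, and the model has no capacity limit or replacement policy, so it shows only the hit, miss, and fill-on-miss behavior.

```python
class SimpleCache:
    """Minimal model of the read path (hypothetical; no capacity limit or replacement)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # slower backing store: address -> data
        self.lines = {}                 # fast cache storage: address -> copy of data
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Cache read hit: data is returned without accessing main memory.
            self.hits += 1
            return self.lines[address]
        # Cache read miss: the request is forwarded to main memory ...
        self.misses += 1
        data = self.main_memory[address]
        # ... and the data is also written into the cache for likely reuse.
        self.lines[address] = data
        return data


memory = {0x100: "A", 0x200: "B"}
cache = SimpleCache(memory)
cache.read(0x100)               # miss: filled from main memory
cache.read(0x100)               # hit: served from the cache
cache.read(0x200)               # miss
print(cache.hits, cache.misses)  # 1 2
```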
Important considerations for determining cache performance are the organization of the cache and the cache management policies it employs. In general, a cache can be organized in either a direct-mapped or a set-associative configuration. In a direct-mapped organization, the physical address space of the computer is conceptually divided into a number of equal pages, with the page size equal to the size of the cache. The cache is partitioned into a number of sets, with each set having a certain number of lines. The line size is typically on the order of 16 to 128 bytes or more. Each conceptual page defined in main memory has the same number of lines as the cache, and each line in a page of main memory corresponds to the similarly located line in the cache.
An important characteristic of a direct-mapped cache is that each memory line from a conceptual page defined in main memory can reside only in the equivalently located line, or page offset, in the cache. Because of this restriction, the cache need only refer to a certain number of the upper address bits of a memory address, referred to as a tag, to determine whether a copy of the data from that memory address resides in the cache; the lower-order address bits are predetermined by the page offset of the memory address.
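As a rough sketch of the direct-mapped lookup, the following Python code splits an address into tag, line index, and byte offset. The 32-byte line size and the 256-line cache (giving 8 KB conceptual pages) are assumed values, and `split_address` and `DirectMappedTags` are hypothetical names. Two addresses exactly one page apart map to the same cache line, so filling one displaces the other.

```python
LINE_SIZE = 32                      # bytes per line (assumed; the text cites 16-128 bytes)
NUM_LINES = 256                     # lines in the cache (assumed)
PAGE_SIZE = LINE_SIZE * NUM_LINES   # a conceptual page is the size of the cache


def split_address(address):
    """Split an address into (tag, line index, byte offset) for a direct-mapped cache."""
    tag = address // PAGE_SIZE                        # upper bits: conceptual page number
    line_index = (address % PAGE_SIZE) // LINE_SIZE   # page offset selects the one possible line
    offset = address % LINE_SIZE                      # byte within the line
    return tag, line_index, offset


class DirectMappedTags:
    """Tag store only: one tag per cache line (data storage omitted)."""

    def __init__(self):
        self.tags = [None] * NUM_LINES

    def lookup(self, address):
        tag, index, _ = split_address(address)
        # Only the tag is compared; the line index is fixed by the address itself.
        return self.tags[index] == tag

    def fill(self, address):
        tag, index, _ = split_address(address)
        self.tags[index] = tag              # displaces whatever line held this slot


tags = DirectMappedTags()
tags.fill(0x0040)
print(tags.lookup(0x0040), tags.lookup(0x2040))  # True False
tags.fill(0x2040)                                # same line index, different page
print(tags.lookup(0x0040), tags.lookup(0x2040))  # False True
```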
A set-associative cache includes a number of banks, or ways, of memory that are each equivalent in size to a conceptual page defined in main memory. Accordingly, a page offset in main memory can be mapped to a number of locations in the cache equal to the number of ways in the cache. For example, in a four-way set-associative cache, a line or page offset from main memory can reside in the equivalent page offset location in any of the four ways of the cache. As with a direct-mapped cache, each of the ways in a multiple-way cache is partitioned into a number of sets, each having a certain number of lines. In addition, a set-associative cache usually includes a replacement algorithm, such as a Least Recently Used (LRU) algorithm, which determines the bank or way into which data is filled when a read miss occurs.
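A four-way set-associative lookup with LRU replacement might be sketched as follows. The geometry (32-byte lines, 64 sets) and all names are assumptions. Each set's `OrderedDict` keeps tags ordered from least to most recently used, so the way to fill on a miss is either a still-empty way or the way held by the least recently used tag.

```python
from collections import OrderedDict

LINE_SIZE = 32   # bytes per line (assumed)
NUM_SETS = 64    # sets per way (assumed)
WAYS = 4         # four-way set-associative, as in the example above


class SetAssociativeCache:
    """Tag store with LRU replacement (hypothetical sketch; data storage omitted)."""

    def __init__(self):
        # Per set: tag -> way number, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def _split(self, address):
        line = address // LINE_SIZE
        return line // NUM_SETS, line % NUM_SETS   # (tag, set index)

    def access(self, address):
        """Return (hit, way); on a miss, choose the way to fill using LRU."""
        tag, index = self._split(address)
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)               # mark as most recently used
            return True, entries[tag]
        if len(entries) < WAYS:
            way = len(entries)                     # ways fill in order until the set is full
        else:
            _, way = entries.popitem(last=False)   # evict the LRU tag and reuse its way
        entries[tag] = way
        return False, way


cache = SetAssociativeCache()
STRIDE = LINE_SIZE * NUM_SETS     # addresses this far apart map to the same set
for k in range(WAYS):
    cache.access(k * STRIDE)      # four misses fill ways 0-3 of set 0
print(cache.access(0))            # (True, 0): hit; tag 0 becomes most recently used
print(cache.access(4 * STRIDE))   # (False, 1): miss evicts the LRU tag, which was in way 1
```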
Cache management is usually performed by a device referred to as a cache controller. One cache management duty performed by a cache controller is the handling of processor writes to memory. Typically, the cache controller includes a directory that holds an entry for each set in the cache. In a write-through cache, this entry has three components: a tag, a tag valid bit, and a line valid bit for each line in the set. The tag acts as a main memory page number: it holds the upper address bits of the page in main memory from which the copy of data residing in the respective set of the cache originated. The state of the tag valid bit determines whether the data in the respective set of the cache is considered valid or invalid. If the tag valid bit is clear, the entire set is considered invalid. If the tag valid bit is set, an individual line within the set is considered valid or invalid depending on the state of its line valid bit.
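The write-through directory entry described above could be modeled as follows. This is a sketch under assumptions: `WriteThroughEntry` and `LINES_PER_SET = 4` are hypothetical, and the tag comparison is included so that the check reads as a complete hit test. Clearing the single tag valid bit invalidates every line of the set without touching the individual line valid bits.

```python
from dataclasses import dataclass, field

LINES_PER_SET = 4  # assumed number of lines in each set


@dataclass
class WriteThroughEntry:
    """One directory entry per set: tag, tag valid bit, and per-line valid bits."""
    tag: int = 0
    tag_valid: bool = False
    line_valid: list = field(default_factory=lambda: [False] * LINES_PER_SET)

    def is_line_valid(self, tag, line):
        # A clear tag valid bit (or a different page) makes the whole set invalid.
        if not self.tag_valid or self.tag != tag:
            return False
        # Otherwise the line's own valid bit decides.
        return self.line_valid[line]

    def invalidate_set(self):
        self.tag_valid = False  # one bit invalidates every line in the set


entry = WriteThroughEntry(tag=0x12, tag_valid=True, line_valid=[True, False, True, False])
print(entry.is_line_valid(0x12, 0))  # True
print(entry.is_line_valid(0x12, 1))  # False
entry.invalidate_set()
print(entry.is_line_valid(0x12, 0))  # False
```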
In a write-back cache, each entry in the cache directory consists of a tag and a number of tag state bits for each line in the set. As before, the tag comprises the upper address bits of the page in main memory from which the copy originated. The tag state bits determine the status of the data in the respective line, i.e., whether the data is invalid, exclusively owned and modified, exclusively owned and unmodified, or shared.
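The four line states listed above correspond to the states commonly abbreviated M, E, S, and I; the MESI naming is used here as an assumption, since the text does not name a protocol. The sketch below shows one practical consequence of tracking these states: only a modified line must be written back to main memory when it is replaced. The function names are hypothetical, and writes to shared or invalid lines, which require a coherence protocol to obtain ownership, are deliberately not modeled.

```python
from enum import Enum


class LineState(Enum):
    INVALID = "I"
    MODIFIED = "M"    # exclusively owned and modified
    EXCLUSIVE = "E"   # exclusively owned and unmodified
    SHARED = "S"


def needs_writeback(state):
    """Only a modified line holds data newer than main memory, so only it is written back on eviction."""
    return state is LineState.MODIFIED


def state_after_local_write(state):
    """State of a line after a processor write hit in a write-back cache (sketch)."""
    if state in (LineState.EXCLUSIVE, LineState.MODIFIED):
        return LineState.MODIFIED  # owned line: update the cache only and mark it modified
    # Shared or invalid lines must first be obtained exclusively (protocol-specific, not modeled).
    raise ValueError("line must first be obtained in an exclusive state")


print(needs_writeback(LineState.MODIFIED), needs_writeback(LineState.SHARED))  # True False
print(state_after_local_write(LineState.EXCLUSIVE).name)                       # MODIFIED
```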
In the future, it will become common for multithreaded applications, or multiple processors, to access the cache on the same cycle. Because these threads or processors will access different cache lines on the same cycle, today's standard unified cache could face serious bandwidth problems.
Currently, there are two primary architectures for multiple processors. The first is for each processor to have a local cache; the second is to place a secondary cache between the processors and the main memory. The latter type of cache is referred to as a "unified" cache. With a unified cache, a processor requesting data queries the secondary cache over a common memory bus after gaining control of that bus.
This scheme has several drawbacks. First, the main memory bus is shared among the processors, so only a single request can be honored at a time. Also, multiple cache lookups can create a bottleneck, and an individual processor may not be able to handle several lines of information at a time. Moreover, the amount of cache space set aside for each processor in these systems is usually fixed.
Thus, the most prevalent unified caches allow only single requests, can have cache access bottlenecks, and have fixed cache space per processor. What is needed is a system that solves these problems.
The preferred embodiments of the present invention provide a performance based system and method for dynamic allocation of a unified multiport cache. A multiport cache system is disclosed that allows multiple single-cycle lookups through a multiport tag and multiple single-cycle cache accesses from a multiport cache. Therefore, multiple processes, which could be processors, tasks, or threads, can access the cache during any cycle. Moreover, the ways of the cache can be allocated to the different processes and then dynamically reallocated based on performance. Most preferably, a relational cache miss percentage is used to reallocate the ways, but other system metrics may also be used.
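The text does not specify how the relational cache miss percentage drives reallocation. One plausible reading, offered here only as a hypothetical policy, is to compare each process's miss percentage over a sampling interval and move one way from the process with the lowest miss percentage to the process with the highest, never shrinking a process below a minimum number of ways. The function `reallocate_ways`, the one-way-per-interval step, the `min_ways` floor, and the 8-way example are all assumptions.

```python
def reallocate_ways(allocation, accesses, misses, min_ways=1):
    """Move one way from the best-performing process to the worst (hypothetical policy).

    allocation: dict process -> number of ways currently assigned
    accesses, misses: dict process -> counts gathered over the last interval
    Returns (new allocation, miss percentage per process).
    """
    miss_pct = {
        p: (100.0 * misses[p] / accesses[p]) if accesses[p] else 0.0
        for p in allocation
    }
    worst = max(miss_pct, key=miss_pct.get)   # highest miss percentage
    best = min(miss_pct, key=miss_pct.get)    # lowest miss percentage
    new_alloc = dict(allocation)
    # Reallocate only if there is a real gap and the donor keeps its minimum.
    if miss_pct[worst] > miss_pct[best] and new_alloc[best] > min_ways:
        new_alloc[best] -= 1
        new_alloc[worst] += 1
    return new_alloc, miss_pct


allocation = {"P0": 3, "P1": 3, "P2": 2}      # an 8-way cache shared by three processes
accesses = {"P0": 1000, "P1": 1000, "P2": 500}
misses = {"P0": 50, "P1": 200, "P2": 40}
new_alloc, pct = reallocate_ways(allocation, accesses, misses)
print(pct)        # {'P0': 5.0, 'P1': 20.0, 'P2': 8.0}
print(new_alloc)  # {'P0': 2, 'P1': 4, 'P2': 2}
```

Moving a single way per interval is a conservative choice that limits churn; a controller could also require the gap in miss percentage to exceed a threshold before reallocating.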