Circuit designers strive to increase the operational speed of processors by reducing the time required to access stored data and/or instructions from memory locations. One way to speed up a processor's access of stored code is to use cache memory for temporarily storing a duplicate copy of the code that the processor has recently retrieved from main memory. Since software programs tend to loop and thus access the same locations in memory over and over, it has been known in the art to incorporate some type of cache system in communication with the processor for making the needed code more quickly accessible. When the processor requests code that resides in the cache, which is a “cache hit”, the code can be retrieved much more quickly than if the code does not reside in the cache, which is a “cache miss”. For a cache miss, the processor is required to retrieve the code from main memory, which may take as much as one hundred times longer than accessing code from cache.
Cache is controlled by a cache controller that typically includes algorithms for determining which code to store. When new code is retrieved from main memory and is to be allocated into the cache, an allocation algorithm determines what existing code needs to be evicted from the cache. Allocation algorithms typically work under the concept of statistical probability to determine which code might likely be needed next. For example, a “round-robin” approach simply evicts code from the next location based on a predetermined order. A “least recently used” methodology is a more sophisticated approach, which keeps track of when each cache line was last accessed and evicts the line that has gone unused the longest.
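The difference between the two allocation approaches can be illustrated with a minimal sketch. It assumes a tiny cache in which any line may occupy any location; the class names `RoundRobinCache` and `LRUCache` and the helper `count_hits` are hypothetical and chosen only for illustration, not taken from any particular cache controller.

```python
from collections import OrderedDict


class RoundRobinCache:
    """Evicts locations in a fixed rotating order, ignoring how lines are used."""

    def __init__(self, num_lines):
        self.lines = [None] * num_lines
        self.next_victim = 0

    def access(self, line_addr):
        if line_addr in self.lines:
            return True  # cache hit
        # Cache miss: overwrite the next location in the predetermined order.
        self.lines[self.next_victim] = line_addr
        self.next_victim = (self.next_victim + 1) % len(self.lines)
        return False


class LRUCache:
    """Evicts the line whose most recent access is the oldest."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # ordered from least to most recently used

    def access(self, line_addr):
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)  # mark as most recently used
            return True
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)  # drop the least recently used line
        self.lines[line_addr] = True
        return False


def count_hits(cache, trace):
    """Replay a list of line addresses and count the cache hits."""
    return sum(cache.access(addr) for addr in trace)


trace = ["A", "B", "A", "C", "A"]
round_robin_hits = count_hits(RoundRobinCache(2), trace)
lru_hits = count_hits(LRUCache(2), trace)
```

On this particular trace the round-robin cache evicts line A just before it is needed again, while the least-recently-used cache keeps it. Other traces can favor either approach.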
In response to the processor making a request for code, the cache controller checks to see if the code is already in the cache. This is done by comparing the address of the requested code with the addresses in cache. For example, the address of a 4K-byte cache typically includes 32 bits, where 20 bits (address [31:12]) are “tag” bits and 12 bits (address [11:0]) are “offset” bits. The tag bits, which are stored in a separate tag cache, identify which one of 2^20 cache lines from main memory is stored in a given cache address. The 12 offset bits indicate where the code will be stored in the 2^12-byte (4K) cache.
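The tag and offset split described above can be sketched as follows. The function name `split_address` and the constant names are hypothetical; the bit widths are the ones given for the 4K-byte example.

```python
OFFSET_BITS = 12  # address [11:0]
TAG_BITS = 20     # address [31:12]


def split_address(addr):
    """Split a 32-bit address into its (tag, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    tag = (addr >> OFFSET_BITS) & ((1 << TAG_BITS) - 1)
    return tag, offset


tag, offset = split_address(0x12345700)
```

For the address 1234 5700 (hex), this yields tag 12345 (hex) and offset 700 (hex).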
The cache controller utilizes the offset bits depending on the architecture of the cache. For instance, one type of cache is configured as “direct map” cache. With direct map cache, code with a particular offset can be stored in only one location in the cache. For example, suppose that a 32-byte cache line having an address range from 0000_0700 (hex) to 0000_071F (hex) is stored in offset addresses 700 to 71F in the cache and suppose a second cache line has an address range from 1234_5700 to 1234_571F. This second cache line cannot be stored simultaneously with the first cache line since the two cache lines have conflicting offset addresses 700 to 71F. It is likely that these conflicts occur simply because of the random nature in which code is compiled.
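The conflict in this example can be reproduced with a small model of a direct map cache. This is a sketch under stated assumptions: a 4K-byte cache with 32-byte lines, one stored tag per line location, and a hypothetical class name `DirectMappedCache`.

```python
LINE_SIZE = 32                       # bytes per cache line
CACHE_SIZE = 4096                    # 4K-byte cache
NUM_SLOTS = CACHE_SIZE // LINE_SIZE  # 128 line locations


class DirectMappedCache:
    """Each offset maps to exactly one location in the cache."""

    def __init__(self):
        self.tags = [None] * NUM_SLOTS  # tag currently held in each location

    def access(self, addr):
        offset = addr % CACHE_SIZE   # address [11:0]
        tag = addr // CACHE_SIZE     # address [31:12]
        slot = offset // LINE_SIZE   # the only location this line may use
        hit = self.tags[slot] == tag
        self.tags[slot] = tag        # on a miss, the new line replaces the old one
        return hit


cache = DirectMappedCache()
results = [
    cache.access(0x00000700),  # first line is loaded (miss)
    cache.access(0x00000710),  # same 32-byte line (hit)
    cache.access(0x12345700),  # conflicting offset evicts the first line (miss)
    cache.access(0x00000700),  # first line must be fetched again (miss)
]
```

Because both lines share offsets 700 to 71F, each access to one of them displaces the other.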
Because of this problem with direct map caches, it has been known to create cache having a “set-associative” architecture. A “two-way” set-associative cache, for example, includes a configuration where the entire cache is divided into two equally sized “cache ways”. An extra bit, typically taken from the most significant bit of the offset address, is used to indicate in which one of the two cache ways a particular cache line is stored. An algorithm can be used to interpret this extra bit to determine where the cache line is stored. Alternatively, the tag addresses can be compared to detect if a cache line is stored in the first or second cache way.
Cache can be divided into any number of parts. For instance, “four-way” set-associative caches having four cache ways and “eight-way” set-associative caches having eight cache ways are common. By allowing cache lines having conflicting offsets to be stored simultaneously in separate cache ways, the cache hit rate can be increased. It has been discovered that the hit rate with a direct-map cache might be a reasonable 65%. However, this hit rate can be raised to about 90% by using a four-way set-associative cache, thereby significantly increasing the processor access speed. Since code addresses tend to have conflicting offsets, it is usually better to use a cache configuration that has more cache ways.
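The benefit of additional cache ways for conflicting offsets can be illustrated with a simplified model. The sketch below assumes the common textbook organization in which the cache is divided into sets of `ways` lines each, with least-recently-used replacement inside a set; the names `SetAssociativeCache` and `count_hits` are hypothetical. It does not attempt to reproduce the 65% and 90% figures quoted above, which depend on real program behavior.

```python
from collections import OrderedDict

LINE_SIZE = 32  # bytes per cache line (assumed)


class SetAssociativeCache:
    """Cache with `ways` lines per set; ways=1 behaves as a direct map cache."""

    def __init__(self, cache_size, ways):
        self.ways = ways
        self.num_sets = cache_size // (LINE_SIZE * ways)
        # One ordered dict per set, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, addr):
        line = addr // LINE_SIZE
        index = line % self.num_sets  # which set the line must go in
        tag = line // self.num_sets   # distinguishes lines sharing a set
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict least recently used line in the set
        lines[tag] = True
        return False


def count_hits(cache, trace):
    """Replay a list of byte addresses and count the cache hits."""
    return sum(cache.access(addr) for addr in trace)


# Two lines with conflicting offsets, accessed alternately 50 times each.
trace = [0x00000700, 0x12345700] * 50
direct_hits = count_hits(SetAssociativeCache(4096, 1), trace)
four_way_hits = count_hits(SetAssociativeCache(4096, 4), trace)
```

With one way, the two lines evict each other on every access and nothing hits. With four ways, both lines stay resident after their first fetch, so 98 of the 100 accesses hit.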
Taken to an extreme, the cache can be configured such that any cache line can be stored anywhere in the cache. This type of fully associative cache is typically referred to as a “content addressable memory” or “CAM”. For proper operation, however, CAM requires a large number of comparators and associated logic, which ultimately results in slower access times. It has therefore been discovered that four-way and eight-way set-associative caches, which require only four or eight comparisons, respectively, normally provide the best code access performance.
Attention is drawn again to the allocation algorithms of cache controllers. Unfortunately, one of the limitations of industry standard allocation algorithms is that, while they provide statistically better performance on average, they may not necessarily provide optimal performance for system critical tasks at all times. Many tasks that a processor performs might not be time critical, while other tasks, such as “real-time” tasks, might require fast handling. As a result, software is typically written such that critical real-time tasks interrupt other tasks. Then, once the interrupt has completed, the software may return to the non-critical tasks to continue normal processing.
Interrupts or real-time tasks may not necessarily run very often, but when they are run, it is desirable that they be run quickly. As an example, suppose an interrupt routine is created for a cellular phone to generate a ringing signal when a call is received. Although this interrupt may only occur a few times a day, when it happens, it should be run quickly since it is a real-time event. The phone may be running background tasks when a call is received, but the ringing interrupt takes priority. One solution for quickly running interrupts or other high-priority code has been to load this code in cache and then “lock” it in the cache so that it will not be evicted. This, of course, improves the latency of these interrupt routines. It is therefore beneficial to guarantee that access to the interrupt routine will hit in the cache in order to provide quick execution. Conventional allocation algorithms normally are not sophisticated enough, however, to know which code is high priority. Therefore, a software expert may recognize that certain portions of code should take priority over other code such that it can be locked and available in the cache at all times.
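The effect of locking can be sketched with a small least-recently-used cache in which locked lines are skipped when a victim is chosen. This is an illustration only: the class `LockableLRUCache` and its methods are hypothetical, and locking is modeled per cache line purely for brevity rather than to reflect how any particular processor implements it.

```python
from collections import OrderedDict


class LockableLRUCache:
    """LRU cache in which locked lines are never chosen for eviction."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()  # line address -> locked flag, LRU first

    def lock(self, line_addr):
        """Load the line if necessary, then protect it from eviction."""
        self.access(line_addr)
        if line_addr not in self.lines:
            raise RuntimeError("no unlocked location available for this line")
        self.lines[line_addr] = True

    def access(self, line_addr):
        if line_addr in self.lines:
            self.lines.move_to_end(line_addr)
            return True  # cache hit
        if len(self.lines) >= self.num_lines:
            # Choose the least recently used line that is not locked.
            victim = next(
                (addr for addr, locked in self.lines.items() if not locked), None
            )
            if victim is None:
                return False  # everything is locked; the line is not cached
            del self.lines[victim]
        self.lines[line_addr] = False
        return False


background = ["a", "b", "c", "d"]  # stand-ins for non-critical code

locked_cache = LockableLRUCache(2)
locked_cache.lock("isr")           # hypothetical interrupt routine line
for addr in background:
    locked_cache.access(addr)
isr_hit_when_locked = locked_cache.access("isr")

plain_cache = LockableLRUCache(2)
plain_cache.access("isr")
for addr in background:
    plain_cache.access(addr)
isr_hit_when_unlocked = plain_cache.access("isr")
```

In this sketch the locked interrupt line survives the background activity and hits, whereas the unlocked copy is evicted and must be fetched from main memory again.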
Another use of cache locking involves tasks that are run frequently. Suppose a software designer is aware of particular code that will normally be run a hundred times more often than most other tasks. To optimize the cache hit rate in this case, the software designer may choose to lock this frequently used code in the cache. Also from an analysis of the code and processor, it may be discovered that frequently used code is used in intervals that may put the code in a least recently used category for impending eviction. Evicting this code and then retrieving this code from main memory shortly thereafter would waste processor time in this case. Therefore, it would be beneficial to lock this code as well.
Conventional processors having locking functionality operate by locking entire cache ways. Although locking improves performance for interrupts and frequently used code, the access performance for the remaining code will likely go down. Suppose one has a 4K four-way cache, where each cache way stores 1K, and an interrupt routine of 600 bytes is to be locked. In this case, there are 424 unused locations in the locked cache way that are wasted. The software programmer could lock other code, but it may be difficult to determine what else to put in there. On the other hand, suppose that one has a 64K-byte four-way cache. Each cache way in this case would be 16K bytes. If one were to lock the same 600 bytes in one cache way, then over 15K bytes would be wasted. Since the trend in recent years has been to design with bigger caches, cache ways are consequently getting bigger as well. However, unnecessary locking wastes more space in the cache ways. Thus, a need exists in the industry to address the aforementioned deficiencies and inadequacies to minimize wasted cache way space.
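The wasted space in the two examples above follows from simple arithmetic, sketched here with a hypothetical helper `wasted_bytes`. It assumes that locking is done in units of whole cache ways and that K means 1024.

```python
def wasted_bytes(cache_size, ways, locked_code_size):
    """Bytes left unused when code is locked in units of entire cache ways."""
    way_size = cache_size // ways
    # Number of whole ways needed to hold the locked code (ceiling division).
    ways_locked = -(-locked_code_size // way_size)
    return ways_locked * way_size - locked_code_size


small_cache_waste = wasted_bytes(4 * 1024, 4, 600)   # 4K-byte four-way cache
large_cache_waste = wasted_bytes(64 * 1024, 4, 600)  # 64K-byte four-way cache
```

For the 4K-byte cache the result is 424 bytes, and for the 64K-byte cache it is 15,784 bytes, which is the "over 15K bytes" noted above.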