This invention relates to a method of cache locking on a bank-by-bank basis in a cache implementing a “least recently used” cache replacement scheme, and to a cache implementing such a method.
A popular type of cache is a set-associative cache, shown in FIG. 1. Cache 100 includes several cache banks 102, and each bank 102 has a TagRAM section 106, a StateRAM section 108, and a DataRAM section 104. Each cache bank has a number of addressable memory locations (typically, a power of 2), which are divided into a number of blocks. Illustratively in FIG. 1, the cache banks contain 16 memory locations each, each bank has 8 blocks, and each block corresponds to 2 of the cache bank's memory locations. TagRAM 106 locations (marked “T”) store some of the more significant bits of the main memory address that corresponds to the information in the block's DataRAM 104 (marked “D”), and StateRAM 108 (marked “S”) indicates whether the information in the associated cache block is valid. Illustratively, if a given cache bank address contains 128 bits, or four 32-bit words, and 2 cache addresses form a block, then each block contains eight 32-bit words. Storing 8 words in a block accounts for the three least significant bits of the main memory addresses. If the employed cache banks have 16 addresses and 2 cache addresses form a block, then the main memory can be divided into 8 sets of addresses, with addresses from each set being cached in designated blocks of the cache. This accounts for three more bits of the main memory addresses.
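The bit accounting in the illustrative example can be sketched as follows. This is a minimal sketch, assuming word-addressed main memory and assuming that the bits above the block index form the tag; the function name split_address and the constant names are hypothetical and do not appear in the figures.

```python
WORDS_PER_BLOCK = 8   # eight 32-bit words per block -> 3 offset bits
BLOCKS_PER_BANK = 8   # 16 locations / 2 locations per block -> 3 index bits

OFFSET_BITS = WORDS_PER_BLOCK.bit_length() - 1   # 3
INDEX_BITS = BLOCKS_PER_BANK.bit_length() - 1    # 3


def split_address(word_address):
    """Split a main memory word address into (tag, block index, word offset)."""
    offset = word_address & (WORDS_PER_BLOCK - 1)
    index = (word_address >> OFFSET_BITS) & (BLOCKS_PER_BANK - 1)
    # Assumption: all remaining more significant bits are kept as the tag.
    tag = word_address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


# Binary 1011 101 110 splits into tag 0b1011, block 0b101, word 0b110.
tag, index, offset = split_address(0b1011101110)
```

Addresses that share the same three index bits compete for the same block position in each bank, which is why a replacement decision is needed per block position.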
FIG. 1 includes three cache banks, which make cache 100 a 3-way set-associative cache. A CPU 300 that wishes to obtain information (instructions or data) provides an address to control module 200, and control module 200, in turn, provides the address to cache 100. In response to the received address, the cache determines whether information corresponding to the provided address is found in the cache (i.e., a match is found in the TagRAM). If so, the cache raises the “hit” signal on line 111, places the identity of the bank where the hit occurred on line 112, and provides the retrieved information to control module 200. Control module 200 then provides the information to CPU 300.
Lines 111 and 112 are applied to replacement unit (RU) 150, which updates its information regarding the latest “hit” and provides a signal on line 113 to control module 200 to assist control module 200 in determining the bank into which new information should be written, if necessary.
When cache 100 determines that there is a “miss,” that information is communicated to control module 200 via line 111, which causes control module 200 to issue a Stall command to CPU 300 and to issue a request for information to main memory 500. This request is processed via memory interface circuitry 400. Information is responsively fetched from main memory 500 and provided to control module 200 (via unit 400), and control module 200 writes the information into cache 100. Concurrently, control module 200 can provide the fetched information to CPU 300. The writing to cache 100 is executed by control module 200, as indicated above, based on information provided to control module 200 by replacement unit 150 on line 113. While the information is stored in the DataRAM of cache 100, the TagRAM and StateRAM are written to indicate that the corresponding address is now in cache and that the information is valid.
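The miss sequence just described can be modeled in software as follows. This is only an illustrative sketch: the Entry class, the fill_on_miss function, and the dictionary standing in for main memory 500 are assumptions made for the example, and the Stall command and memory interface circuitry 400 are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = 0          # TagRAM contents
    valid: bool = False   # StateRAM contents
    data: object = None   # DataRAM contents


def fill_on_miss(banks, main_memory, tag, index, replacement_bank):
    """Fetch a block and install it in the bank named by the replacement unit."""
    block = main_memory[(tag, index)]        # stand-in for the fetch via unit 400
    entry = banks[replacement_bank][index]
    entry.data = block                       # write the DataRAM
    entry.tag = tag                          # write the TagRAM
    entry.valid = True                       # mark the StateRAM valid
    return block                             # also forwarded to the CPU


banks = [[Entry() for _ in range(8)] for _ in range(3)]
main_memory = {(11, 5): "block-A"}           # hypothetical memory contents
fetched = fill_on_miss(banks, main_memory, tag=11, index=5, replacement_bank=2)
```

The replacement_bank argument plays the role of the signal on line 113; how that value is computed is the subject of the replacement policies discussed below.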
During a lookup of a set-associative partitioned cache, each cache bank 102 selects one entry based on the sought address. The TagRAM for that entry is compared with the appropriate bits in the sought address, and if any of the comparisons match, a “hit” is declared, provided that StateRAM 108 for that entry indicates that the entry is valid.
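A software model of this lookup might look like the sketch below. The Entry class and the lookup function are hypothetical names introduced for illustration, and the loop checks the banks one after another, whereas the hardware can perform the comparisons in all banks at once.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = 0          # TagRAM contents
    valid: bool = False   # StateRAM contents
    data: object = None   # DataRAM contents


def lookup(banks, tag, index):
    """Return (hit, bank_number, data) for the entry selected by index."""
    for bank_number, bank in enumerate(banks):
        entry = bank[index]                   # each bank selects one entry
        if entry.valid and entry.tag == tag:  # tag match counts only if valid
            return True, bank_number, entry.data
    return False, None, None


banks = [[Entry() for _ in range(8)] for _ in range(3)]
banks[1][5] = Entry(tag=11, valid=True, data="block-A")
result = lookup(banks, tag=11, index=5)
```

The first two returned values correspond to the hit signal on line 111 and the bank identity on line 112.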
Several schemes are known for determining the replacement bank. One is the least recently used (LRU) policy. Another one is the round robin replacement policy. Replacement unit (RU) 150 contains the hardware that determines the replacement bank, i.e., the bank into which information is written, and it uses the history of cache accesses to compute the replacement bank.
The least recently used (LRU) replacement policy uses the notion that the cache entry that has not been used for the longest time is the least likely to be used again. Consequently, it is a good choice to be replaced and is so designated. Since the access patterns differ on different lines, there is a separate LRU bank computed for each cache line. In order to keep track of the LRU index for each line, a history of accesses must be maintained.
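One simple way to picture a per-line access history is a time stamp for every bank of every cache line, as in the sketch below. This representation is an assumption chosen for brevity and is not taken from any of the arrangements discussed here; the class name LruHistory and its methods are hypothetical.

```python
class LruHistory:
    """Per-line record of when each bank was last used."""

    def __init__(self, num_lines, num_banks):
        self.clock = 0
        # last_used[line][bank] holds the time of the most recent access.
        self.last_used = [[0] * num_banks for _ in range(num_lines)]

    def record_access(self, line, bank):
        self.clock += 1
        self.last_used[line][bank] = self.clock

    def lru_bank(self, line):
        """Return the bank of this line that has gone unused the longest."""
        stamps = self.last_used[line]
        return stamps.index(min(stamps))


history = LruHistory(num_lines=8, num_banks=3)
for bank in (0, 1, 2, 0):
    history.record_access(line=5, bank=bank)
candidate = history.lru_bank(line=5)   # bank 1: banks 2 and 0 were used later
```

Each line keeps its own history, so accesses to line 5 do not change the LRU bank computed for any other line.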
In order to improve the performance of programs, a technique called “cache locking” has been proposed. Frequently used instructions or time-critical program segments are locked in the cache. That avoids the penalty of a cache miss when the program references those instructions. On the other hand, by locking some parts of the cache, the cache size available to non-critical program segments is reduced, which can slow them. That tradeoff must be studied for particular intended uses of the cache in order to determine whether locking is advantageous.
Several techniques have been proposed to accomplish cache locking. U.S. Pat. No. 5,353,425 describes an approach that implements a pseudo LRU by using a set of bits, called the MRU (Most Recently Used) bits, for each direct-mapped address. When an entry is loaded in cache, the corresponding MRU bit is set. The replacement unit finds a bank that does not have the MRU bit set and uses it as the replacement bank. If all the bits are set, a deadlock condition exists, since there is no place for the incoming cache line to be stored. The replacement unit detects this condition and resets all the MRU bits except the last one to be loaded. This erases the history of accesses, and for this reason it is called a pseudo LRU scheme. However, by providing an additional bit (called a LOCK bit) for every MRU bit, that technique has the ability to implement cache locking. The LOCK bits are combined with the MRU bits to ensure that the locked line is never selected for replacement. When the cache line is no longer going to be frequently accessed, the corresponding LOCK bit can be reset, which allows the cache line to participate in the replacement.
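The MRU-bit scheme summarized above can be sketched for a single direct-mapped address as follows. This is a loose illustration based only on the summary given here: the class name PseudoLruLine is hypothetical, and treating a bank as unavailable when either its MRU bit or its LOCK bit is set is an assumption about how the two bits are combined.

```python
class PseudoLruLine:
    """MRU and LOCK bits for one direct-mapped address, one pair per bank."""

    def __init__(self, num_banks):
        self.mru = [False] * num_banks
        self.lock = [False] * num_banks

    def touch(self, bank):
        """Set the MRU bit; if no candidate bank remains, reset the history."""
        self.mru[bank] = True
        if all(m or k for m, k in zip(self.mru, self.lock)):
            # Clear every MRU bit except that of the bank just loaded.
            self.mru = [i == bank for i in range(len(self.mru))]

    def replacement_bank(self):
        """Return the first bank that is neither recently used nor locked."""
        for bank, (m, k) in enumerate(zip(self.mru, self.lock)):
            if not m and not k:
                return bank
        return None   # every other bank is locked


line = PseudoLruLine(num_banks=3)
line.touch(0)
line.lock[1] = True               # lock the line held in bank 1
victim = line.replacement_bank()  # bank 2: bank 0 is MRU, bank 1 is locked
```

Because the reset discards all but the most recent access, the ordering among the remaining banks is lost, which is the sense in which the scheme is only a pseudo LRU.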
U.S. Pat. No. 5,493,667 describes an arrangement with a true LRU replacement policy. The replacement computation uses a store of bank identifiers for each cache line. That memory uses the history of accesses and orders the banks according to the time since their last use. The technique works as follows. The cache bank that is the most recently used (MRU) is inserted into the right-most column of the memory. All the entries are shifted to the left, stopping at the entry that matches the current MRU index. This shifting operation results in the left-most entry being the least recently used (LRU) bank. When the cache misses, that LRU entry points to the bank that will receive the incoming instructions. When these instructions are loaded into the cache, this LRU index becomes the MRU and is inserted into the right-most column, resulting in a shift of the entries and a new LRU index being computed for the cache line. In order to perform cache locking, a lock signal is asserted. When that signal is asserted, the MRU index is inserted not into the right-most column, but inserted into the column on its left. That causes the cache bank indexed with the right-most column to be always resident in cache. One may view the locking as treating the right-most column as fixed and the remaining N-1 columns as an N-1 way set-associative cache as long as the lock signal is asserted.
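The shifting operation and the effect of the lock signal can be sketched for one cache line as follows. This is an illustrative model of the behavior as summarized above, not a description of the patented circuit; the function name update_order_with_lock_signal is hypothetical, and leaving the order unchanged on a hit in the locked bank is an assumption.

```python
def update_order_with_lock_signal(order, bank, lock=False):
    """Return the new bank order for one cache line.

    order[0] is the left-most column (the LRU bank) and order[-1] is the
    right-most column (the MRU bank). With lock asserted, the right-most
    column is held fixed and the other columns behave as a smaller LRU list.
    """
    movable = list(order[:-1]) if lock else list(order)
    fixed = list(order[-1:]) if lock else []
    if bank in movable:
        movable.remove(bank)    # entries to its right shift one column left
        movable.append(bank)    # the accessed bank becomes the MRU
    return movable + fixed


order = [0, 1, 2]                                        # bank 0 is the LRU
order = update_order_with_lock_signal(order, 0)          # [1, 2, 0]
order = update_order_with_lock_signal(order, 1, lock=True)
# order is now [2, 1, 0]: bank 0 stays in the right-most column
```

While the lock signal is asserted, the bank in the right-most column can never reach the left-most column, so it is never chosen for replacement.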
In order to use that technique, it is envisioned that the lock signal is not asserted during the first pass of the time-critical program segment. That causes the critical code to be loaded into the cache and the right-most column of the LRU store to hold the location of that code fragment. During subsequent iterations of the critical program segment, the lock signal is asserted, which results in the critical code being cache resident. When no longer required, the lock signal is de-asserted, and the critical code can eventually be replaced as per the LRU policy.
A drawback of the arrangement described in U.S. Pat. No. 5,493,667 is that only one cache bank can be locked at any time. That limits the amount of locked instructions to the size of one cache bank. In fact, the limit is more stringent in that only one of the cache lines that map to the same direct-mapped address can be locked. If several critical portions of code need to be locked, one has to ensure that they do not overlap, and no satisfactory method for achieving that is given. Also, since the lock signal is asserted and later de-asserted, it is important that the locking of the second code segment not remove some of the locked code from the first segment. Another limitation appears when the critical code is already in cache and not in the right-most column. This can happen when the address of the critical code is near some previously executed code. The critical code may already be in cache when the locking commences. In such an event, the critical code will not be fetched from external memory, and it will not be in the right-most column. Thus, setting the lock bit will not guarantee that the critical code is resident.
Disadvantages of prior art arrangements are overcome, and an advance in the art is achieved by the use of a bypass register to modify the LRU computation. The bypass register has a number of entries that is equal to the number of banks in cache 100. The LRU computation places the MRU entry into the right-most column that is not locked. The entries are then shifted left, skipping over bypassed banks. The result is that the left-most non-locked entry is the LRU bank.
One may appreciate that if a cache bank marked in the bypass register is not shifted to the left while other entries are shifted to the left, then the bypassed bank is assured of not becoming the bank in the left-most column and, hence, of not becoming the LRU bank.
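The modified LRU computation can be sketched for one cache line as follows. This is a minimal software model under stated assumptions, not the claimed hardware: the function names update_order_with_bypass and lru_bank are hypothetical, the bypass register is modeled as a list with one flag per bank, four banks are used purely for illustration, and a hit in a bypassed bank is assumed to leave the order unchanged.

```python
def update_order_with_bypass(order, bank, bypass):
    """Return the new bank order for one line after an access to bank.

    order[0] is the left-most column; bypass[b] is True when bank b is locked.
    Columns holding bypassed banks are skipped and never shift.
    """
    if bypass[bank]:
        return list(order)   # assumption: locked banks do not alter the history
    free_columns = [c for c, b in enumerate(order) if not bypass[b]]
    free_banks = [order[c] for c in free_columns]
    free_banks.remove(bank)  # the remaining non-locked entries shift left
    free_banks.append(bank)  # MRU goes in the right-most non-locked column
    new_order = list(order)
    for column, b in zip(free_columns, free_banks):
        new_order[column] = b
    return new_order


def lru_bank(order, bypass):
    """Return the left-most non-locked entry, or None if every bank is locked."""
    for b in order:
        if not bypass[b]:
            return b
    return None


bypass = [False, True, False, False]                  # bank 1 is locked
order = update_order_with_bypass([0, 1, 2, 3], 0, bypass)
victim = lru_bank(order, bypass)                      # order [2, 1, 3, 0]
```

Setting more than one flag in the bypass list locks several banks at once, and clearing a flag returns that bank to normal LRU replacement.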
The present invention enables the programmer to lock (and unlock) any cache bank at any time. Furthermore, the present invention allows programmers to lock multiple cache banks at the same time. Therefore, the present invention allows programmers to lock subroutines that do not fit into a single cache bank by locking the subroutines into more than one bank. It also enables programmers to lock multiple routines into a single bank.