Cache memory has long been used in data processing systems to decrease the memory access time for the central processing unit (CPU) thereof. A cache memory is typically a relatively high speed, relatively small memory in which active portions of program instructions and/or data are placed. The cache memory is typically faster than main memory by a factor of 5 to 10 and typically approaches the speed of the CPU itself. By keeping the most frequently accessed instructions and/or data in the high speed cache memory, the average memory access time will approach the access time of the cache.
The active program instructions and data may be kept in a cache memory by utilizing the phenomenon known as "locality of reference". The locality of reference phenomenon recognizes that most computer program instruction processing proceeds in a sequential fashion with multiple loops, and with the CPU repeatedly referring to a set of instructions in a particular localized area of memory. Thus, loops and subroutines tend to localize the references to memory for fetching instructions. Similarly, memory references to data also tend to be localized, because table lookup routines or other iterative routines typically repeatedly refer to a small portion of memory.
In view of the phenomenon of locality of reference, a small, high speed cache memory may be provided for storing a block of data and/or instructions from main memory which are presently being processed. Although the cache is only a small fraction of the size of main memory, a large fraction of memory requests will locate data or instructions in the cache memory because of the locality of reference property of programs. In a CPU which has a relatively small, relatively high speed cache memory and a relatively large, relatively low speed main memory, the CPU examines the cache when a memory access instruction is processed. If the desired word (data or program instruction) is found in cache, it is read from the cache. If the word is not found in cache, the main memory is accessed to read that word, and a block of words containing that word is transferred from main memory to cache memory. Accordingly, future references to memory are likely to find the required words in the cache memory because of the locality of reference property.
The performance of cache memory is frequently measured in terms of a "hit ratio". When the CPU references memory and finds the word in cache, it produces a "hit". If the word is not found in cache, then it is in main memory and it counts as a "miss". The ratio of the number of hits divided by the total CPU references to memory (i.e. hits plus misses) is the hit ratio. Experimental data obtained by running representative programs has indicated that hit ratios of 0.9 (90%) or higher are needed to justify the search time to determine a hit or miss because the search time is added to the normal memory access time in the case of a miss. With such high hit ratios, the memory access time of the overall data processing system approaches the memory access time of the cache memory, and may improve the memory access time of main memory by a factor of 5 to 10 or more. Accordingly, the average memory access time of the data processing system can be improved considerably by the use of a cache.
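By way of illustration only, the hit ratio and its effect on average memory access time may be sketched as follows. The function names and the timing model are assumptions made for this sketch: a hit is taken to cost one cache access, and a miss is taken to cost the cache search time plus one main memory access, consistent with the statement above that the search time is added to the normal memory access time in the case of a miss.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Hits divided by total CPU references to memory (hits plus misses)."""
    total = hits + misses
    if total == 0:
        raise ValueError("no memory references recorded")
    return hits / total


def average_access_time(h: float, t_cache: float, t_main: float) -> float:
    """Average access time under the assumed model.

    h       -- hit ratio, between 0.0 and 1.0
    t_cache -- cache access (search) time, any consistent time unit
    t_main  -- main memory access time, same unit
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit ratio must lie between 0 and 1")
    hit_cost = t_cache             # word found in cache
    miss_cost = t_cache + t_main   # search time is added to the main memory access
    return h * hit_cost + (1.0 - h) * miss_cost
```

For example, with a hit ratio of 0.9 and hypothetical access times of 100 ns for cache and 1000 ns for main memory, this model gives an average of approximately 200 ns, i.e. an improvement over main memory alone by a factor of about 5.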
Data processing systems are typically used to perform many independent tasks. When a task is first begun, the hit ratio of the cache is typically low because the instructions and/or data to be processed will not be found in the cache. Such a cache is known as a "cold" cache. Then, as processing of a task continues, more and more of the instructions and/or data which are needed may be found in the cache. The cache is then referred to as a "warm" cache because the hit ratio becomes very high.
In order to maximize the hit ratio, many data processing system architectures allow system control over the use of the cache. For example, the cache may be used to store instructions only, data only, or both instructions and data. Similarly, the cache may be controlled to lock a particular line or page in the cache, without allowing overwrites. The design and operation of cache memory in a data processing architecture is described in detail in Chapter 12 of the textbook entitled "Computer System Architecture" by Mano, Prentice-Hall, Inc. (Second Edition, 1982). FIGS. 1-4, described below, are adapted from Chapter 12 of Mano.
Various techniques are known for mapping blocks of main memory into cache memory. Typical forms of mapping include direct, 2-way, and 4-way mapping. The form of mapping can impact the performance of the cache.
One method of mapping main memory and cache addresses is direct mapping. An example of direct mapping will now be discussed with reference to FIG. 1. The numeric values are in octal representation, i.e. one octal digit represents three bits. In FIG. 1, a 15-bit (five octal digit) main memory address is divided into two fields, an index field comprised of the nine least significant bits (three octal digits) and a tag field comprised of the remaining six bits (two octal digits). The entire 15-bit address, i.e. tag and index bits combined, is needed to access main memory, while only the 9-bit index is needed to access cache memory. The general case provides 2^k words in cache memory and 2^n words in main memory, wherein the n-bit main memory address is divisible into two fields, a k-bit index field and an n-k bit tag field. Direct mapping cache uses the n-bit address formed by combining the k-bit index field and the n-k bit tag field to access main memory and the k-bit index to access cache memory.
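By way of illustration only, the division of a main memory address into tag and index fields may be sketched as follows. The function name and parameter names are hypothetical; the default field widths (n = 15, k = 9) are those of the FIG. 1 example.

```python
ADDRESS_BITS = 15  # n: width of the main memory address in the FIG. 1 example
INDEX_BITS = 9     # k: width of the index (cache address) field


def split_address(address: int,
                  index_bits: int = INDEX_BITS,
                  address_bits: int = ADDRESS_BITS) -> tuple[int, int]:
    """Return (tag, index) for an n-bit main memory address."""
    if not 0 <= address < (1 << address_bits):
        raise ValueError("address does not fit in the main memory address width")
    index = address & ((1 << index_bits) - 1)  # k least significant bits
    tag = address >> index_bits                # remaining n-k bits
    return tag, index
```

For example, the octal address 02777 splits into tag 02 and index 777.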
Referring to FIG. 2, each word stored in cache consists of a data word and its associated tag. It will be understood by those having skill in the art that program instructions, data, or both program instructions and data can be stored in cache memory. For purposes of simplification, it is assumed that data is stored in cache. A new data word is stored in cache by storing the data and the associated tag. In this example, the 12-bit data word is represented by four octal digits and the associated 6-bit tag is represented by two octal digits. Cache memory is accessed using the k-bit index field of the main memory address to address into cache. The n-k bit tag field of the main memory address is then compared with the tag associated with the word stored at the cache memory location identified by the k-bit cache address. If the two tags match, a hit results and the data word is in cache. In the event the two tags do not match, a miss results and the required data word must be read from main memory and stored in cache memory together with the associated tag using an appropriate replacement algorithm.
Still referring to FIG. 2, a specific example of direct mapping will now be described. The address, data and tag values are represented in octal representation. A word having a value of 1220 is stored at main memory address 00000 and is also stored in cache memory at index (cache) address 000. Tag 00 is associated with data 1220 and stored in cache memory. The CPU desires to access the data stored at main memory address 02000 and the index (cache) address 000 corresponding to main memory address 02000 is used to access cache memory. Tag 00 associated with the data word stored at cache address 000 is then compared to tag 02 of main memory address 02000. Since the two tags are not equal, a miss results and main memory must be accessed. The data word 5670 is then accessed by the CPU at main memory address 02000 and the data word and associated tag, in this case data 5670 and tag 02, are ultimately stored in cache memory at a cache memory address selected by an appropriate replacement algorithm.
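By way of illustration only, the direct mapping example above may be sketched as follows. The class and method names are hypothetical. The sketch also assumes the simplest replacement behavior, in which the word read from main memory on a miss overwrites the tag/data pair at the same index; the description above leaves the replacement algorithm open.

```python
class DirectMappedCache:
    """Minimal direct-mapped cache model: one tag/data pair per index."""

    def __init__(self, main_memory: dict, index_bits: int = 9):
        self.main_memory = main_memory  # address -> data word
        self.index_bits = index_bits
        self.lines = {}                 # index -> (tag, data)

    def read(self, address: int):
        """Return (data, hit) for a CPU read of the given main memory address."""
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        entry = self.lines.get(index)
        if entry is not None and entry[0] == tag:
            return entry[1], True       # tags match: hit
        data = self.main_memory[address]  # tags differ or line empty: miss
        self.lines[index] = (tag, data)   # assumed policy: overwrite this index
        return data, False


# Values from the example above, in octal.
main_memory = {0o00000: 0o1220, 0o02000: 0o5670}
cache = DirectMappedCache(main_memory)
cache.lines[0o000] = (0o00, 0o1220)       # word 1220 already cached with tag 00
data, hit = cache.read(0o02000)           # tag 02 != tag 00, so this is a miss
```

After the read, `data` is octal 5670, `hit` is false, and index 000 of the cache holds tag 02 with data 5670; a second read of address 02000 would then hit.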
Referring to FIG. 3, fully associative mapping for cache memory will now be described. All numeric values are in octal representation for this description of associative mapping. The 15-bit CPU address is stored in an argument register. The associative cache memory stores both the main memory address and the data word, thus allowing a word from main memory to be stored in any location in cache. In operation, a 15-bit main memory address is loaded into the argument register and the contents of the argument register are compared with the main memory addresses stored in associative cache memory. If the contents of the argument register equal one of the main memory addresses stored in cache memory, the 12-bit (four octal digit) data word associated with the matching main memory address stored in cache memory is accessed by the CPU for processing. In the event no match occurs, i.e. a miss results from the comparison, main memory must be accessed and the address/data pair from main memory is loaded into the associative cache memory using an appropriate replacement algorithm.
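By way of illustration only, fully associative mapping may be sketched as follows. The class name, the capacity parameter and the first-in, first-out (FIFO) replacement policy are assumptions made for this sketch; the description above requires only "an appropriate replacement algorithm". A hardware associative memory compares all stored addresses in parallel, which the dictionary lookup here merely models.

```python
from collections import OrderedDict


class FullyAssociativeCache:
    """Minimal fully associative cache model storing address/data pairs."""

    def __init__(self, main_memory: dict, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least one entry")
        self.main_memory = main_memory  # address -> data word
        self.capacity = capacity
        self.entries = OrderedDict()    # address -> data, oldest entry first

    def read(self, address: int):
        """Return (data, hit); the full address plays the role of the argument register."""
        if address in self.entries:
            return self.entries[address], True   # address matched: hit
        data = self.main_memory[address]         # no match: access main memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)     # assumed FIFO: evict oldest pair
        self.entries[address] = data
        return data, False
```

Because any word may occupy any location, two addresses that share an index, such as 00000 and 02000, can reside in the cache at the same time.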
Referring to FIG. 4, set-associative mapping for cache memory will now be described. Set-associative mapping permits the storage of two or more words in cache memory at the same index (cache) address. Thus, each cache memory word stores two or more words from main memory at the same cache address. The distinct data words stored in cache at the same index (cache) address are each associated with a tag. The number of tag/word pairs stored in one word of cache forms a "set". A set of size two is illustrated in FIG. 4 because two data words and their associated tags, i.e. two tag/word pairs, are stored at each index (cache) address.
In FIG. 4, each 6-bit tag field is represented as two octal digits, and each 12-bit data word is represented as four octal digits. Since there are two 18-bit tag/data pairs, i.e. a set size of two, the example in FIG. 4 has a 36-bit cache memory word.
It is possible to have multiple cache memory words located at one index (cache) address. A cache memory which has multiple cache words at one cache address is referred to as a multi-way cache. Thus, if there were two cache words at each cache address, the cache would be a 2-way associative cache memory.
The example in FIG. 4 provides a 9-bit cache address which addresses 2^9 = 512 cache words. Thus, the cache memory is 512 x 36 in size. The cache can store 1024 words from main memory since each cache word contains two main memory data words. Generally, a set-associative cache having set size k will accommodate k words of main memory in each word of cache.
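By way of illustration only, the sizing arithmetic for the FIG. 4 example may be checked as follows. The function name and parameter names are hypothetical.

```python
def cache_geometry(index_bits: int, tag_bits: int, data_bits: int, set_size: int):
    """Return (cache_words, word_width_bits, main_memory_words_held)."""
    cache_words = 1 << index_bits                   # one cache word per index value
    word_width = set_size * (tag_bits + data_bits)  # set_size tag/data pairs per cache word
    capacity = cache_words * set_size               # main memory words the cache can hold
    return cache_words, word_width, capacity


# FIG. 4 example: 9-bit index, 6-bit tag, 12-bit data word, set size of two.
print(cache_geometry(9, 6, 12, 2))  # prints (512, 36, 1024)
```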
The set associative cache memory represented in FIG. 4 will now be described with reference to actual values. The words stored at main memory addresses 01000 and 02000 are stored in cache memory at index address 000. Similarly, the words stored in main memory at main memory addresses 02777 and 00777 are stored in cache memory at cache address 777. Thus, the least significant nine bits, i.e. the three least significant octal digits, are the index into cache memory. The next six more significant bits, represented as the next two more significant octal digits, are the tag associated with each data word which is stored in cache memory. The CPU processes a memory reference by using the index field of the main memory address as the cache memory address. The tag field of the main memory address is then compared against each tag associated with each data word stored at the particular cache address in associative cache memory. If the comparison results in a match, i.e. a hit, that data word is used by the CPU. In the event no match occurs, i.e. a miss occurs, the main memory address must be used to access main memory and the accessed data word from main memory is loaded into cache using an appropriate replacement algorithm.
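By way of illustration only, the set-associative lookup described above may be sketched as follows. The class name and the first-in, first-out (FIFO) replacement within each set are assumptions made for this sketch, and the data values used in the usage lines are hypothetical placeholders rather than values taken from FIG. 4.

```python
class SetAssociativeCache:
    """Minimal set-associative cache model: set_size tag/data pairs per index."""

    def __init__(self, main_memory: dict, index_bits: int = 9, set_size: int = 2):
        if set_size < 1:
            raise ValueError("set size must be at least one")
        self.main_memory = main_memory  # address -> data word
        self.index_bits = index_bits
        self.set_size = set_size
        self.sets = {}                  # index -> list of (tag, data), oldest first

    def read(self, address: int):
        """Return (data, hit) for a CPU read of the given main memory address."""
        index = address & ((1 << self.index_bits) - 1)
        tag = address >> self.index_bits
        pairs = self.sets.setdefault(index, [])
        for stored_tag, data in pairs:    # compare against every tag at this index
            if stored_tag == tag:
                return data, True         # match: hit
        data = self.main_memory[address]  # no match: miss, access main memory
        if len(pairs) >= self.set_size:
            pairs.pop(0)                  # assumed FIFO: evict the oldest pair
        pairs.append((tag, data))
        return data, False


# Addresses 01000 and 02000 share index 000; the data values are placeholders.
memory = {0o01000: 0o0001, 0o02000: 0o0002}
two_way = SetAssociativeCache(memory, index_bits=9, set_size=2)
two_way.read(0o01000)
two_way.read(0o02000)
```

After these two reads, both words reside at index 000 with tags 01 and 02, so a subsequent reference to either address hits, whereas a direct-mapped cache would have evicted the first word.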
Referring to FIG. 5, a 4-way set associative cache is illustrated because there are four cache words, i.e. four sets, having the same cache address. Each set has a set size of four lines, and each cache memory line stores four 32-bit main memory words. A further example is shown in FIG. 6 wherein a 2-way set associative cache is illustrated with each set having a size of eight. Each "data cell" in the 4-way set associative cache illustrated in FIG. 5 and in the 2-way set associative cache illustrated in FIG. 6 has a tri-state driver ("TSD"). The tri-state drivers maintain the outputs of the data-cells in a high impedance state which permits a direct wire connection from many outputs, i.e. data cells, to a common bus line with only one output, i.e. data cell, having access to the common bus at any given time. The operation of tri-state drivers is generally known to those having skill in the art.
Different configurations of cache memory are used for different applications, including direct, 2-way and 4-way mapping, in order to increase performance for the particular application. For example, differences which exist between data and instruction memory access patterns permit smaller, partitioned (i.e. instructions and data) caches to achieve higher hit ratios. Also, 2-way associative cache is often adequate for instruction caches; however, 4-way associative cache often provides better performance for a data cache.
Although certain configurations provide better performance than other configurations depending on the type of processing, e.g. instruction or data, the type of configuration necessary to obtain the best performance is dependent upon the application code being processed. Therefore, there is a need for a reconfigurable cache which reconfigures the size/type/way of the cache.
Many techniques have been used for reconfiguring cache memories. For example, U.S. Pat. No. 4,853,846 to Johnson et al. discloses a bus expander with logic for virtualizing single cache control into dual channels with separate directories and prefetch for different processors. Johnson et al. provides programmability for the number of sets in cache memory, distinct from the number of ways, and allows for the modification of cache directory operation based upon configuration. The operation of the cache directory is changed such that it splits the cache into two caches which serve two processors rather than one. Johnson et al. also permits multiple simultaneous comparisons and adjusts the number of sets based upon a decoding of address bits. Johnson et al. further permits adjustment of the cache, set size and number of sets from one to sixty-three. Although Johnson et al. discloses reconfiguration of the set size and number of sets of a cache memory, the cache of Johnson et al. effectively remains 4-way set associative.
Another technique for configuring cache memories is disclosed in European Patent Application 325420 to Baror. Baror discloses a multi-configurable cache, the configuration being dependent upon setting of cache option bits. The multi-configurable cache can be used as either data or instruction cache and can be organized as 2-way set associative or direct mapped cache memory. The two different configurations provide for location of the associated memory array within a programmable cache unit in a first configuration and location of the associated memory array outside of the programmable cache unit in a second configuration.
Other approaches to cache reconfiguration are oriented towards memory interleaving and/or set size/line length variation. For example, U.S. Pat. No. 4,788,656 to Sternberger discloses a cache memory and pre-processor. Sternberger provides reconfiguration of line size including the ability to dynamically change the line size from 16-bit to 8-bit, thereby allowing the cache to be either 2K or 4K in size. This reconfiguration of line size accommodates processors/memories that have an 8 bit wide path. Sternberger creates the memory interleaving necessary to switch from 8-bit to 16-bit lines by distinguishing between the acquisition and retrieval mode. When the modes are changed, address lines A1 through A11 become connected to RAM address lines A0 through A10 and address line A0 is connected to the chip enable line. This permits reconfiguration from a 2K byte by 16-bit space during the acquisition mode, to a 4K byte by 8-bit space during the retrieval mode, wherein the high and low bytes of the previously defined 16-bit words are interleaved.
U.S. Pat. Nos. 4,430,712 and 4,503,501 to Coulson et al. disclose adaptive domain partitioning of cache memory space to vary cache set/line size. A direct map cache with a variable set size/line length is provided to match line size with the file record size. This effectively provides a variable length line direct map cache.
A further technique for obtaining multi-configurable cache memory through interleaving memory and varying line size is disclosed in U.S. Pat. No. 4,195,342 to Joyce et al. The multi-configurable cache system of Joyce et al. discloses a multi-configurable cache store control unit for varying the line size of a cache to retrieve one or two items from memory based upon configuration. This method also permits interleaving of memory to produce a 2-way set associative cache.
A memory interleaving technique is disclosed in U.S. Pat. No. 4,736,293 to Patrick for interleaving two memory parts to produce a 2-way interleaved set associative memory. Patrick does not address reconfigurability of set associative cache memory but rather provides a mechanism for retaining the efficiency of fine interleaving by partitioning the sets using a fine interleaving rather than the traditional block approach, where a block of contiguous memory equal to the set size is assigned to a set.
Finally, a technique for providing cache reconfiguration is disclosed in IBM Technical Disclosure Bulletin, Volume 23, No. 9, February 1981 by Hrustich and Sitler. Hrustich and Sitler do not dynamically vary the associativity of cache memory, but rather provide a divisible cache using address lines to recover from hardware failures. This is accomplished by using only a portion of the total cache, and certain address lines (A8, A9) are held in a steady state. In the event a hardware failure is detected in a portion of the cache, these address lines are modified to permit use of a previously unused portion of cache. This type of reconfiguration effectively divides the cache into virtual pieces and only allows one piece to be used. In the event a selected portion proves to be nonfunctional, a different portion is used. The cache configuration disclosed by Hrustich and Sitler provides a permanent 8-way associative cache memory.
The above survey indicates that to the best of Applicant's knowledge, the art has not yet determined how to partition set associative cache memory to permit multi-way reconfigurability of cache to change associativity. Moreover, to the best of Applicant's knowledge, the art has not suggested a method or apparatus for efficiently reconfiguring the cache memory.