Memory access time is a factor which frequently limits host processor throughput. Accordingly, many computer systems use cache memories to reduce the delays associated with fetching items, e.g., instructions, stored in main memory.
FIG. 1 depicts a simplified configuration of an instruction cache memory system 10. An instruction cache 20 is disposed between main memory 24 and an execution unit 12. A number of other execution units 13 may also be coupled to main memory 24. Here, instruction cache 20 serves to reduce memory access times for execution unit 12. After the execution unit 12 has executed an instruction, the program counter 14 points to the next instruction or the predicted next instruction. In some cases, it is necessary to do a full or partial translation of the address output by the program counter, e.g., from a virtual address to a physical address. It may also be necessary to do a look-up of the physical address, based on the virtual address. This may be done by using a paging unit 16 and a translation lookaside buffer (TLB 17) in manners well known in the art. The system provides the address of the next (or predicted next) instruction to the cache control 18. By comparing portions of the instruction address with one or more tags, the cache control 18 can determine whether the requested instruction presently resides in the instruction cache 20. If the necessary instruction is not in the instruction cache 20 (a "miss"), the cache control 18 provides controls 22 resulting in transfer of a block of instructions from main memory 24 into the cache 20. Access to the cache is then re-tried.
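The miss, transfer, and re-try sequence described above can be illustrated with a minimal sketch. It assumes a single-set (direct-mapped) cache indexed by word address, omits address translation entirely, and uses hypothetical names (`DirectMappedCache`, `fetch`, `NUM_LINES`) that do not come from the system of FIG. 1.

```python
from dataclasses import dataclass, field

WORDS_PER_LINE = 8  # assumed line length, in words
NUM_LINES = 4       # hypothetical; kept tiny for illustration


@dataclass
class Line:
    valid: bool = False
    tag: int = -1
    words: list = field(default_factory=list)


class DirectMappedCache:
    """Hypothetical single-set cache; real cache controls differ in detail."""

    def __init__(self, main_memory):
        self.memory = main_memory  # list of words, indexed by word address
        self.lines = [Line() for _ in range(NUM_LINES)]
        self.misses = 0

    def _split(self, word_addr):
        # Split the address into (tag, line index, word offset).
        offset = word_addr % WORDS_PER_LINE
        block = word_addr // WORDS_PER_LINE
        return block // NUM_LINES, block % NUM_LINES, offset

    def fetch(self, word_addr):
        tag, index, offset = self._split(word_addr)
        line = self.lines[index]
        if not (line.valid and line.tag == tag):
            # Miss: transfer the whole block from main memory, then re-try.
            self.misses += 1
            base = word_addr - offset
            line.words = self.memory[base:base + WORDS_PER_LINE]
            line.tag, line.valid = tag, True
            return self.fetch(word_addr)
        # Hit: the stored tag matches the tag portion of the address.
        return line.words[offset]
```

In this sketch, a first `fetch` of any address in a block counts one miss, and later fetches within the same block hit without touching `memory`.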
If the cache 20 contains the needed instruction (a "hit"), one or more instruction words, including the word whose address is in the program counter 14, are output from the cache 20. Typically, a number of words will be output at any one time. Transfers of four words are common. These words may, e.g., be stored in an instruction buffer 26 which can typically hold a number of instructions. Thus, there will normally be a plurality of instructions that the execution unit 12 can rapidly access. This is useful since the execution unit 12 may, during certain periods (e.g., when the cache 20 is being written into after a cache miss), be receiving instructions more rapidly than they are being output by the cache 20. In some cases, however, instructions that have accumulated in the instruction buffer 26 are invalidated or deleted. For example, this can occur when the execution unit 12 has taken a branch instruction and a new sequence of instructions needs to be input to the execution unit 12.
In cases where more than one execution unit is allowed access to main memory and the cache is written to, the system will normally include some method of assuring that the data in the cache 20 and the memory 24 correspond to one another. This is typically referred to as maintaining "cache coherency". A number of cache coherency techniques are known. Examples are the techniques referred to as "write through" and "write back". In some systems, cache coherency is maintained by using a bus watch unit 28. This unit 28 monitors the global address bus 31 to detect, e.g., write operations to memory by other bus masters (not shown). This allows the cache control 18 to invalidate a cache entry when another device writes to a location in shared memory.
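The bus-watch style of invalidation can be sketched as follows. This is only an illustration under simplifying assumptions: the cache is modeled as a dictionary keyed by line base address, eight-word lines are assumed, and the names `WatchedCache`, `fill`, `holds`, and `on_bus_write` are hypothetical.

```python
WORDS_PER_LINE = 8  # assumed line length, in words


class WatchedCache:
    """Hypothetical cache whose entries are invalidated by observed bus writes."""

    def __init__(self):
        self.lines = {}  # line base address -> list of cached words

    def fill(self, base_addr, words):
        self.lines[base_addr] = list(words)

    def holds(self, word_addr):
        base = word_addr - word_addr % WORDS_PER_LINE
        return base in self.lines

    def on_bus_write(self, word_addr):
        # Called when another bus master is seen writing word_addr.
        # Drops the line covering that address; returns True if one was held.
        base = word_addr - word_addr % WORDS_PER_LINE
        return self.lines.pop(base, None) is not None
```

After `on_bus_write` removes a line, the next access to any word in that line would miss and be refilled from main memory, which is how the stale copy is avoided in this sketch.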
A number of cache-to-memory mapping techniques are known. One technique is known as "set-associative mapping", where the cache is considered as being divided into two or more sets (otherwise termed "banks", "compartments", or "ways"). FIG. 2 depicts an example two-way set-associative cache 20. In this figure the cache 20 contains a first set or compartment 32 and a second set or compartment 34. Each compartment 32, 34 contains a plurality of slots or lines 36a-36n, 38a-38n. Each line contains a number of words 42a-42k, 44a-44k. For the purposes of discussion, throughout this specification a cache system which retrieves four words per access will be described. Further, the exemplary system described will include eight words per line. Those skilled in the art will recognize that a different number of words can be retrieved per access and that lines of a set-associative cache may contain more than eight words.
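As an illustration of two-way set-associative mapping, the sketch below splits a word address into a tag, a line index, and a word offset, then checks the same line index in both sets (ways) for a matching tag. The eight-word line length follows the exemplary system; `LINES_PER_WAY` and the function names are assumptions made only for this example.

```python
WORDS_PER_LINE = 8   # eight words per line, as in the exemplary system
LINES_PER_WAY = 16   # hypothetical number of lines in each set (way)
NUM_WAYS = 2         # two-way set-associative


def split_address(word_addr):
    """Return (tag, line_index, word_offset) for a word address."""
    offset = word_addr % WORDS_PER_LINE
    line_index = (word_addr // WORDS_PER_LINE) % LINES_PER_WAY
    tag = word_addr // (WORDS_PER_LINE * LINES_PER_WAY)
    return tag, line_index, offset


def find_way(tags, word_addr):
    """Return the way holding word_addr, or None on a miss.

    tags[way][line_index] holds the stored tag, or None if the line is invalid.
    """
    tag, line_index, _ = split_address(word_addr)
    for way in range(NUM_WAYS):
        if tags[way][line_index] == tag:
            return way
    return None
```

Because a given line index exists in both ways, two blocks of main memory that share an index can reside in the cache at the same time, one in each way.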
One of the problems in previous instruction fetch methods involves the situation where a fetch occurs for a word which is located near the end of one of the lines 36a-36n, 38a-38n. This can occur relatively frequently, for example, after a branch instruction has been taken. In normal operation, when the program counter 14 points to an instruction which is not near the end of the line, such as word 45b, the cache 20 will output a number of words (e.g., four words 45b, 45c, 45d, 45e) after the tag has first been used to verify that the requested data is the data present on line 36a. When the access is near the beginning of line 36a this is performed easily, since it is known that a given slot or line of the cache contains a series of instruction words with consecutive addresses, i.e., it is known that the instruction word having an address next-exceeding the address of word 45b will be word 45c.
However, near the end of the line 36a it is unknown, in current systems, what the next succeeding word after the last word 45k in the line will be. In a typical situation, the succeeding word after 45k will be either the first word in the next line (i.e., 46a) or the first word 48a in a corresponding line 38b in the other set 34 of the cache 20. However, in previous set-associative systems, the information regarding which of these locations held the next succeeding instruction after the last instruction 45k of a line 36a was not available. Therefore, in previous devices, when a read access was made for a word near the end of a line, such as 45j, the words at the end of the line, such as 45j and 45k, could be output to the instruction buffer 26, but the next word after 45k could not be output because its location was not available. Typically this would result in a need to consult, e.g., the TLB 17 or to take other action necessary to discern the location of the next word and to then try a second access of the cache 20. Thus it can be seen that, for accesses a distance from the end of a line, the average yield per access was an output of a number of words, e.g., four words per access (as above), whereas an access made near the end of a line could result in the output of fewer than four valid words.
There are several circumstances where this reduced efficiency of cache access near the end of lines is particularly troublesome. The closer the initial access address is to the end of the line, the worse the efficiency. For example, if there was an access of word 45k, only a single word, rather than four words, would be useful to the instruction buffer 26 during that access of the cache. Additionally, systems which are configured to output more than four words per cache access suffer an even greater relative reduction in efficiency near the end of the line. As will be apparent, if each access of the cache normally results in an output of i words, and a cache line is k words long, the potential for problems near the end of the line will arise whenever the access is between the (k-i+1)th word of the line and the kth word of the line. Furthermore, the problem is exacerbated if the access near a cache line end occurs at a time when an instruction buffer 26 is empty. This situation frequently occurs after a branch has been taken.
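The relationship between access position and yield can be worked through with a small sketch. It assumes that positions within a line are counted from 1, that an access returns consecutive words starting at the requested word, and that no words beyond the end of the line can be returned. Under those assumptions an access at position p yields min(i, k - p + 1) words, so the shortfall begins just past the (k-i+1)th word. The function name `words_yielded` is hypothetical.

```python
def words_yielded(position, words_per_access, line_length):
    """Valid words returned by one access starting at a 1-based position.

    Assumes the access cannot continue past the end of the line.
    """
    if not 1 <= position <= line_length:
        raise ValueError("position must lie within the line")
    return min(words_per_access, line_length - position + 1)


# Exemplary system: i = 4 words per access, k = 8 words per line.
yields = [words_yielded(p, 4, 8) for p in range(1, 9)]
print(yields)  # [4, 4, 4, 4, 4, 3, 2, 1]
```

With i = 4 and k = 8, positions 1 through 5 return the full four words, while positions 6, 7, and 8 return three, two, and one word respectively. Raising i to 8 under the same assumptions leaves only position 1 with a full yield, which is consistent with the observation that wider accesses lose relatively more near the end of a line.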
FIG. 3 depicts a cache tag which could be used with the cache system described above. In this embodiment, the tag includes several sections 54a, 54b, 54c, 54d, one section for each word. For a four-word per access cache, for example, the respective tag would contain four sections as depicted. Each section, of which 54a is representative, contains several entries including, e.g., a real or virtual address tag (ADDRESS) which will typically be a portion of the address of the corresponding word in main memory. Several flags are also contained within each tag. As those of skill in the art know, these may include a "dirty" flag (typically one bit, which may be used, e.g., for cache coherency purposes) and a valid flag, which indicates whether the data stored at this location of the cache is valid data.
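One way to picture such a tag is as a small record per section. The sketch below is an assumption-laden illustration, not the layout of FIG. 3: the field names `address`, `valid`, and `dirty`, the class `TagSection`, and the helper `section_hits` are hypothetical, and field widths are ignored.

```python
from dataclasses import dataclass


@dataclass
class TagSection:
    """Hypothetical model of one tag section (e.g., 54a)."""
    address: int         # ADDRESS: a portion of the word's main-memory address
    valid: bool = False  # whether the cached data at this location is valid
    dirty: bool = False  # typically one bit, e.g., for cache coherency purposes


def section_hits(section, address_tag):
    # A section matches only if it is valid and its ADDRESS entry agrees.
    return section.valid and section.address == address_tag


# A tag for a four-word-per-access cache: one section per word.
tag = [TagSection(address=0x1A, valid=True) for _ in range(4)]
```

In this model an invalid section never produces a hit, even when its `address` entry happens to equal the requested tag.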
Accordingly, a system is needed which reduces the inefficiencies resulting from accesses near the end of a line in a set-associative cache system.