A cache memory access system of a digital computer typically comprises a translation buffer, a cache directory and a cache memory. The translation buffer contains a list of all the physical (also called real) addresses of data contained within the memory, matched to their logical (also called virtual) addresses. When the computer requires data from the cache, it sends the logical address to the translation buffer, which then compares that address to its list of logical addresses. When the translation buffer finds a match, it sends the physical address which corresponds to the logical address to the directory and the cache memory. The directory contains a list of all the lines of data bytes (for example, 2K lines of 128 bytes each) that are currently in the cache memory and their associated real addresses. The directory receives the real address and determines whether the specific data required by the instruction is currently in the cache. If the data is not in the cache, the cache access system does not directly access the cache; the requested data is retrieved through a separate mechanism which is outside the scope of this invention. If the data is in the cache, the cache memory retrieves the data from the real address and sends the data to the part of the computer system which requires it.
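The access sequence described above can be sketched in a few lines of Python. This is an illustrative model only, not the circuitry of any particular system: the class name `CacheAccessSystem`, its methods, and the 4096-byte page size are assumptions introduced for the example, while the 128-byte line size follows the example in the text.

```python
from typing import Dict, List, Optional

LINE_SIZE = 128    # bytes per cache line, as in the example above
PAGE_SIZE = 4096   # hypothetical page size, assumed for illustration only


class CacheAccessSystem:
    """Toy model: translation buffer -> directory -> cache memory."""

    def __init__(self) -> None:
        # Translation buffer: logical page number -> real page number.
        self.translation_buffer: Dict[int, int] = {}
        # Directory: real address of a line -> slot in the cache memory.
        self.directory: Dict[int, int] = {}
        # Cache memory: one 128-byte line per slot.
        self.cache_memory: List[bytes] = []

    def map_page(self, logical_page: int, real_page: int) -> None:
        self.translation_buffer[logical_page] = real_page

    def install_line(self, real_address: int, data: bytes) -> None:
        assert len(data) == LINE_SIZE
        line_address = real_address - real_address % LINE_SIZE
        self.directory[line_address] = len(self.cache_memory)
        self.cache_memory.append(data)

    def read_byte(self, logical_address: int) -> Optional[int]:
        # Step 1: the translation buffer matches the logical address.
        page, offset = divmod(logical_address, PAGE_SIZE)
        real_page = self.translation_buffer.get(page)
        if real_page is None:
            return None  # no translation available in the buffer
        real_address = real_page * PAGE_SIZE + offset
        # Step 2: the directory checks whether the line is in the cache.
        line_address = real_address - real_address % LINE_SIZE
        slot = self.directory.get(line_address)
        if slot is None:
            return None  # not in the cache; fetched by a separate mechanism
        # Step 3: the cache memory returns the requested byte.
        return self.cache_memory[slot][real_address % LINE_SIZE]
```

In this sketch a return value of `None` stands in for both cases the text leaves to other mechanisms: a logical address with no entry in the translation buffer, and a line that is not currently in the cache.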
In order for the computer system to send the logical address to the cache access system, it must first generate the logical address and determine when the generated address is valid. A logical address is generated when an instruction from the computer system is decoded and, in turn, requires data and therefore the generation of a logical address. Logical address generation is accomplished by the cache access system taking the part of the decoded instruction which holds an address of the required data and generating a corresponding logical address. The timing of the validity of the address, and therefore the speed of the cache memory access system, is determined by a sequence of computer system machine cycles known as a pipeline structure. These cycles segment the operation of the digital computer so that steps common to a variety of instructions can be completed quickly without different parts of the digital computer being idle. A typical pipeline structure for a cache memory access system uses an Instruction decode cycle, followed by an Address generation cycle, followed by a Cache access cycle, followed by an Execution cycle. FIG. 2 illustrates an example of a general pipeline structure, and summarizes the major actions performed for each stage, as used in this invention in particular.
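How the four cycles overlap can be illustrated with a short Python sketch. It assumes an idealized pipeline in which one instruction enters per machine cycle and no stage ever stalls; the function name `pipeline_schedule` is hypothetical and the model is not taken from FIG. 2.

```python
from typing import Dict, List

# The four cycles named in the text, in order.
STAGES = ("Instruction decode", "Address generation",
          "Cache access", "Execution")


def pipeline_schedule(num_instructions: int) -> List[Dict[str, int]]:
    """Return, per machine cycle, which instruction occupies which stage.

    Assumes one instruction enters per cycle and no stalls occur.
    """
    if num_instructions <= 0:
        return []
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        active = {}
        for stage_index, stage in enumerate(STAGES):
            instruction = cycle - stage_index
            if 0 <= instruction < num_instructions:
                active[stage] = instruction
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    for cycle, active in enumerate(pipeline_schedule(3)):
        print(cycle, active)
```

Under these assumptions, three instructions complete in six machine cycles rather than twelve, because each stage works on a different instruction during the same cycle.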
The speed of the cache access system can be increased in two ways. First, the address generation cycle can be shortened by speeding up the generation of the logical address. Address generation is typically accomplished by an adder, and this has generally meant implementing faster and faster adders. However, this is a hardware solution that is the same for all methods of accessing the cache, and can be implemented on any system. Even when the adder is faster, there is always a delay associated with the adder that increases as address requirements increase. Second, the cache access machine cycle can be shortened by reducing the number of comparisons the directory must accomplish before it finds the address of the specific byte of data and determines whether that byte is in the cache at all. Generally, this has required larger and more complex logic circuits each time the size of the cache memory expands. The larger the logic, the more complicated the design, and the longer it takes to design the system.
Prior art attempts to increase the speed of the cache memory have relied on the fact that the cache memory access system can be operated based on the concept of set associativity. That is, the cache memory and directory are divided into sets, or congruence classes. For example, a 256K byte cache memory could be divided into 512 congruence classes, each containing 4 lines of 128 bytes. The cache memory is then accessed by addressing all four lines of one congruence class at once, and subsequently determining which line, selected from the congruence class, is the correct one. This method is faster than the previous method of searching through all the addresses of the cache directory, i.e., one large set, because the number of comparisons required is smaller. Instead of comparing 2K lines with 128 bytes per line, the directory only compares 512 lines.
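The geometry of this example can be checked with a short Python sketch. It assumes, as is common but not stated in the text, that the congruence class is selected by the low-order bits of the line address; the names `split_address` and `SetAssociativeDirectory` are hypothetical.

```python
from typing import List, Optional, Tuple

CACHE_BYTES = 256 * 1024   # 256K byte cache, as in the example
LINE_BYTES = 128           # bytes per line
LINES_PER_CLASS = 4        # lines in each congruence class
NUM_CLASSES = CACHE_BYTES // (LINE_BYTES * LINES_PER_CLASS)  # 512


def split_address(address: int) -> Tuple[int, int, int]:
    """Split an address into (tag, congruence class, byte within line).

    Assumption: the class index is the low-order part of the line address.
    """
    byte_offset = address % LINE_BYTES
    class_index = (address // LINE_BYTES) % NUM_CLASSES
    tag = address // (LINE_BYTES * NUM_CLASSES)
    return tag, class_index, byte_offset


class SetAssociativeDirectory:
    def __init__(self) -> None:
        # One tag slot per line of every congruence class.
        self.tags: List[List[Optional[int]]] = [
            [None] * LINES_PER_CLASS for _ in range(NUM_CLASSES)
        ]

    def install(self, address: int, line: int) -> None:
        tag, class_index, _ = split_address(address)
        self.tags[class_index][line] = tag

    def lookup(self, address: int) -> Optional[int]:
        """Return the matching line within the class, or None if absent."""
        tag, class_index, _ = split_address(address)
        # Only the lines of the indexed congruence class are examined.
        for line, stored_tag in enumerate(self.tags[class_index]):
            if stored_tag == tag:
                return line
        return None
```

In this sketch the index selects one class directly, so a lookup examines only the lines of that class rather than every entry in the directory.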
A problem with the above method is that it depends on the logical address generated by the cache access system and sent to the translation buffer for accessing the memory. This is a problem because two logical addresses can be assigned to the same physical address. This is called the synonym problem. The set associativity method therefore builds in a potential error (accessing different data than that required by the instruction) because it uses logical addresses. The potential error must be checked, and when the check does find an error, the solution is to re-access the memory with the correct address. Even if the selection is correct (i.e., a "hit" as opposed to an incorrect "miss") often enough that the time wasted averages out over many instructions, the problem still exists because of the nature of the logical address generation. In addition, the logic to check for this problem becomes larger and takes more time to accomplish its function as the cache memory gets larger. Increasing the hit/miss ratio associated with the directory and logical address increases the speed of the cache memory access system, but does not help to simplify the design problems associated with the cache access system.
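A small Python sketch makes the synonym problem concrete. It assumes a 4096-byte page (a hypothetical value) together with the 512-class, 128-byte-line geometry of the earlier example, so that part of the congruence class index comes from address bits that translation can change. The page numbers and the helper names `congruence_class` and `translate` are illustrative only.

```python
from typing import Dict

PAGE_BYTES = 4096   # hypothetical page size, assumed for illustration
LINE_BYTES = 128
NUM_CLASSES = 512


def congruence_class(address: int) -> int:
    """Class index taken from the low-order bits of the line address."""
    return (address // LINE_BYTES) % NUM_CLASSES


def translate(page_table: Dict[int, int], logical_address: int) -> int:
    """Map a logical address to its physical address via a page table."""
    page, offset = divmod(logical_address, PAGE_BYTES)
    return page_table[page] * PAGE_BYTES + offset


# Two logical pages assigned to the same physical page: synonyms.
page_table = {0x10: 0x7, 0x25: 0x7}
logical_a = 0x10 * PAGE_BYTES + 0x80
logical_b = 0x25 * PAGE_BYTES + 0x80

if __name__ == "__main__":
    print(translate(page_table, logical_a) == translate(page_table, logical_b))
    print(congruence_class(logical_a), congruence_class(logical_b))
```

Both logical addresses translate to the same physical address, yet indexing with the logical address places them in different congruence classes. A lookup that trusts the logical index can therefore miss data that is present, or hold two copies of it, which is why the check described above is required.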
One prior art method of accessing a cache memory that illustrates the problem bases the choice of the line within a congruence class on the last line used from that congruence class. This choice is merely a guess (although a good one) that the last line used from the congruence class is the most likely line to contain the required data byte again. This guess is correct a significant fraction of the time; however, it is still made based on the logical (or virtual) address from the address generation part of the cache access system. This means that the technique does not get rid of the error associated with the fact that two logical addresses can be associated with the same physical address (the synonym problem). Because of this error, the checking logic must still expand every time the cache memory gets larger. Other prior art methods of solving the synonym problem have concentrated on flagging a potential address and storing the address when a potential synonym occurs. The system would then retrieve the physical data just one time. This technique does not solve the problem but merely avoids it, and still requires logic that must expand every time the cache memory expands.
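The last-line-used guess can be modeled as follows. This is a minimal sketch under stated assumptions, not the prior art circuit itself: the class name `LastUsedPredictor` is hypothetical, and the `actual_line` argument stands in for the result of the later directory check, which the sketch does not model.

```python
from typing import List


class LastUsedPredictor:
    """Guess that the line last used in a congruence class is used again."""

    def __init__(self, num_classes: int = 512) -> None:
        # Assumption: every class starts out predicting line 0.
        self.last_used: List[int] = [0] * num_classes
        self.guesses = 0
        self.correct = 0

    def access(self, class_index: int, actual_line: int) -> bool:
        """Record one access; return True if the guess was right.

        `actual_line` is what the checking logic later determines to be
        the correct line within the class.
        """
        guess = self.last_used[class_index]
        self.guesses += 1
        was_correct = guess == actual_line
        if was_correct:
            self.correct += 1
        # Remember the line actually used for the next guess.
        self.last_used[class_index] = actual_line
        return was_correct


if __name__ == "__main__":
    predictor = LastUsedPredictor()
    for line in [2, 2, 2, 1, 1]:
        predictor.access(3, line)
    print(predictor.correct, "of", predictor.guesses, "guesses correct")
```

The sketch shows why the guess performs well when accesses repeat, and also why it cannot replace the check: every wrong guess is discovered only after the fact, and the class index itself still comes from the logical address.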