The synonym problem arises in a processor whose cache directory and cache arrays are referenced by means of a real (absolute) address when the combination of cache size, line size, and number of sets in the directory yields enough congruence classes that some of the address bits needed to access the directory and cache are subject to translation. The problem occurs because, at the time the cache is accessed, the upper bits of the real (absolute) address are not yet available from translation. For processor performance it is not acceptable to wait until translation completes, even with a translation lookaside buffer (TLB), before starting the cache data access.
IBM's ESA/390 system at the G4 (Generation 4) level was commercially available in 1997. This processor has 64K of cache, with a line size of 128 bytes and a 4-way set associative directory. The cache therefore holds 512 lines, and the 4-way set associative directory requires 128 congruence classes. The S/390 architecture has a 4K page size. With a 128 byte line size, the least significant 7 bits of the address select a byte within the line. With 128 congruence classes, a further 7 bits are required to access the cache directory and data arrays. With the 4K page size, only the least significant 12 bits do not require translation. Because a total of 14 bits are required to access the data, the 2 most significant of those bits require translation, giving four possible congruence classes to access, each with 4 line entries. Thus the comparison to determine whether the line is in the cache must occur 16 times. As the number of compares increases, cycle time suffers, so there is a strong desire to keep the number of compares to a minimum. If the cache were 256K with 256 byte lines and the same 4-way set associative directory, there would be 1K lines in 256 congruence classes. With 8 bits required for the index within the line and 8 bits to access the congruence class, 4 bits of the address are now subject to translation. This gives 16 possible congruence classes in which the line may reside, which yields 64 comparisons. This is a problem in a processor design with high frequency operation goals.
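The arithmetic in the preceding paragraph can be checked with a short sketch. The function name `synonym_compares` and its return keys are hypothetical and chosen only for illustration; the sketch assumes that cache size, line size, number of ways, and page size are all powers of two.

```python
def synonym_compares(cache_bytes, line_bytes, ways, page_bytes=4096):
    """Count the directory compares needed when some index bits are translated.

    Hypothetical helper for illustration; assumes all sizes are powers of two.
    """
    lines = cache_bytes // line_bytes            # total lines in the cache
    classes = lines // ways                      # congruence classes
    offset_bits = (line_bytes - 1).bit_length()  # bits selecting a byte in a line
    class_bits = (classes - 1).bit_length()      # bits selecting a congruence class
    page_bits = (page_bytes - 1).bit_length()    # low-order bits not translated
    # Index bits that lie above the page offset are subject to translation.
    translated_bits = max(0, offset_bits + class_bits - page_bits)
    candidate_classes = 1 << translated_bits
    return {
        "lines": lines,
        "classes": classes,
        "translated_bits": translated_bits,
        "candidate_classes": candidate_classes,
        "compares": candidate_classes * ways,
    }


g4 = synonym_compares(64 * 1024, 128, 4)      # the G4 configuration
larger = synonym_compares(256 * 1024, 256, 4)  # the 256K example
print(g4["compares"], larger["compares"])
```

For the two configurations described above, the sketch reproduces 16 and 64 compares respectively.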
A proposed solution to this problem is to predict the values of the address bits that require translation. This has been proposed in a number of references and processor designs, including U.S. Pat. No. 5,148,538 "Translation Look Ahead Based Cache Access", IBM TDB 8-82 "Mechanism for Acceleration of Cache References", IBM TDB 1-89 "Effecting a One-cycle Cache Access in a Pipeline Having Combined D/A Using a BLAT", and the G4 processor design as described in IBM Journal of Research and Development, vol 41, no 4/5 "A High-frequency Custom CMOS S/390 Microprocessor".
"Mechanism for Acceleration of Cache References" describes a scheme that uses the base register number and the result of adding bits of the displacement field to bits of the base register to predict the line number to be referenced. It also indicates that if the base register number is zero, the index register number and bits are used instead. This scheme does not deal with cases in which both the base and index registers are used to calculate the operand address. In "Effecting a One-cycle Cache Access in a Pipeline Having Combined D/A Using a BLAT" a 16 entry table is used to convert a base register number to bits of the real (absolute) address, namely the value those bits had the last time that base register number was used to reference storage. It does not deal with the use of an index register or the possible effects of large displacements. "Translation Look Ahead Based Cache Access" demonstrates a method that uses a range of bits from the base register to index a table that provides a guess, based on prior references, of what the real (absolute) address bits will be for the line referenced by that value in the base register. Again it does not deal with the possible effects of index or displacement values. The S/390 Generation 4 processor, as described in "A High-frequency Custom CMOS S/390 Microprocessor", introduced a new structure called the Absolute Address History Table (AAHT). It used bits 12 to 19 of the selected base register when the base register number was non-zero, or bits 12 to 19 of the selected index register when the base register number was zero, to index a 256 entry table that related these partial register addresses to two bits of the real (absolute) address. Because only two bits were predicted, even when the table lookup was for the wrong entry there was still a 25% chance of getting the correct guess. These bits were then used to access the data directory and arrays. This was done for the operand data accesses.
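The AAHT lookup described above can be illustrated with a minimal sketch. It is not the G4 implementation. The class and method names are hypothetical, the update policy (overwrite the entry with the translated bits once they are known) is an assumption the text does not state, and the bit extraction assumes IBM bit numbering on a 32-bit register, in which bit 0 is the most significant bit so that bits 12 to 19 lie immediately above the 12-bit page offset.

```python
class AddressHistoryTable:
    """Illustrative 256 entry table predicting two real (absolute) address bits."""

    ENTRIES = 256      # indexed by 8 register bits
    PREDICT_MASK = 0b11  # two predicted address bits per entry

    def __init__(self):
        self.table = [0] * self.ENTRIES

    @classmethod
    def _slot(cls, base_num, base_val, index_val):
        # Use the base register unless its register number is zero,
        # in which case fall back to the index register.
        reg = base_val if base_num != 0 else index_val
        # Assumed IBM numbering (bit 0 = MSB of 32): bits 12..19 sit just
        # above the 12-bit page offset, i.e. LSB positions 12 through 19.
        return (reg >> 12) & (cls.ENTRIES - 1)

    def predict(self, base_num, base_val, index_val):
        """Return the guessed two translated address bits."""
        return self.table[self._slot(base_num, base_val, index_val)]

    def update(self, base_num, base_val, index_val, real_bits):
        """Record the bits produced by translation (assumed update policy)."""
        slot = self._slot(base_num, base_val, index_val)
        self.table[slot] = real_bits & self.PREDICT_MASK


aaht = AddressHistoryTable()
first_guess = aaht.predict(5, 0x0004A123, 0)   # untrained entry
aaht.update(5, 0x0004A123, 0, 0b10)            # translation later yields 0b10
second_guess = aaht.predict(5, 0x0004AFFF, 0)  # same page, different offset
```

Because the table is indexed only by register bits, two accesses through a base register pointing into the same page share an entry, which is why the second prediction in the usage lines returns the bits recorded by the first access.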
For instruction accesses there were four instruction request addresses. The cache would remember the real (absolute) address bits for each one. When a new request was made, an indication would be given as to which of the four values should be used. Because instruction requests generally access the same line as the current fetching stream, this will generally provide correct results.