In the continuing development of faster and more powerful computer systems, two classes of microprocessor have been utilized: complex instruction set computer (CISC) processors and reduced instruction set computer (RISC) processors. Advances in the field of RISC and CISC processors have led to the development of superscalar processors. Superscalar processors, as their name implies, perform functions not commonly found in traditional scalar microprocessors. Included in these functions is the ability to execute instructions out of order with respect to the program order. Although the instructions execute out of order, the results of the executions appear to have occurred in program order, so that proper data coherency is maintained.
Although the background will be discussed in the context of a superscalar design, many of these features can apply to other high speed processing systems.
In performing instructions, data address generation typically occurs as shown by the block diagram of FIG. 1a. A base value 8 in a base register, for example, register A, is added to an offset value 9, where the offset value 9 is immediate data or a value stored in a register B, to produce an effective address, EA. The base value 8 in register A normally points to some location in a page of memory with the offset value 9 adding a particular adjustment to that location. The EA is then translated via a translation mechanism 11 to produce the physical address, PA. The PA is then used to locate the data in main memory or a data cache 16. A main goal of producing faster system operations is to reduce the time required to generate the PA.
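The address generation path described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the 4 KiB page size, the 32-bit address width, the dictionary standing in for translation mechanism 11, and the sample values are all hypothetical and are not taken from FIG. 1a.

```python
PAGE_BITS = 12          # assumed 4 KiB pages; the text does not specify a page size
ADDR_MASK = 0xFFFFFFFF  # assumed 32-bit effective addresses

def effective_address(base, offset):
    """EA = base value (register A) + offset (immediate data or register B)."""
    return (base + offset) & ADDR_MASK

def translate(ea, page_table):
    """Map an EA to a PA by replacing the virtual page number with a physical one."""
    vpn = ea >> PAGE_BITS                      # virtual page number
    page_offset = ea & ((1 << PAGE_BITS) - 1)  # offset within the page is unchanged
    ppn = page_table[vpn]                      # a KeyError here stands in for a translation miss
    return (ppn << PAGE_BITS) | page_offset

# Hypothetical mapping: virtual page 0x12 resides in physical page 0x345.
page_table = {0x12: 0x345}

ea = effective_address(0x00012000, 0x0A4)
pa = translate(ea, page_table)
```

In this model the table lookup inside `translate` is the step whose latency the remainder of the background is concerned with.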
Typically, the translation mechanism controls the speed of the address translation and is a table lookup mechanism, commonly referred to as a translation lookaside buffer, TLB, that maps the EA to the PA. Three basic organizations are usually used in a TLB and are known as direct mapping, n-way set associative mapping, and fully associative mapping.
FIGS. 1b-d are conceptual illustrations of exemplary mapping techniques for caching information, according to the prior art. A cache directory 10 has p = 2^k = 8 entries, each associated with a respective block of information within a cache memory. A matrix 12 has q = 2^(m+k) = 64 octal addresses, each representing a respective block of information within a secondary memory. FIGS. 1b-d show each block of matrix 12 together with the block's respectively associated address. Each of the q octal addresses of matrix 12 has m+k=6 address bits.
FIG. 1b illustrates a direct mapping technique. In FIG. 1b, matrix 12 and directory 10 are logically arranged into p = 2^k = 8 congruence classes. A congruence class is specified by an address's low-order k address bits. Accordingly, each congruence class includes multiple addresses, all of which share the same low-order k address bits. For FIG. 1b, k=3 and m=3.
For example, in FIG. 1b, one congruence class includes all addresses whose low-order three address bits are octal 7. This congruence class includes the octal addresses 07, 17, 27, 37, 47, 57, 67 and 77. Likewise, another congruence class includes the octal addresses 02, 12, 22, 32, 42, 52, 62 and 72.
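As an illustration of how the congruence classes of FIG. 1b arise, the following Python sketch enumerates every address that shares a given low-order k-bit value. The function names `congruence_class` and `as_octal` are hypothetical helpers introduced only for this example.

```python
K = 3  # low-order address bits that select the congruence class
M = 3  # high-order address bits that distinguish members of a class

def congruence_class(index, k=K, m=M):
    """All (m+k)-bit addresses whose low-order k bits equal `index`."""
    return [(high << k) | index for high in range(1 << m)]

def as_octal(addrs):
    """Format addresses as two-digit octal strings, as used in the figures."""
    return [format(a, "02o") for a in addrs]

class_7 = as_octal(congruence_class(7))
class_2 = as_octal(congruence_class(2))
# class_7 -> ['07', '17', '27', '37', '47', '57', '67', '77']
# class_2 -> ['02', '12', '22', '32', '42', '52', '62', '72']
```

Because k=3 corresponds to exactly one octal digit, each class is simply the set of addresses ending in the same octal digit.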
In FIG. 1b, each congruence class has one respective preassigned associated entry within cache directory 10. Accordingly, at any single moment, the cache memory stores information for only a single address of a congruence class; this single address is specified in the congruence class's associated entry of cache directory 10. For example, in the congruence class's associated entry, cache directory 10 can store a tag including the single address's high-order m address bits. For FIG. 1b, m=3.
As an example, in FIG. 1b, from among the eight addresses whose low-order three address bits are octal 5, cache directory 10 indicates that the cache memory stores information for only octal address 45, whose tag value is octal 4. Similarly, from among the eight addresses whose low-order three address bits are octal 1, cache directory 10 indicates that the cache memory stores information for only octal address 31.
Accordingly, the low-order k address bits of an address ADDR specify the congruence class of ADDR. Moreover, the low-order k address bits operate as an index to access the congruence class's associated entry within cache directory 10 and its associated block of information within the cache memory by binary decoding. The indexed entry of cache directory 10 is read and compared with ADDR. If ADDR matches the indexed entry, then the indexed block of the cache memory stores information for ADDR.
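The direct-mapped lookup just described can be modeled as follows. This is a minimal sketch, not a description of any particular hardware: the directory is a Python list indexed by the low-order k bits, each entry holds only the high-order m-bit tag, and the hypothetical `fill` helper is added to show what happens when a second address of the same congruence class is stored. The initial contents mirror the two entries named for FIG. 1b (octal addresses 45 and 31).

```python
K, M = 3, 3  # index bits and tag bits, as in FIG. 1b

def split(addr):
    """Return (tag, index): high-order m bits and low-order k bits of addr."""
    return addr >> K, addr & ((1 << K) - 1)

# One directory entry per congruence class; None marks an empty entry.
directory = [None] * (1 << K)
directory[5] = 4  # octal address 45: index 5, tag 4
directory[1] = 3  # octal address 31: index 1, tag 3

def lookup(addr):
    """Hit if the tag stored at the indexed entry equals the tag of addr."""
    tag, index = split(addr)
    return directory[index] == tag

def fill(addr):
    """Store addr's tag at its indexed entry; return the tag that was displaced."""
    tag, index = split(addr)
    displaced = directory[index]
    directory[index] = tag
    return displaced
```

For example, `lookup(0o45)` hits while `lookup(0o55)` misses, and calling `fill(0o55)` displaces tag 4, after which octal address 45 no longer hits.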
A disadvantage of the direct mapping technique is that storage in the cache memory of one address's information excludes the storage of information for all other addresses of the same congruence class. This disadvantage is augmented by the fact that the number (2^k) of entries in cache directory 10 is small relative to the number (2^(m+k)) of addresses, so that a large number of addresses are forced to share a single entry in cache directory 10. Likewise, all addresses of a single congruence class are forced to share a single entry in the cache memory.
FIG. 1c illustrates an n-way set-associative mapping technique, where n=2. In FIG. 1c, matrix 12 and directory 10 are logically arranged into p/n=4 congruence classes. A congruence class is specified by an address's low-order y address bits, where p/n = 2^y. Accordingly, each congruence class includes multiple addresses, all of which share the same low-order y address bits. For FIG. 1c, k=3 and y=2.
For example, in FIG. 1c, one congruence class includes all addresses whose low-order two address bits have a value=3. This congruence class includes the octal addresses 03, 07, 13, 17, 23, 27, 33, 37, 43, 47, 53, 57, 63, 67, 73 and 77. Likewise, another congruence class includes the octal addresses 01, 05, 11, 15, 21, 25, 31, 35, 41, 45, 51, 55, 61, 65, 71 and 75.
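The parameters of the two-way arrangement can be checked with a short Python sketch. It derives the number of congruence classes, the index width y, and the remaining tag width from p, n, and the 6-bit address size, and then lists the members of a class by filtering all 64 addresses. The variable and function names are hypothetical and chosen only for this illustration.

```python
P = 8          # entries in the cache directory
N_WAYS = 2     # n, the set associativity
ADDR_BITS = 6  # m + k address bits

num_classes = P // N_WAYS           # p/n congruence classes
Y = num_classes.bit_length() - 1    # y such that 2**y == p/n (p/n is a power of two)
tag_bits = ADDR_BITS - Y            # address bits left over for the tag

def congruence_class(value):
    """Two-digit octal strings of all addresses whose low-order y bits equal `value`."""
    mask = (1 << Y) - 1
    return [format(a, "02o") for a in range(1 << ADDR_BITS) if a & mask == value]

class_3 = congruence_class(3)
class_1 = congruence_class(1)
```

With these values there are 4 classes of 16 addresses each, y is 2, and 4 address bits remain for the tag. Because y=2 does not align with an octal digit boundary, each class contains addresses ending in two different octal digits (3 and 7 for value 3, 1 and 5 for value 1).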
In FIG. 1c, cache directory 10 is logically arranged into two columns having four blocks each. Thus, each congruence class has a respective preassigned associated set of first and second entries within cache directory 10. Accordingly, at any single moment, the cache memory stores information for first and second addresses of a congruence class; the first and second addresses are specified in the congruence class's associated set of first and second entries within cache directory 10. For example, in the first associated entry, cache directory 10 can store a first tag including the first address's high-order m+1 address bits; in the second associated entry, cache directory 10 can store a second tag including the second address's high-order m+1 address bits. For FIG. 1c, m=3.
As an example, in FIG. 1c, from among the sixteen addresses whose low-order two address bits have a value=1, cache directory 10 indicates that the cache memory stores information for only octal address 05 and octal address 11. Similarly, from among the sixteen addresses whose low-order two address bits have a value=3, cache directory 10 indicates that the cache memory stores information for only octal address 43 and octal address 13.
Accordingly, the low-order y address bits of address ADDR specify the congruence class of ADDR. Moreover, the low-order y address bits operate as an index to the congruence class's associated set of two entries within cache directory 10 and its associated set of two blocks within the cache memory. The two indexed entries of cache directory 10 are read and compared with ADDR. If ADDR matches one of the indexed entries, then the matching entry's associated block of the cache memory stores information for ADDR. A disadvantage of the set-associative technique is delayed selection of information output from the cache memory, resulting from selection between the two indexed entries of cache directory 10.
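A two-way set-associative lookup of this kind can be sketched as follows. The model is illustrative only: the directory is a list of two-entry sets, the stored tags are the high-order four bits of each address, and the loop over the ways stands in for the selection step that the text identifies as a source of delay. The directory contents mirror the FIG. 1c examples (octal addresses 05 and 11 in one set, 43 and 13 in another); the order of the two ways within each set is an assumption.

```python
Y = 2        # low-order index bits
N_WAYS = 2   # entries per congruence class

def split(addr):
    """Return (tag, index): high-order bits and low-order y bits of addr."""
    return addr >> Y, addr & ((1 << Y) - 1)

# directory[index] is the set of N_WAYS stored tags; None marks an empty way.
directory = [[None] * N_WAYS for _ in range(1 << Y)]
directory[1] = [0o05 >> Y, 0o11 >> Y]
directory[3] = [0o43 >> Y, 0o13 >> Y]

def lookup(addr):
    """Return the matching way number, or None on a miss."""
    tag, index = split(addr)
    for way, stored_tag in enumerate(directory[index]):
        if stored_tag == tag:
            return way   # this result selects which of the two blocks to output
    return None
```

In this sketch `lookup(0o11)` returns way 1 and `lookup(0o15)` returns None; the data block cannot be chosen until the tag comparison has identified the way.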
FIG. 1d illustrates a fully associative mapping technique. In FIG. 1d, matrix 12 and directory 10 are not logically arranged into congruence classes. Accordingly, at any single moment, the cache memory can store information for any group of eight addresses; these eight addresses are specified in the eight entries of cache directory 10. For example, cache directory 10 can store eight tags, each including all bits of an address.
In FIG. 1d, cache directory 10 is structured as a content addressable memory ("CAM") array of p=8 CAM entries by m+k=6 address bits. As a CAM array, cache directory 10 inputs address ADDR and compares it simultaneously with all addresses in the eight CAM entries. If ADDR matches any CAM entry's address, then a respective one of match lines 14a-h is asserted to directly select the cache memory block storing information for ADDR. Thus, cache directory 10 operates as a decoder and accessing mechanism for the cache memory.
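The CAM behavior can be approximated in software as shown below. This is a rough functional model only: a real CAM array performs all eight comparisons simultaneously in hardware, whereas the list comprehension here evaluates them one after another. The eight stored addresses are hypothetical and are not taken from FIG. 1d.

```python
# Each CAM entry stores a full (m+k)-bit address; contents are hypothetical.
cam_entries = [0o45, 0o31, 0o02, 0o77, 0o10, 0o23, 0o64, 0o56]

def match_lines(addr):
    """One boolean per CAM entry, modeling match lines 14a-h."""
    return [entry == addr for entry in cam_entries]

def select_block(addr):
    """Index of the cache block whose match line is asserted, or None on a miss."""
    lines = match_lines(addr)
    return lines.index(True) if any(lines) else None
```

Here `select_block(0o02)` returns 2, and an address absent from every entry, such as octal 00, returns None. No index bits are extracted from the address, which reflects the absence of congruence classes in this organization.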
A disadvantage of the fully associative technique is the expense, complexity, and diminished speed of a CAM array having m+k address bit lines by p match lines. This is especially true as the number (m+k) of address bits increases in conjunction with the number (q = 2^(m+k)) of secondary memory blocks.
Although attempts to use set-associative methods to improve translation are generally satisfactory, improvements for reducing translation time and increasing performance are desirable. Accordingly, a need exists for better translation mechanisms, including predictive data address translation, to improve overall processor performance.