Most microprocessors make use of virtual or demand-paged memory schemes, where sections of a program's execution environment are mapped into physical memory as needed. Virtual memory schemes allow the use of physical memory much smaller in size than the linear address space of the microprocessor, and also provide a mechanism for memory protection so that multiple tasks (programs) sharing the same physical memory do not adversely interfere with each other.
A virtual or demand-paged memory system may be illustrated as a mapping between a linear (virtual) address space and a physical address space, as shown in FIG. 1. The linear address space is the set of all linear (virtual) addresses generated by a microprocessor. The physical address space is the set of all physical addresses, where a physical address is the address provided on a memory bus to write to or read from a physical memory location. For a 32-bit machine, the linear and physical address spaces are $2^{32}$ bytes (about 4 GBytes) in size.
In a virtual memory system, the linear and physical address spaces are divided into blocks of contiguous addresses, so that each linear or physical address belongs to at most one block. These blocks are customarily referred to as pages if they are of constant size or are any of several fixed sizes, whereas variable-sized blocks are customarily referred to as segments. The linear address space may be divided into both segments and pages. A typical page size may be 4 KBytes, for example.
The mapping shown in FIG. 1 illustrates a two-level hierarchical mapping comprising directory tables and page tables. Page directory tables and page tables are stored in physical memory, and are usually themselves equal in size to a page. A page directory table entry (PDE) points to a page table in physical memory, and a page table entry (PTE) points to a page in physical memory. For the two-level hierarchical mapping of FIG. 1, a linear address comprises directory field 102, table field 104, and offset field 106. A directory field is an offset to a PDE, a table field is an offset to a PTE, and an offset field is an offset to a memory location in a page.
In FIG. 1, page directory base register (PDBR) 108 points to the base address of page directory 110, and the value stored in directory field 102 is added to the value stored in PDBR 108 to provide the physical address of PDE 112 in page directory 110. PDE 112 in turn points to the base address of page table 114, which is added to the value stored in table field 104 to point to PTE 116 in page table 114. PTE 116 points to the base address of page 118, and this page base address is added to the value stored in offset 106 to provide physical address 120. Linear address 122 is thereby mapped to physical address 120.
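The walk through FIG. 1 can be sketched in a few lines of Python. This is a minimal illustration only: the 10/10/12-bit field widths, the 4-byte entry size, the scaling of each field by the entry size, and the dict used to stand in for physical memory are all assumptions, since the text fixes none of them.

```python
# Assumed layout (not given in the text): 10-bit directory field,
# 10-bit table field, 12-bit offset field, and 4-byte PDEs/PTEs.
DIR_BITS, TABLE_BITS, OFFSET_BITS = 10, 10, 12
ENTRY_SIZE = 4


def split_linear(linear):
    """Return the (directory, table, offset) fields of a 32-bit linear address."""
    offset = linear & ((1 << OFFSET_BITS) - 1)
    table = (linear >> OFFSET_BITS) & ((1 << TABLE_BITS) - 1)
    directory = (linear >> (OFFSET_BITS + TABLE_BITS)) & ((1 << DIR_BITS) - 1)
    return directory, table, offset


def walk(memory, pdbr, linear):
    """Map a linear address to a physical address through two table levels.

    `memory` is a hypothetical dict from the physical address of an entry
    to the base address that the entry points to.
    """
    directory, table, offset = split_linear(linear)
    page_table_base = memory[pdbr + directory * ENTRY_SIZE]   # read the PDE
    page_base = memory[page_table_base + table * ENTRY_SIZE]  # read the PTE
    return page_base + offset


# Toy memory: a page directory at 0x1000 and a page table at 0x2000.
memory = {0x1000 + 1 * ENTRY_SIZE: 0x2000, 0x2000 + 2 * ENTRY_SIZE: 0x5000}
linear = (1 << 22) | (2 << 12) | 0x034
physical = walk(memory, 0x1000, linear)
```

Each call to `walk` performs two reads of `memory` before the final address is formed, one for the PDE and one for the PTE.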
Accessing entries stored in page directories and page tables requires memory bus transactions, which can be costly in terms of processor cycle time. However, because of the principle of locality, the number of memory bus transactions may be reduced by storing recent mappings between linear and physical addresses in a cache, called a translation look-aside buffer (TLB). There may be separate TLBs for instruction addresses and data addresses.
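The benefit of caching recent mappings can be sketched as follows. The class name, the `walk_page` callback, the capacity, and the first-in first-out replacement policy are hypothetical choices made for illustration; the text does not specify any of them.

```python
PAGE_BITS = 12  # assumed 4 KByte pages


class SimpleTLB:
    """Toy cache of recent linear-page to physical-page mappings."""

    def __init__(self, walk_page, capacity=4):
        self.walk_page = walk_page  # hypothetical: linear page -> physical page
        self.capacity = capacity
        self.entries = {}           # dicts preserve insertion order
        self.misses = 0

    def translate(self, linear):
        page = linear >> PAGE_BITS
        offset = linear & ((1 << PAGE_BITS) - 1)
        if page not in self.entries:
            # Miss: pay for a table walk, then remember the mapping.
            self.misses += 1
            if len(self.entries) >= self.capacity:
                # Evict the oldest entry (FIFO is an assumed policy).
                self.entries.pop(next(iter(self.entries)))
            self.entries[page] = self.walk_page(page)
        return (self.entries[page] << PAGE_BITS) | offset


tlb = SimpleTLB(lambda page: page + 0x100)  # stand-in for a real table walk
first = tlb.translate(0x1234)   # miss: walks the tables
second = tlb.translate(0x1238)  # same page: served from the TLB
```

Because the two accesses fall in the same page, only the first one triggers a walk, which is the locality effect described above.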
Shown in FIG. 2 is an example of a TLB with an associated data or instruction cache comprising way 202 and directory 204. (For simplicity, only one way and one directory of the cache are shown in FIG. 2, but the cache may have m ways and directories so as to be m-way set associative.) The entries (lines) in a way contain data or instructions retrieved from another higher level of the memory hierarchy (not shown). Associated with each entry in way 202 is an entry in directory 204.
In describing the indexing of the TLB, the information content of the entries in the ways and directories of a cache unit, and how these indices and entries relate to linear and physical addresses, it is convenient to introduce the following notation. We denote an arbitrary linear address by $A_L$ and an arbitrary physical address by $A_p$. If a linear address $A_L$ maps into a physical address $A_p$, we write $A_L \leftrightarrow A_p$ (this mapping is one-to-one). When convenient, other capital letters will be used to denote other addresses (or portions thereof), e.g., $B_p$ for a physical address, etc. The highest-order n bits of any tuple $A$ (which may be an address) will be denoted by $[A]_n$.
Entries in a TLB and entries in a cache directory are indexed (or pointed to) by various subsets of a linear address. To describe this in more detail, it is useful to partition $A_L$ as $A_L = [A''_L \; A'_L]$, where $A''_L$ points to a unique entry in the TLB and $A'_L$ points to a unique entry in a cache directory. Provided there is a TLB hit, the TLB provides a translation of $A''_L$ to the physical address space, and the cache directory entry pointed to by $A'_L$ provides the physical address of its associated cache way entry. If the cache way entry is valid, and if the physical address translation provided by the TLB matches the physical address provided by the cache directory entry, then there is a cache hit and the desired object is retrieved from the cache way. If the comparison between the physical addresses fails, then there is a cache miss and another part of the memory hierarchy (not shown) may need to be accessed. If there is a TLB miss, then the memory hierarchy is accessed to provide the proper page directory and page table entries.
The above process can be described in more detail as follows. Depending upon how $A_L$ is partitioned, not all of the bits in $A''_L$ are needed to point to an entry in the TLB. For example, $A_L$ may be partitioned so that part of $A''_L$ includes a portion of the offset field. No translation is required for the offset field, and therefore that portion of $A''_L$ containing a portion of the offset field does not need translation by the TLB. Consequently, there may be n highest-order bits of $A_L$, denoted as $[A_L]_n$, that are used to point to entries in the TLB, where n is less than the number of bits in $A''_L$. (Note that in this case $[A_L]_n = [A''_L]_n$.)
If there is a TLB hit (i.e., a tag matches $[A_L]_n$, and the entry associated with the tag is valid), then the TLB provides the physical translation of $[A_L]_n$, which when appended (concatenated) with those bits of $A''_L$ not in $[A_L]_n$ (if any) provides the physical translation of $A''_L$. Denoting the physical translation of $A''_L$ as $A''_p$, we have $A_L \leftrightarrow [A''_p \; A'_L]$.
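The concatenation step can be made concrete with a small sketch. It assumes, purely for illustration, that the upper portion of the linear address is 22 bits wide and that only its highest-order 20 bits are translated; the function name and the dict standing in for the TLB are hypothetical.

```python
def translate_upper(a_upper, upper_bits, n, tlb):
    """Translate the upper portion of a linear address.

    a_upper:    the upper portion, as an integer of `upper_bits` bits
    n:          number of highest-order bits that the TLB translates
    tlb:        hypothetical dict from the top n linear bits to n physical bits
    Returns the translated upper portion, or None on a TLB miss.
    """
    low_bits = upper_bits - n                       # bits needing no translation
    top_n = a_upper >> low_bits                     # the n highest-order bits
    untranslated = a_upper & ((1 << low_bits) - 1)  # e.g. part of the offset
    if top_n not in tlb:
        return None
    # Append the untranslated bits to the translated bits.
    return (tlb[top_n] << low_bits) | untranslated


tlb = {0xABCDE: 0x12345}
a_upper = (0xABCDE << 2) | 0b11  # 22-bit upper portion, low 2 bits untranslated
result = translate_upper(a_upper, 22, 20, tlb)
```

The two low bits pass through unchanged, so only the 20-bit tag has to be stored and compared in the TLB.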
For the particular example in FIG. 2, $A''_L$ is the concatenation of page directory field 102 and page table field 104, so that entries in TLB 214 are pointed to by values in the page directory and page table fields of a linear address. For FIG. 2, the bits stored in offset field 106 point to a unique entry in directory 204 and way 202. That is, $A'_L$ would be identified with offset field 106. The result of a TLB hit would then be the "upper portion" of the physical address mapped by the linear address, i.e., $A''_p$, and the "lower portion" of the physical address is simply the value stored in offset field 106, i.e., $A'_L$.
A cache hit can now be summarized as follows. For some linear address $A_L = [A''_L \; A'_L]$, the tags in the TLB are compared with $[A_L]_n = [A''_L]_n$. If there is a hit, and if the entry associated with the matched tag is valid, then the TLB entry provides the physical translation of $[A_L]_n$, which when appended to those bits of $A''_L$ not in $[A''_L]_n$ provides $A''_p$, where $A_L \leftrightarrow [A''_p \; A'_L]$. Tags in the cache directories are compared with $A'_L$. If there is a hit for a tag, and the entry associated with the tag is valid, then the entry in the cache directory provides $B''_p$, where $B_p = [B''_p \; A'_L]$ is the physical address of the object stored in the corresponding cache way entry. (Entries in the directories also comprise other information concerning lines in the ways, e.g., whether the line is dirty, valid, shared with other caches, etc.) If $B''_p$ matches $A''_p$, then $A_L \leftrightarrow B_p$ and there is a cache hit. If $B''_p$ fails to match $A''_p$, then there is a cache miss.
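The hit condition summarized above reduces to a validity check followed by one comparison of upper physical bits. The sketch below models a single way and a single directory as dicts keyed by the untranslated lower bits; these data structures and all names in it are hypothetical stand-ins, not the structures of FIG. 2.

```python
def cache_lookup(upper_phys, lower, directory, way):
    """Return the cached object on a hit, or None on a miss.

    upper_phys: upper physical bits supplied by the TLB
    lower:      untranslated lower bits, which select the directory entry
    directory:  hypothetical dict: lower -> {"valid": bool, "upper_phys": int}
    way:        hypothetical dict: lower -> cached object
    """
    entry = directory.get(lower)
    if entry is None or not entry["valid"]:
        return None  # nothing usable is stored for these lower bits
    if entry["upper_phys"] != upper_phys:
        return None  # upper physical bits differ: cache miss
    return way[lower]


directory = {0x034: {"valid": True, "upper_phys": 0x5}}
way = {0x034: "cached line"}
hit = cache_lookup(0x5, 0x034, directory, way)
miss = cache_lookup(0x6, 0x034, directory, way)
```

Note that the directory entry can be read using only the untranslated lower bits, so it does not have to wait for the TLB; only the final comparison needs the translation.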
The structure of a TLB is illustrated in FIG. 3, comprising CAM (Content Addressable Memory) 302 and RAM (Random Access Memory) 304. A portion of a linear address (more precisely, $[A''_L]_n$) is provided to CAM 302, and a hit provides a signal on one of word lines 306 so that RAM 304 provides the result $[A''_p]_n$.
FIG. 4 illustrates part of CAM 302. For simplicity, only the first three TLB tags are shown stored in registers 402, 404, and 406. A portion of linear address 408, $[A''_L]_n$ (e.g., the page directory and page table fields for the two-level hierarchical mapping scheme of FIGS. 1 and 2), is compared with each tag stored in the CAM, and if there is a hit, one of the word lines is brought HIGH.
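A behavioral model of the CAM and RAM pairing may help fix ideas. It is only a software sketch: real hardware compares all tags in parallel, whereas the loop below visits them one at a time, and the function names and list-based storage are assumptions.

```python
def cam_match(tags, key):
    """Return one word-line value per stored tag: 1 (HIGH) on a match, else 0."""
    return [int(tag == key) for tag in tags]


def tlb_lookup(tags, ram, key):
    """Use the HIGH word line, if any, to select the matching RAM row."""
    word_lines = cam_match(tags, key)
    if 1 not in word_lines:
        return None  # no word line is HIGH: TLB miss
    return ram[word_lines.index(1)]


tags = [0x005, 0x009, 0x007]  # three stored tags, as in the simplified figure
ram = [0x050, 0x090, 0x070]   # translation held for each tag
lines = cam_match(tags, 0x009)
translated = tlb_lookup(tags, ram, 0x009)
```

The word lines act as a one-hot selector, so the RAM needs no address decoder of its own.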
Often, the linear address of an object in memory is expressed as the sum of two operands. For example, if a branch instruction provides a relative target linear address, then the target linear address is the sum of the relative target linear address and the instruction pointer. If this branch instruction is predicted as taken, then the target instruction having the target linear address is fetched from the instruction cache (or another level of memory in the memory hierarchy if there is an instruction cache miss). Such examples are not limited to instructions.
Computing the sum of two operands to obtain a linear address before accessing a translation look-aside buffer adds to the overall latency in providing the physical address to which the linear address is mapped. The present invention addresses this problem.
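The source of the added latency can be seen in a sketch of the conventional sequence, in which the full sum is formed before the TLB is consulted. The 32-bit address width, the 12-bit page offset, and the dict standing in for the TLB are assumptions; the sketch illustrates the problem only and does not depict any particular solution.

```python
ADDRESS_BITS = 32
PAGE_BITS = 12  # assumed 4 KByte pages, so the TLB key is the upper 20 bits


def conventional_lookup(operand_a, operand_b, tlb):
    """Add first, then look up: the TLB key depends on the completed sum."""
    linear = (operand_a + operand_b) & ((1 << ADDRESS_BITS) - 1)
    key = linear >> PAGE_BITS
    return linear, tlb.get(key)  # tlb: hypothetical dict, key -> translation


tlb = {0x1: 0x80}
linear, translation = conventional_lookup(0x0FFF, 0x0001, tlb)

# The key cannot be formed from the operands' upper bits alone, because
# a carry out of the low 12 bits changes it:
naive_key = (0x0FFF >> PAGE_BITS) + (0x0001 >> PAGE_BITS)
```

Here the correct key is 1 while the sum of the operands' upper bits is 0, so the lookup in this sequence must wait for the carry to propagate through the adder.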