The present invention relates generally to a method and apparatus for reading data from a translation lookaside buffer (TLB). More specifically, the present invention allows two consecutive TLB entries to be accessed in parallel.
Many modern computing systems operate on large uniform virtual address spaces that greatly exceed the amount of physical memory actually present in any given machine configuration. For example, 32-bit byte-addressed CPUs generally have a uniform virtual address space of 2^32 bytes, or 4 gigabytes, per process. However, the amount of physical memory supported by such machines typically ranges anywhere from 1 to 1024 megabytes shared by all processes. Consequently, each memory access requires that the virtual address supplied by the CPU be translated (mapped) into a physical (or real) address that references an actual location in memory. Because the translation process is relatively lengthy (it consumes many CPU cycles), and because the same address will often be used many times, it is common practice to store mapped pairs of virtual and real addresses in a special cache memory called a translation memory or a translation lookaside buffer (TLB). Virtual addresses supplied by the CPU are checked against the TLB to see if a virtual/real address translation is already stored in the TLB for a given virtual address. If so, then the translation information is obtained directly from the TLB, and the usual translation process is avoided.
FIG. 1 shows a known apparatus 10 for translating 32-bit virtual addresses to 32-bit physical byte addresses. Apparatus 10 includes a register file 14 which, in the apparatus shown, is a 32-bit wide register file. One or more registers 16 within register file 14 (referred to in a memory reference instruction) may store a base address value used in a virtual address calculation. The base address is communicated to an adder 18 over a communication path 22. Adder 18 also receives a displacement address (from the memory reference instruction) over a communication path 26. Adder 18 adds the displacement address to the base address to produce the 32-bit virtual address on a communication path 30.
The 32-bit virtual address output on communication path 30 conceptually may be split into three parts (box 32). First, assume the computing system's physical memory is divided into fixed length pages of 2^12 bytes, or 4 kilobytes (KB), each. Thus, for a 4 KB page, 12 bits are needed to address a specific byte in a page. The least significant bits of the virtual address (bits [11:0] in this example) appearing on a communication path 31 may constitute the page displacement portion of the address. These bits need no translation, as they are the same in both the virtual and the real address (VA=RA). Consequently, they may be ignored during the translation process. The middle bits of the virtual address appearing on a communication path 38 (termed the virtual page address) are used to select an entry 40 in TLB 34. The number of bits comprising the virtual page address is a function of the number of entries in TLB 34. For a 64 entry TLB, 6 bits are needed to select one of the entries. Thus, in this case, bits [17:12] are used to address TLB 34. Of course, if TLB 34 contained 128 entries, then TLB 34 would be addressed by 7 bits, and so on. The remaining high order bits (bits [31:18], termed the virtual segment address) are used in the address translation process in the manner discussed below.
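The three-way split described above can be illustrated with a short sketch. It assumes the 4 KB page and 64 entry TLB of this example; the function and constant names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names are hypothetical, sizes follow the example
# in the text (4 KB pages, 64 entry TLB, 32-bit virtual addresses).
PAGE_OFFSET_BITS = 12  # bits [11:0], page displacement (VA=RA)
TLB_INDEX_BITS = 6     # bits [17:12], virtual page address


def split_virtual_address(va):
    """Return (segment_tag, tlb_index, page_offset) for a 32-bit address."""
    va &= 0xFFFFFFFF
    page_offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    tlb_index = (va >> PAGE_OFFSET_BITS) & ((1 << TLB_INDEX_BITS) - 1)
    segment_tag = va >> (PAGE_OFFSET_BITS + TLB_INDEX_BITS)  # bits [31:18]
    return segment_tag, tlb_index, page_offset
```

For a 128 entry TLB, TLB_INDEX_BITS would be 7 and the segment tag would shrink to bits [31:19].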
Each TLB entry 40 includes a virtual address tag field 42, a real address field 46, and a control field 50. The virtual address tag field 42 typically comprises bits [31:18] of the virtual address corresponding to real address bits [31:12] stored in real address field 46. Control field 50 typically includes access control bits, valid bits, used bits, etc. When TLB 34 is addressed by bits [17:12] of the virtual address on communication path 38, the addressed virtual address tag is communicated to a comparator 54 over a communication path 58. At the same time, bits [31:18] of the virtual address are communicated to comparator 54 over a communication path 62. If the bits match, then a TLB hit signal is provided on a communication path 66. On the other hand, if the bits do not match, then comparator 54 generates a miss signal on communication path 66. If a hit signal is generated on communication path 66, then the addressed entry in TLB 34 contains the address translation information for the requested virtual address, and the real address bits [31:12] in real address field 46 are output on a communication path 70 and concatenated with the VA=RA low order bits of the virtual address (i.e., bits [11:0]) by a real address circuit 71 to form the 32-bit real address RA [31:0] on a communication path 78. The real address then may be used to access the memory.
If a miss signal is generated by comparator 54 on communication path 66, then the virtual address is communicated to a dynamic translation unit (DTU) 82 over a communication path 86 to begin the much slower process of translating the virtual address by accessing page tables stored in main memory. When this "dynamic" translation is completed, TLB 34 will be updated with the newly translated virtual/physical address pair (displacing one of the current entries, if necessary), for a quick reference via TLB lookup should it be used again.
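The hit and miss behavior of apparatus 10 can be modeled in software roughly as follows. This is a minimal sketch, not a description of the hardware: the TLB is assumed to be a 64 element list of (tag, real page) pairs, control field 50 is omitted, and the translate_slow callback is a hypothetical stand-in for DTU 82.

```python
# Illustrative model of the FIG. 1 lookup. All names are hypothetical;
# translate_slow stands in for the dynamic translation unit (DTU).
def tlb_lookup(tlb, va, translate_slow):
    """Return (real_address, hit) for a 32-bit virtual address."""
    va &= 0xFFFFFFFF
    tag = va >> 18              # bits [31:18], compared against the stored tag
    index = (va >> 12) & 0x3F   # bits [17:12], selects one of 64 entries
    offset = va & 0xFFF         # bits [11:0], VA=RA, never translated

    entry = tlb[index]
    hit = entry is not None and entry[0] == tag
    if hit:
        real_page = entry[1]
    else:
        # Miss: perform the slow translation, then update the TLB,
        # displacing whatever entry occupied this slot.
        real_page = translate_slow(va >> 12)
        tlb[index] = (tag, real_page)

    # Concatenate real address bits [31:12] with the untranslated low bits.
    return (real_page << 12) | offset, hit
```

A first access to a page misses and fills the entry; a second access to the same page then hits without invoking the slow path.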
Although a TLB lookup (unlike the many-cycle translation process itself) provides a relatively quick way to obtain a particular virtual-to-physical address mapping, as processor clock speeds increase past 100 MHz, the time needed to access the TLB itself becomes part of the critical path in the machine's operation. Since the TLB is on the critical path for all memory accesses (supplying both source addresses for data or instructions to be loaded from, and destination addresses for data to be stored at), the rate at which the TLB runs ultimately affects the rate at which the entire machine can run.
From inspection of FIG. 1, it should be apparent that a major slowdown in accessing TLB 34 is the 32-bit add that must be performed by adder 18 on the register plus displacement values contained in the memory reference instruction to generate the full 32-bit virtual address. Even using advanced CMOS circuitry, performing a 32-bit add takes considerable time, and access to TLB 34 cannot even begin until the addition is completed. Thus, improvements in the mechanisms for obtaining data from TLB 34 are highly desirable.
One solution to this problem is described in commonly assigned, copending U.S. patent application Ser. No. 08/148,219, now U.S. Pat. No. 5,502,829, filed on Nov. 3, 1993, for APPARATUS FOR OBTAINING DATA FROM A TRANSLATION MEMORY, the entire specification and claims of which are incorporated herein by reference. FIG. 2 is a block diagram of an apparatus 100 according to that application for obtaining data from a translation memory. Some of the components used in apparatus 10 of FIG. 1 are also used in apparatus 100, and their numbering remains the same.
Assume apparatus 100 operates in a computing system which organizes data in 4 KB pages and that TLB 34 contains 64 entries, much like apparatus 10 of FIG. 1. In apparatus 100, the displacement address is limited to be no larger than the VA=RA page displacement portion of the virtual address (however many bits that may be). Thus, for 4 KB pages, the displacement address is no larger than 12 bits. An adder 110 adds the displacement address received over communication path 26 to the base address received over communication path 22 and provides the 32-bit virtual address on a communication path 30, much like adder 18 of FIG. 1. In addition to the calculated virtual address, adder 110 generates a carry signal on a communication path 114 for indicating whether the addition of the displacement address to the base address resulted in a carry. Unlike apparatus 10 shown in FIG. 1, bits [17:12] of the calculated virtual address are not used to access TLB 34. Instead, bits [17:12] of the base address (termed the base page address) are communicated to TLB 34 over a communication path 118 for directly addressing one of the translation entries 40A therein. Bits [17:12] of the base address are also communicated to an adder 122 which increments the address value by 1 and uses the resulting value to address a second entry 40B within TLB 34. That is, the entry in TLB 34 addressed by bits [17:12] of the base address is accessed along with the next succeeding entry in TLB 34, the access to which is delayed only by the single increment add in adder 122. The virtual address tag 42A and real address tag 46A addressed by the value on communication path 118 together with the virtual address tag 42B and real address tag 46B addressed by the output of adder 122 are communicated to a multiplexer 130 over respective communication paths 131, 132, 133, and 134.
Since the displacement address is no larger than the lower VA=RA page displacement portion of the virtual address, adding the displacement address to the base address at most results in a carry into the bit [12] position. Consequently, the effect of the addition will be at most to increase the value of bits [17:12] of the base address by one. Thus, accessing TLB 34 with bits [17:12] of the base address and accessing the next succeeding entry in TLB 34 ensures that one of the entries output by TLB 34 corresponds to the entry that would have been requested had TLB 34 been accessed with bits [17:12] of the calculated virtual address. The carry indicating signal on communication path 114 thus may be communicated to multiplexer 130 and used to select the proper translation entry, and the virtual address tag portion of the selected entry is communicated to comparator 54. As in apparatus 10 of FIG. 1, if the selected virtual address tag matches bits [31:18] of the calculated virtual address, then comparator 54 generates a hit signal on communication path 66, and the value in the real address field of the selected entry is concatenated with the VA=RA low order bits of the calculated virtual address by real address circuit 71 to form the 32-bit real address on communication path 78. If the selected virtual address tag does not match, then comparator 54 generates a miss signal, and bits [31:12] of the calculated virtual address are communicated to DTU 82 for translation to a real address.
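The carry-select scheme of apparatus 100 can be sketched in software as shown here. The sketch is only a sequential model of what the hardware does in parallel. It assumes a 64 element list of (tag, real page) pairs as the TLB and assumes the incremented index wraps modulo 64; all names are hypothetical.

```python
# Illustrative model of the FIG. 2 scheme. Names are hypothetical, and the
# modulo-64 wrap of the incremented index is an assumption of this sketch.
def dual_entry_lookup(tlb, base, displacement):
    """Return (real_address, hit); real_address is None on a miss."""
    if not 0 <= displacement < (1 << 12):
        raise ValueError("displacement must fit in the 12-bit page offset")

    # Full 32-bit add (adder 110), plus the carry from bit 11 into bit 12.
    va = (base + displacement) & 0xFFFFFFFF
    carry = ((base & 0xFFF) + displacement) >> 12  # 0 or 1

    # Both candidate entries are indexed from the base address alone,
    # so the accesses need not wait for the full add to finish.
    index_a = (base >> 12) & 0x3F     # bits [17:12] of the base address
    index_b = (index_a + 1) & 0x3F    # next succeeding entry (adder 122)
    entry_a = tlb[index_a]
    entry_b = tlb[index_b]

    # Multiplexer 130: the carry selects which entry is the proper one.
    selected = entry_b if carry else entry_a

    # Comparator 54: tag check against bits [31:18] of the calculated address.
    if selected is not None and selected[0] == (va >> 18):
        return (selected[1] << 12) | (va & 0xFFF), True
    return None, False
```

With a base address of 0x12345F00, a displacement of 0x0FF stays within the base page and selects the first entry, while a displacement of 0x100 crosses into the next page and selects the second.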
Since the additional steps required by apparatus 100, i.e., selecting two entries in the TLB and selecting the correct one after the carry out of bit [11] is decided, are overlapped with the 32-bit addition of the base plus displacement values, they effectively take no additional time. Since selection of an entry in TLB 34 is completed by the time the add is completed (rather than merely beginning TLB access at that time), the overall result is a significant reduction in the total amount of time occupied by a TLB lookup.
Unfortunately, as processor clock speeds continue to increase, even the simple addition performed by adder 122 can become a bottleneck. Finding the (n+1)th TLB entry by adding an offset of 1 to the nth address, while straightforward and feasible, may present a problem in high speed systems where the addition must propagate through a large number of bits. Therefore, a mechanism is desirable which facilitates access to consecutive TLB entries as nearly simultaneously as possible.