This invention relates to computing systems, and, more particularly, to an apparatus for translating virtual addresses to physical addresses.
Many modern computing systems operate on large uniform virtual address spaces that greatly exceed the amount of physical memory actually present in any given machine configuration. For example, 32-bit byte-addressed CPUs generally have a uniform virtual address space of 2^32 bytes, or 4 gigabytes. On the other hand, the amount of physical memory supported by such machines typically ranges anywhere from 1 to 256 megabytes. Consequently, each memory access requires that the virtual address supplied by the CPU be translated (mapped) into a physical (or real) address that references an actual location in memory. Since the translation process is relatively lengthy (it consumes many CPU cycles), and since the same address will often be used many times, it is common practice to store mapped pairs of virtual and real addresses in a special cache memory called a translation lookaside buffer (TLB). Virtual addresses supplied by the CPU are checked against the TLB to see if a virtual/real address translation is already stored in the TLB for a given virtual address. If so, then the translation information is obtained directly from the TLB, and the usual translation process is avoided.
FIG. 1 shows a known apparatus 10 for translating 32-bit virtual addresses to 32-bit physical byte addresses. Apparatus 10 includes a register file 14 which, in this embodiment, is a 32-bit wide register file. One or more registers 16 within register file 14 (referred to in a memory reference instruction) may store a base address value used in a virtual address calculation. The base address is communicated to an adder 18 over a communication path 22. Adder 18 also receives a displacement address (from the memory reference instruction) over a communication path 26. Adder 18 adds the displacement address to the base address to produce the 32-bit virtual address on a communication path 30.
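The address calculation performed by adder 18 can be illustrated with a minimal sketch. The function name `virtual_address` is hypothetical, and two behaviors are assumptions not stated in the description: the displacement is treated as an unsigned value, and the sum wraps modulo 2^32.

```python
MASK32 = 0xFFFFFFFF  # 32-bit datapath width, as in FIG. 1


def virtual_address(base: int, displacement: int) -> int:
    """Model adder 18: base register value plus displacement.

    Assumption: the result is truncated to 32 bits (wraps modulo 2^32).
    """
    return (base + displacement) & MASK32


# A carry out of the low 12 bits changes bit 12 and above, so the upper
# bits of the virtual address are not known until the add has finished.
va = virtual_address(0x00012FF0, 0x00000020)
print(hex(va))  # 0x13010
```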
The 32-bit virtual address output on communication path 30 conceptually may be split into three parts. First, assume the computing system's physical memory is divided into fixed-length pages of 2^12 bytes, or 4 kilobytes (KB), each. Thus, for a 4 KB page, 12 bits are needed to address a specific byte in a page. The least significant bits of the virtual address (bits [11:0] in this example) appearing on a communication path 31 may constitute the page displacement portion of the address. These bits need no translation, as they are the same in both the virtual and the real address (VA=RA). Consequently, they may be ignored during the translation process. The middle bits of the virtual address appearing on a communication path 38 (termed the virtual page address) are used to select an entry 40 in TLB 34. The number of bits comprising the virtual page address is a function of the number of entries in TLB 34. For a 64 entry TLB, 6 bits are needed to select one of the entries. Thus, in this case, bits [17:12] are used to address TLB 34. Of course, if TLB 34 contained 128 entries, then TLB 34 would be addressed by 7 bits, and so on. The remaining high order bits (bits [31:18], termed the virtual segment address) are used in the address translation process in the manner discussed below.
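The three-way split described above may be sketched as follows. The function and constant names are hypothetical; the field widths follow the 4 KB page and 64-entry (or 128-entry) TLB example in the text.

```python
PAGE_BITS = 12   # 4 KB pages: page displacement is bits [11:0]
INDEX_BITS = 6   # 64-entry TLB: virtual page address is bits [17:12]


def split_virtual_address(va: int, index_bits: int = INDEX_BITS):
    """Return (virtual segment, virtual page, page displacement) fields."""
    va &= 0xFFFFFFFF
    page_displacement = va & ((1 << PAGE_BITS) - 1)
    virtual_page = (va >> PAGE_BITS) & ((1 << index_bits) - 1)
    virtual_segment = va >> (PAGE_BITS + index_bits)
    return virtual_segment, virtual_page, page_displacement


seg, page, disp = split_virtual_address(0x12345678)
print(hex(seg), hex(page), hex(disp))  # 0x48d 0x5 0x678

# With a 128-entry TLB the index widens to 7 bits and the segment shrinks.
seg, page, disp = split_virtual_address(0x12345678, index_bits=7)
print(hex(seg), hex(page), hex(disp))  # 0x246 0x45 0x678
```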
Each TLB entry 40 includes a virtual address tag field 42, a real address field 46, and a control field 50. The virtual address tag field 42 typically comprises bits [31:18] of the virtual address corresponding to real address bits [31:12] stored in real address field 46. Control field 50 typically includes access control bits, valid bits, used bits, etc. When TLB 34 is addressed by bits [17:12] of the virtual address on communication path 38, the addressed virtual address tag is communicated to a comparator 54 over a communication path 58. At the same time, bits [31:18] of the virtual address are communicated to comparator 54 over a communication path 62. If the bits match, then a TLB hit signal is provided on a communication path 66. On the other hand, if the bits do not match, then comparator 54 generates a miss signal on communication path 66. If a hit signal is generated on communication path 66, then the addressed entry in TLB 34 contains the address translation information for the requested virtual address, and the real address bits [31:12] in real address field 46 are output on a communication path 70 and concatenated with the VA=RA low order bits of the virtual address (i.e., bits [11:0]) by a real address circuit 71 to form the 32-bit real address RA[31:0] on a communication path 78. The real address then may be used to access the memory.
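The lookup and concatenation just described may be modeled in software as a sketch. The names `TlbEntry` and `tlb_lookup` are hypothetical, the control field is reduced to a single valid bit, and qualifying the tag comparison with that valid bit is an assumption; the description states only that control field 50 typically includes valid bits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TlbEntry:
    tag: int = 0         # virtual address tag field 42: VA[31:18]
    real_page: int = 0   # real address field 46: RA[31:12]
    valid: bool = False  # simplified stand-in for control field 50


def tlb_lookup(tlb: List[TlbEntry], va: int) -> Optional[int]:
    """Return the 32-bit real address on a hit, or None on a miss."""
    index = (va >> 12) & 0x3F   # VA[17:12] selects one of 64 entries
    tag = (va >> 18) & 0x3FFF   # VA[31:18] goes to the comparator
    entry = tlb[index]
    if entry.valid and entry.tag == tag:
        # Concatenate RA[31:12] with the untranslated bits VA[11:0].
        return (entry.real_page << 12) | (va & 0xFFF)
    return None


tlb = [TlbEntry() for _ in range(64)]
tlb[0x05] = TlbEntry(tag=0x48D, real_page=0x00ABC, valid=True)

print(hex(tlb_lookup(tlb, 0x12345678)))  # 0xabc678 (hit)
print(tlb_lookup(tlb, 0x12346678))       # None (entry 6 is not valid)
```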
If a miss signal is generated by comparator 54 on communication path 66, then the virtual address is communicated to a dynamic translation unit (DTU) 82 over a communication path 86 to begin the much slower process of translating the virtual address by accessing page tables stored in main memory. When this "dynamic" translation is completed, TLB 34 will be updated with the newly translated virtual/physical address pair (displacing one of the current entries, if necessary), for a quick reference via TLB lookup should it be used again.
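The miss path may be sketched in the same spirit. This example is self-contained and deliberately simplified: `translate` is a hypothetical name, the page tables walked by DTU 82 are stood in for by a Python dictionary keyed on VA[31:12], each TLB entry is a `(tag, real_page)` tuple or `None`, and page faults are not modeled.

```python
def translate(tlb, page_table, va):
    """Return (real address, hit flag), refilling the TLB on a miss."""
    index = (va >> 12) & 0x3F   # VA[17:12]
    tag = (va >> 18) & 0x3FFF   # VA[31:18]
    entry = tlb[index]
    if entry is not None and entry[0] == tag:
        real_page, hit = entry[1], True
    else:
        # Slow path: stands in for the dynamic translation by DTU 82.
        real_page = page_table[(va >> 12) & 0xFFFFF]
        # Update the TLB, displacing whatever entry was at this index.
        tlb[index] = (tag, real_page)
        hit = False
    return (real_page << 12) | (va & 0xFFF), hit


tlb = [None] * 64
page_table = {0x12345: 0x00ABC, 0x52345: 0x00DEF}  # hypothetical mappings

print(translate(tlb, page_table, 0x12345678))  # first use: miss, then refill
print(translate(tlb, page_table, 0x12345678))  # second use: hit
print(translate(tlb, page_table, 0x52345000))  # same index, different tag: miss
```

Because both virtual pages in the example select entry 5, the third call displaces the first mapping, and a later access to 0x12345678 would miss again.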
While a TLB lookup (unlike the many-cycle translation process itself) provides a relatively quick way to get a particular virtual-to-physical address mapping, as processor clock speeds increase past 100 MHz the time needed to access the TLB itself becomes part of the critical path in the machine's operation. Since the TLB is on the critical path for all memory accesses (supplying both source addresses from which data or instructions are loaded and destination addresses at which data are stored), the rate at which the TLB runs ultimately affects the rate at which the entire machine can run.
From inspection of FIG. 1, it should be apparent that a major slowdown in accessing TLB 34 is the 32-bit add that must be performed by adder 18 on the register plus displacement values contained in the memory reference instruction to generate the full 32-bit virtual address. Even using advanced CMOS circuitry, performing a 32-bit add takes considerable time, and access to TLB 34 cannot even begin until the addition is completed. Thus, improvements in the mechanisms for obtaining data from TLB 34 are highly desirable.