The present invention relates to computing systems and, more particularly, to a method and apparatus for translating virtual addresses in a computing system having multiple instruction pipelines.
FIG. 1 is a block diagram of a typical computing system 10 which employs virtual addressing of data. Computing system 10 includes an instruction issuing unit 14 which communicates instructions to a plurality of (e.g., eight) instruction pipelines 18A-H over a communication path 22. The data referred to by the instructions in a program are stored in a mass storage device 30 which may be, for example, a disk or tape drive. Since mass storage devices operate very slowly (e.g., a million or more clock cycles per access) compared to instruction issuing unit 14 and instruction pipelines 18A-H, data currently being worked on by the program is stored in a main memory 34 which may be a random access memory (RAM) capable of providing data to the program at a much faster rate (e.g., 30 or so clock cycles). Data stored in main memory 34 is transferred to and from mass storage device 30 over a communication path 42. The communication of data between main memory 34 and mass storage device 30 is controlled by a data transfer unit 46 which communicates with main memory 34 over a communication path 50 and with mass storage device 30 over a communication path 54.
Although main memory 34 operates much faster than mass storage device 30, it still does not operate as quickly as instruction issuing unit 14 or instruction pipelines 18A-H. Consequently, computing system 10 includes a high speed cache memory 60 for storing a subset of data from main memory 34, and a very high speed register file 64 for storing a subset of data from cache memory 60. Cache memory 60 communicates with main memory 34 over a communication path 68 and with register file 64 over a communication path 72. Register file 64 communicates with instruction pipelines 18A-H over a communication path 76. Register file 64 operates at approximately the same speed as instruction issuing unit 14 and instruction pipelines 18A-H (e.g., a fraction of a clock cycle), whereas cache memory 60 operates at a speed somewhere between register file 64 and main memory 34 (e.g., approximately two or three clock cycles).
FIGS. 2A-B are block diagrams illustrating the concept of virtual addressing. Assume computing system 10 has 32 bits available to address data. The addressable memory space is then 2^32 bytes, or four gigabytes (4 GB), as shown in FIG. 2A. However, the physical (real) memory available in main memory 34 typically is much less than that, e.g., 1-256 megabytes. Assuming a 16 megabyte (16 MB) real memory, as shown in FIG. 2B, only 24 address bits are needed to address the memory. Thus, multiple virtual addresses inevitably will be translated to the same real address used to address main memory 34. The same is true for cache memory 60, which typically stores only 1-36 kilobytes of data. Register file 64 typically comprises, e.g., 32 32-bit registers, and it stores data from cache memory 60 as needed. The registers are addressed by instruction pipelines 18A-H using a different addressing scheme.
To accommodate the difference between virtual addresses and real addresses and the mapping between them, the physical memory available in computing system 10 is divided into a set of uniform-size blocks, called pages. If a page contains 2^12 bytes, or 4 kilobytes (4 KB), then the full 32-bit address space contains 2^20, or 1 million (1M), pages (4 KB × 1M = 4 GB). Of course, if main memory 34 has 16 megabytes of memory, only 2^12, or 4K, of the 1 million potential pages actually could be in memory at the same time (4K × 4 KB = 16 MB).
Computing system 10 keeps track of which pages of data from the 4 GB address space currently reside in main memory 34 (and exactly where each page of data is physically located in main memory 34) by means of a set of page tables 100 (FIG. 3) typically stored in main memory 34. Assume computing system 10 specifies 4 KB pages and each page table 100 contains 1K entries for providing the location of 1K separate pages. Thus, each page table maps 4 MB of memory (1K × 4 KB = 4 MB), and 4 page tables suffice for a machine with 16 megabytes of physical main memory (16 MB / 4 MB = 4).
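The page arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the constant and variable names are hypothetical, and the sizes are the example values given in the text (32-bit virtual addresses, 4 KB pages, 1K entries per page table, 16 MB of main memory).

```python
# Example sizes taken from the text; the names themselves are illustrative.
VA_BITS = 32            # width of a virtual address
PAGE_BITS = 12          # 4 KB pages
PT_ENTRIES = 1 << 10    # 1K entries per page table
PHYS_BYTES = 16 << 20   # 16 MB of physical main memory

page_size = 1 << PAGE_BITS                          # bytes per page
virtual_pages = 1 << (VA_BITS - PAGE_BITS)          # pages in the 4 GB space
resident_pages = PHYS_BYTES // page_size            # pages that fit in main memory
bytes_per_page_table = PT_ENTRIES * page_size       # memory mapped by one table
tables_needed = PHYS_BYTES // bytes_per_page_table  # tables covering 16 MB

print(page_size, virtual_pages, resident_pages, bytes_per_page_table, tables_needed)
```

The computed values agree with the figures stated in the text: 1M virtual pages, 4K resident pages, 4 MB per page table, and 4 page tables for 16 MB.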
The set of potential page tables is tracked by a page directory 104 which may contain, for example, 1K entries (not all of which need to be used). The starting location of this directory (its origin) is stored in a page directory origin (PDO) register 108.
To locate a page in main memory 34, the input virtual address is conceptually split into a 12-bit displacement address (VA<11:0>), a 10-bit page table address (VA<21:12>) for accessing page table 100, and a 10-bit directory address (VA<31:22>) for accessing page directory 104. The address stored in PDO register 108 is added to the directory address VA<31:22> of the input virtual address in a page directory entry address accumulator 112. The address in page directory entry address accumulator 112 is used to address page directory 104 to obtain the starting address of page table 100. The starting address of page table 100 is then added to the page table address VA<21:12> of the input virtual address in a page table entry address accumulator 116, and the resulting address is used to address page table 100. An address field in the addressed page table entry gives the starting location of the page in main memory 34 corresponding to the input virtual address, and a page fault field PF indicates whether the page is actually present in main memory 34. The location of data within each page is typically specified by the 12 lower-order displacement bits of the virtual address.
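The two-level lookup just described can be sketched in Python as follows. This is a simplified model under stated assumptions, not the circuitry of FIG. 3: main memory is represented by a dictionary, each directory or table entry is assumed to occupy one addressable unit (so the index is added to the base without scaling, as in the text), a page table entry is modeled as a (page start, present flag) pair, and an unused directory entry is treated as a fault. The names `split_va`, `translate`, and `PageFault` are hypothetical.

```python
class PageFault(Exception):
    """Raised when the page is not present in main memory (assumed model)."""


def split_va(va):
    """Split a 32-bit virtual address into its three fields."""
    disp = va & 0xFFF                # 12-bit displacement within the page
    pt_index = (va >> 12) & 0x3FF    # 10-bit page table address
    dir_index = (va >> 22) & 0x3FF   # 10-bit directory address
    return dir_index, pt_index, disp


def translate(va, memory, pdo):
    """Walk the page directory and page table held in `memory` (a dict)."""
    dir_index, pt_index, disp = split_va(va)

    # Accumulator 112: directory origin plus directory address.
    pde_addr = pdo + dir_index
    pt_base = memory.get(pde_addr)
    if pt_base is None:              # assumption: unused directory entry faults
        raise PageFault(hex(va))

    # Accumulator 116: page table start plus page table address.
    pte_addr = pt_base + pt_index
    entry = memory.get(pte_addr)
    if entry is None:                # assumption: unused table entry faults
        raise PageFault(hex(va))
    page_start, present = entry      # `present` stands in for the PF field
    if not present:
        raise PageFault(hex(va))

    # The displacement selects the data within the page.
    return page_start + disp
```

For instance, with the directory at 0x1000, directory entry 1 pointing to a page table at 0x2000, and table entry 2 mapping a present page that starts at 0xABC000, the virtual address 0x00402345 translates to 0xABC345.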
When an instruction uses data that is not currently stored in main memory 34, a page fault occurs, and the faulting instruction abnormally terminates. Thereafter, data transfer unit 46 must find an unused 4 KB portion of memory in main memory 34, transfer the requested page from mass storage device 30 into main memory 34, and make the appropriate update to the page table (indicating both the presence and location of the page in memory). The program then may be restarted.
FIG. 4 is a block diagram showing how virtual addresses are translated in the computing system shown in FIG. 1. Components which remain the same as FIGS. 1 and 3 retain their original numbering. The translation circuitry includes an address register 154 for receiving an input virtual address which references data used by an instruction issued to one of instruction pipelines 18A-H, a translation memory (e.g., a translation lookaside buffer (TLB)) 158 and comparator 170 for initially determining whether data requested by the input virtual address resides in main memory 34, and a dynamic translation unit (DTU) 162 for accessing page tables in main memory 34. Bits VA[18:12] of the input virtual address are communicated to TLB 158 over a communication path 166, bits VA[31:12] of the input virtual address are communicated to DTU 162 over a communication path 174, and bits VA[31:19] are communicated to comparator 170 over a communication path 176.
TLB 158 includes a plurality of addressable storage locations 178 that are addressed by bits VA[18:12] of the input virtual address. Each storage location stores a virtual address tag (VAT) 180, a real address (RA) 182 corresponding to the virtual address tag, and control information (CNTRL) 184. How much control information is included depends on the particular design and may include, for example, access protection flags, dirty flags, referenced flags, etc.
The addressed virtual address tag is communicated to comparator 170 over a communication path 186, and the addressed real address is output on a communication path 188. Comparator 170 compares the virtual address tag with bits VA[31:19] of the input virtual address. If they match (a TLB hit), then the real address output on communication path 188 is compared with a real address tag (not shown) of a selected line in cache memory 60 to determine if the requested data is in the cache memory (a cache hit). An example of this procedure is discussed in U.S. Pat. No. 4,933,835 issued to Howard G. Sachs, et al. and incorporated herein by reference. If there is a cache hit, then the pipelines may continue to run at their highest sustainable speed. If the requested data is not in cache memory 60, then the real address bits on communication path 188 are combined with bits VA[11:0] of the input virtual address and used to obtain the requested data from main memory 34.
If the virtual address tag does not match bits VA[31:19] of the input virtual address, then comparator 170 provides a miss signal on a communication path 190 to DTU 162. The miss signal indicates that the requested data is not currently stored in main memory 34, or else the data is in fact present in main memory 34 but the corresponding entry in TLB 158 has been deleted.
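The hit and miss test performed by TLB 158 and comparator 170 can be sketched in Python as follows. This is a minimal model under stated assumptions: the TLB is taken to have 128 storage locations because it is addressed by the seven bits VA[18:12], the stored real address is assumed to be a page frame number that is concatenated with VA[11:0], and the names `TlbEntry`, `tlb_lookup`, and `tlb_fill` are hypothetical. The cache comparison that follows a TLB hit is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

TLB_ENTRIES = 128  # assumed: 2**7 locations, addressed by VA[18:12]


@dataclass
class TlbEntry:
    vat: int        # virtual address tag, VA[31:19]
    ra: int         # real address, modeled here as a page frame number
    cntrl: int = 0  # control information (protection, dirty, referenced, ...)


def tlb_index(va: int) -> int:
    return (va >> 12) & 0x7F      # VA[18:12] selects the storage location


def tlb_tag(va: int) -> int:
    return (va >> 19) & 0x1FFF    # VA[31:19] is compared with the stored tag


def tlb_lookup(tlb: List[Optional[TlbEntry]], va: int) -> Optional[int]:
    """Return the real address on a hit, or None to signal a miss."""
    entry = tlb[tlb_index(va)]
    if entry is None or entry.vat != tlb_tag(va):
        return None                          # miss: the DTU would be invoked
    return (entry.ra << 12) | (va & 0xFFF)   # combine RA with VA[11:0]


def tlb_fill(tlb: List[Optional[TlbEntry]], va: int, ra: int, cntrl: int = 0) -> None:
    """Store new translation information, replacing whatever was there."""
    tlb[tlb_index(va)] = TlbEntry(tlb_tag(va), ra, cntrl)
```

In this model, two virtual addresses that share bits VA[18:12] but differ in VA[31:19] select the same storage location, so filling the entry for one of them causes a later lookup of the other to miss.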
When the miss signal is generated, DTU 162 accesses the page tables in main memory 34 to determine whether in fact the requested data is currently stored in main memory 34. If not, then DTU 162 instructs data transfer unit 46 through a communication path 194 to fetch the page containing the requested data from mass storage device 30. In any event, TLB 158 is updated through a communication path 196, and instruction issuing resumes.
TLB 158 has multiple ports to accommodate the addresses from the pipelines needing address translation services. For example, if two load instruction pipelines and one store instruction pipeline are used in computing system 10, then TLB 158 has three ports, and the single memory array in TLB 158 is used to service all address translation requests.
As noted above, new virtual-to-real address translation information is stored in TLB 158 whenever a miss signal is generated by comparator 170. The new translation information typically replaces the oldest and least used entry presently stored in TLB 158. While this mode of operation is ordinarily desirable, it may have disadvantages when a single memory array is used to service address translation requests from multiple pipelines. For example, if each pipeline refers to different areas of memory each time an address is to be translated, then the translation information stored in TLB 158 for one pipeline may not get very old before it is replaced by the translation information obtained by DTU 162 for the same or another pipeline at a later time. This increases the chance that DTU 162 will have to be activated more often, which degrades performance. The effect is particularly severe and counterproductive when a first pipeline repeatedly refers to the same general area of memory, but the translation information is replaced by the other pipelines between accesses by the first pipeline.
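The interference effect described above can be illustrated with a small simulation. This is a deliberately simplified sketch, not a model of TLB 158 itself: it assumes a tiny translation store with a least-recently-used replacement policy shared by two pipelines, where pipeline "A" repeatedly refers to the same page and pipeline "B" refers to a new page on every access. The class `LruTlb`, the function `misses_for_a`, and the page numbers are all hypothetical.

```python
from collections import OrderedDict


class LruTlb:
    """A small shared translation store with LRU replacement (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # virtual page number -> placeholder
        self.misses = {}               # per-pipeline miss counts

    def access(self, pipeline, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)      # mark as most recently used
            return True
        self.misses[pipeline] = self.misses.get(pipeline, 0) + 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently used
        self.entries[vpn] = True
        return False


def misses_for_a(rounds, streaming_per_round, capacity=4):
    """Count pipeline A's misses when B touches new pages between A's accesses."""
    tlb = LruTlb(capacity)
    next_vpn = 1000                    # arbitrary starting page for pipeline B
    for _ in range(rounds):
        tlb.access("A", 7)             # A always refers to the same page
        for _ in range(streaming_per_round):
            tlb.access("B", next_vpn)  # B refers to a different page each time
            next_vpn += 1
    return tlb.misses.get("A", 0)
```

In this model, pipeline A misses only on its first access as long as pipeline B touches fewer new pages between A's accesses than the store can hold. Once B touches as many new pages as the store holds, A's entry is replaced before every one of A's accesses, so A misses every time.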