Many modern computer systems run multiple concurrent tasks or processes, each with its own address space. It would be expensive to dedicate a full complement of memory to each task, especially since many processes use only a small part of their address spaces. Instead, virtual memory is used to give each process the appearance of a full address space. This allows a program to run on what appears to be a large, contiguous, physical-memory address space, dedicated entirely to the program. In reality, however, the available physical memory in a virtual memory system is shared between multiple programs or processes. The memory that appears to be large and contiguous is actually smaller and fragmented between multiple programs. Virtual addresses used in a process are translated by a combination of computer hardware and software to addresses of physical memory. This mechanism is called memory mapping or address translation.
Rather than attempting to maintain a translation or mapping for each possible virtual address, virtual memory systems divide virtual and physical memory into blocks. In many systems, these blocks are fixed in size and referred to as sections or pages. The addresses within an individual page all have identical uppermost bits. Thus, a memory address is the concatenation of a page number, corresponding to the upper bits of the address, and a page offset, corresponding to the lower bits of the address.
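As a minimal sketch, assuming 4 KB pages (a 12-bit page offset), the split of an address into a page number and a page offset, and their concatenation back into an address, can be written in Python as follows. The page size and the function names are illustrative choices, not part of the description above.

```python
PAGE_SHIFT = 12                    # assumed 4 KB pages: 2**12 bytes per page
PAGE_MASK = (1 << PAGE_SHIFT) - 1  # selects the 12-bit page offset


def split_address(addr):
    """Return (page number, page offset) for an address."""
    return addr >> PAGE_SHIFT, addr & PAGE_MASK


def join_address(page_number, offset):
    """Concatenate a page number (upper bits) and an offset (lower bits)."""
    return (page_number << PAGE_SHIFT) | offset


page_number, offset = split_address(0x12345678)
print(hex(page_number), hex(offset))  # prints: 0x12345 0x678
```

Because the page size is a power of two, the split is just a shift and a mask; no division is needed.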
Data structures are typically maintained in physical memory to translate from virtual page numbers to physical page addresses. These data structures often take the form of conversion tables, normally referred to as page tables. A page table is indexed by a virtual page address or number, and generally has a number of entries corresponding to pages in the virtual address space. Each entry is a mapping of a specific page number or virtual page address to a physical page address.
Virtual-to-physical address translation can consume significant overhead, since every data access requires first accessing a page table to obtain a physical address and then accessing the data itself. To reduce address translation time, computers use a specialized hardware cache dedicated to translations. The cache is referred to as an address translation cache or as a translation lookaside buffer (TLB). A TLB is a fast and small static memory for storing the most frequently referenced entries from the page table. It typically has a fixed number of entries. When processing a memory request, a computer first attempts to find an appropriate address translation in the TLB. If such an address translation is not found, a page table is automatically accessed to retrieve the proper translation. The structure of the page table is predefined for use with a particular type of computer or microprocessor.
FIG. 1 shows a prior art example of a virtual memory system using a TLB and a page table. Each virtual address 12 of a process comprises a virtual page number and a page offset. The page number portion of the virtual address is used to index a TLB 14. Assuming that the TLB contains an entry corresponding to the virtual page number (a situation referred to as a TLB "hit"), the TLB produces a physical page address. The page offset portion of virtual address 12 is concatenated with the physical page address from the TLB, resulting in a full physical address for accessing physical memory 16. If the correct entry is not present in TLB 14 (a situation referred to as a TLB "miss"), an initial reference is made to page tables 18 (residing in physical memory 16) to update TLB 14.
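The hit/miss flow of FIG. 1 can be sketched in Python, with plain dictionaries standing in for TLB 14 and page tables 18. This is a simplified model under stated assumptions: 4 KB pages, a TLB with no capacity limit, a single flat page table keyed by virtual page number, and a hypothetical `PageFault` exception for pages that have no mapping. The "physical page address" is represented here as a physical page number.

```python
PAGE_SHIFT = 12                    # assumed 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


class PageFault(Exception):
    """Hypothetical: no translation exists for the virtual page."""


def translate(vaddr, tlb, page_table, stats):
    vpn, offset = vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK
    if vpn in tlb:                       # TLB hit
        stats["hits"] += 1
        ppn = tlb[vpn]
    else:                                # TLB miss: consult the page table
        stats["misses"] += 1
        if vpn not in page_table:
            raise PageFault(hex(vaddr))
        ppn = page_table[vpn]
        tlb[vpn] = ppn                   # update the TLB for later accesses
    # Concatenate the physical page number with the unchanged page offset.
    return (ppn << PAGE_SHIFT) | offset


page_table = {0x12345: 0x42}             # virtual page 0x12345 -> physical page 0x42
tlb = {}
stats = {"hits": 0, "misses": 0}
first = translate(0x12345678, tlb, page_table, stats)   # miss, then TLB update
second = translate(0x12345ABC, tlb, page_table, stats)  # hit
print(hex(first), hex(second), stats)
```

A real TLB has a fixed number of entries and must evict one when it is full; this sketch omits replacement to keep the order of the lookups visible.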
There are two general types of TLBs: fully associative and set associative. If a translation entry can be located anywhere within the translation lookaside buffer, the TLB is said to be fully associative. In order to find the proper translation entry within a fully associative translation lookaside buffer, the computer must examine each and every translation entry. A set associative translation lookaside buffer, on the other hand, uses an indexing function so that any given address translation can be located only in a restricted set of places in the buffer. This reduces the number of translation entries which must be examined by the computer during each memory access.
FIG. 2 illustrates a fully associative TLB 30 having eight lines labeled A through H. Each line contains a single translation entry, comprising an address tag and a corresponding physical page address (PPA). In this example, each tag comprises a page number. Each line also contains other information (not shown), such as reference and dirty bits. A translation entry may be stored in any line of TLB 30. To find the correct translation entry, assuming it is present in the TLB, a computer must compare the tag value of each entry with the specified virtual address or page number. This generally requires a hardware comparator associated with each buffer line.
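A minimal Python model of the fully associative lookup in FIG. 2 follows. The class and method names are hypothetical; each line holds a (tag, PPA) pair, and the loop performs sequentially the tag comparisons that the hardware performs in parallel with one comparator per line.

```python
class FullyAssociativeTLB:
    """Each line holds (tag, ppa) or None; the tag is the full page number."""

    def __init__(self, num_lines=8):          # FIG. 2 has eight lines, A-H
        self.lines = [None] * num_lines

    def lookup(self, page_number):
        # Hardware compares every tag at once, one comparator per line;
        # this loop performs the same comparisons one after another.
        for line in self.lines:
            if line is not None and line[0] == page_number:
                return line[1]
        return None                           # miss

    def store(self, line_number, page_number, ppa):
        # Any translation may occupy any line; the caller picks which one.
        self.lines[line_number] = (page_number, ppa)


tlb = FullyAssociativeTLB()
tlb.store(5, 0x123, 0x7)      # line F in the A-H labeling of FIG. 2
print(tlb.lookup(0x123), tlb.lookup(0x124))
```

Because an entry can sit in any line, the tag must hold the entire page number, and every line must be checked on each lookup.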
FIG. 3 shows a one-way set associative TLB 32, also having eight lines labeled A through H. A one-way set associative TLB is also referred to as a direct-mapped translation lookaside buffer. Each line again contains a single translation entry, comprising an address tag and a corresponding physical page address. However, the lines are uniquely addressable by a 3-bit line number, ranging from zero to seven. In one-way set associative TLB 32, any individual translation entry, corresponding to a specific page number, can be stored only at the single location indexed by its 3-bit line number. To determine the line number corresponding to any specific page number, a buffer index is calculated according to the following equation:

$$\mathrm{INDEX} = \mathrm{PN} \bmod \mathrm{LNS}$$
where INDEX is the buffer index, PN is the specified page number, and LNS is the number of lines in the translation lookaside buffer (here, 8). Because 8 is a power of two (2^3), the equation selects the lower three bits of the page number as the buffer index, which identifies the line of TLB 32 in which the corresponding translation entry must be stored. In performing a translation, the computer references only the TLB line whose line number matches the buffer index. This scheme increases hardware efficiency, since only a single comparison is required.
Each line of one-way set associative TLB 32 includes a tag and a physical page address, similar to the fully associative TLB described above. However, in this case the tag does not need to include the lower three bits of the page number, because those bits form the buffer index and are implied by the position of the line. The computer compares the specified page number (excluding the lower three bits) with the tag stored in the indexed line. If they match, there is a hit and the computer translates the specified virtual address using the stored physical page address. Otherwise, the translation entry is replaced with the proper entry recovered from a page table.
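The direct-mapped scheme of FIG. 3 can be sketched in Python as follows, assuming LNS = 8 as in the figure. The class and method names are hypothetical. The index is the page number modulo LNS, the tag is the remaining upper bits, and a lookup needs only one tag comparison.

```python
LNS = 8                 # number of lines in the TLB (FIG. 3)
INDEX_BITS = 3          # log2(LNS)


class DirectMappedTLB:
    def __init__(self):
        self.lines = [None] * LNS         # each line: (tag, ppa) or None

    @staticmethod
    def split(page_number):
        index = page_number % LNS         # INDEX = PN mod LNS, i.e. the low 3 bits
        tag = page_number >> INDEX_BITS   # upper bits only; the index is implied
        return index, tag

    def lookup(self, page_number):
        index, tag = self.split(page_number)
        line = self.lines[index]
        if line is not None and line[0] == tag:   # a single comparison
            return line[1]
        return None

    def replace(self, page_number, ppa):
        # On a miss, the entry from the page table overwrites the indexed line.
        index, tag = self.split(page_number)
        self.lines[index] = (tag, ppa)


tlb = DirectMappedTLB()
tlb.replace(0x13, 0x5)        # 0x13 -> index 3, tag 2
print(tlb.lookup(0x13))       # hit
print(tlb.lookup(0x1B))       # 0x1B also maps to index 3 (tag 3): miss
tlb.replace(0x1B, 0x9)        # evicts the entry for page 0x13
print(tlb.lookup(0x13))       # now a miss
```

Because LNS is a power of two, `page_number % LNS` equals `page_number & (LNS - 1)`, which is why hardware can take the index directly from the low-order bits. The last three lines show the cost of direct mapping: two pages with the same index cannot both be cached.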
FIG. 4 shows a two-way set associative TLB 34. It is identical to one-way set associative TLB 32 in concept and operation, except that each line has two translation entries. Once the correct line has been determined, using the indexing method described above, the computer must check the tags of two translation entries to determine whether one of them is correct. In general, translation lookaside buffers can be from one-way to n-way associative.
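The same indexing generalizes to an n-way set associative TLB, as in the Python sketch below. The defaults (eight sets, two ways) follow FIG. 4. The text does not specify how an entry is chosen for replacement within a full set, so evicting the oldest entry in the set is an assumption made only for this illustration; the class and method names are likewise hypothetical.

```python
from collections import deque


class SetAssociativeTLB:
    """n-way set associative TLB; ways=1 gives the direct-mapped case."""

    def __init__(self, num_sets=8, ways=2):     # FIG. 4: eight lines, two entries each
        self.num_sets = num_sets
        self.ways = ways
        # Each set holds up to `ways` (tag, ppa) pairs, oldest first.
        self.sets = [deque() for _ in range(num_sets)]

    def _split(self, page_number):
        # Same indexing function as the direct-mapped case.
        return page_number % self.num_sets, page_number // self.num_sets

    def lookup(self, page_number):
        index, tag = self._split(page_number)
        for stored_tag, ppa in self.sets[index]:   # check at most `ways` tags
            if stored_tag == tag:
                return ppa
        return None

    def fill(self, page_number, ppa):
        # Called only after a miss, so the page is not already in the set.
        index, tag = self._split(page_number)
        entries = self.sets[index]
        if len(entries) == self.ways:
            entries.popleft()                      # assumed policy: evict oldest
        entries.append((tag, ppa))


tlb = SetAssociativeTLB()
tlb.fill(0x13, 0x5)     # index 3
tlb.fill(0x1B, 0x9)     # also index 3; both entries fit in the two-way set
print(tlb.lookup(0x13), tlb.lookup(0x1B))
tlb.fill(0x23, 0xA)     # a third page for index 3 evicts the oldest (0x13)
print(tlb.lookup(0x13), tlb.lookup(0x23))
```

Compared with the direct-mapped case, pages 0x13 and 0x1B can now coexist, at the cost of two tag comparisons per lookup.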
The general memory organization and addressing schemes discussed herein are described more fully in J. Hennessy & D. Patterson, Computer Architecture: A Quantitative Approach (1990), which is hereby incorporated by reference. Refer particularly to chapter 8, entitled "Memory-Hierarchy Design."
Different microprocessors implement their TLBs in different ways. The popular Pentium® Pro microprocessor (manufactured by Intel® Corporation), for example, utilizes four TLBs: one for small (4K) data pages, one for large (2M and 4M) data pages, one for small (4K) code pages, and one for large (2M and 4M) code pages. These are each 4-way set associative caches that hold the most recently used address translations. As another example, the ARM 610 microprocessor (manufactured by Advanced RISC Machines Ltd.) has a single, 32-entry, fully associative TLB whose entries are replaced using a FIFO (first-in, first-out) algorithm.
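A fully associative TLB with FIFO replacement, like the 32-entry TLB described for the ARM 610, can be modeled in Python as below. This is a behavioral sketch of FIFO replacement only, not a model of the ARM 610 hardware; the class and method names are hypothetical.

```python
from collections import OrderedDict


class FifoTLB:
    """Fully associative TLB with first-in, first-out replacement."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self.entries = OrderedDict()      # page number -> ppa, oldest first

    def lookup(self, page_number):
        # A hit does not reorder anything: FIFO ignores how recently
        # an entry was used, unlike LRU.
        return self.entries.get(page_number)

    def fill(self, page_number, ppa):
        if page_number in self.entries:
            self.entries[page_number] = ppa   # update in place, keep its age
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the first-in entry
        self.entries[page_number] = ppa


tlb = FifoTLB()
for vpn in range(32):
    tlb.fill(vpn, 0x100 + vpn)
tlb.lookup(0)            # a hit, but page 0 remains the oldest entry
tlb.fill(32, 0x120)      # TLB full: page 0 is evicted despite the recent hit
print(tlb.lookup(0), hex(tlb.lookup(1)), len(tlb.entries))
```

FIFO needs only a single pointer to the next line to replace, which makes it cheap to implement, but as the example shows it can evict an entry that is still in active use.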
Many modern microprocessors, such as the Pentium® Pro and ARM 610 microprocessors, normally resolve virtual memory addresses by referencing multiple page tables rather than a single page table. Specifically, a hierarchy of page tables is maintained for each separate virtual address space, normally corresponding to each process that is executing on the microprocessor. FIG. 5 illustrates such a hierarchy, generally designated by reference numeral 40. Hierarchy 40 includes a page table directory 41 and a plurality of page tables 42.
Page table directory 41 has a plurality of entries that are indexed by uppermost bits 43 of page number 44. Thus, each directory entry corresponds to a range of page numbers. Each directory entry has a "valid" bit 45 that indicates whether the entry has been properly initialized, and a field 46 that references one of page tables 42. Each valid directory entry references a different one of tables 42, and each page table therefore corresponds to a range of page numbers.
In response to a TLB miss, a microprocessor refers to page table directory 41 in order to determine which page table has the desired address translation. The page table is indexed using lowermost bits 47 of page number 44. Each entry in page table 42 contains a "valid/invalid" bit 48, indicating whether the entry has been properly initialized, and a translation entry 49 containing a single address translation.
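The two-level lookup of FIG. 5 can be sketched in Python as follows. The widths are assumptions for illustration only: a 20-bit page number whose uppermost 10 bits (bits 43) index the directory and whose lowermost 10 bits (bits 47) index a page table. Each entry is modeled as a [valid bit, payload] pair, and `Miss` is a hypothetical exception raised when an unmarked valid bit is found.

```python
TABLE_BITS = 10                          # assumed width of lowermost bits 47
TABLE_MASK = (1 << TABLE_BITS) - 1
DIR_ENTRIES = 1 << 10                    # assumed width of uppermost bits 43


class Miss(Exception):
    """An entry whose valid bit is unmarked was encountered."""


def new_entries(n):
    # Every entry is [valid bit, payload]; all valid bits start unmarked.
    return [[False, None] for _ in range(n)]


def walk(directory, page_number):
    """Resolve a page number through directory 41 and one of page tables 42."""
    valid, page_table = directory[page_number >> TABLE_BITS]   # uppermost bits
    if not valid:
        raise Miss("directory entry invalid")
    valid, ppn = page_table[page_number & TABLE_MASK]          # lowermost bits
    if not valid:
        raise Miss("page table entry invalid")
    return ppn


directory = new_entries(DIR_ENTRIES)
table = new_entries(1 << TABLE_BITS)
table[0x345] = [True, 0x777]       # translation entry for page 0x12345
directory[0x48] = [True, table]    # 0x12345 >> 10 == 0x48
print(hex(walk(directory, 0x12345)))
```

Only directory entries that are actually in use need a page table behind them, which is how a hierarchy can use less memory than one flat table covering the entire address space.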
In many systems, page table entries are initialized whenever corresponding memory is allocated. In other systems, page tables and directories are built or initialized in response to misses. In this context, a miss occurs whenever an unmarked valid/invalid bit is encountered in either a page table or a page table directory. Initially, all valid/invalid bits are unmarked, indicating that the corresponding entries have not been initialized or are invalid for some other reason. In response to a page table or directory miss, a memory fault handler is initiated. The fault handler finds the desired address translation and loads it into the appropriate page table entry. The corresponding valid/invalid bit 48 is marked. In addition, the appropriate page directory entry is initialized to reference the correct page table, and the directory entry is marked as valid. Control is then returned to the processor, which again attempts to retrieve the desired address translation from a page table. This time, the attempt succeeds because of the steps completed by the fault handler.
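The build-on-miss behavior described above can be sketched in Python as below, reusing the same assumed 10/10 split of a 20-bit page number. `find_translation` is a hypothetical callback standing in for however the system actually locates a mapping, and the TLB is not modeled. The handler builds the page table if needed, loads and marks the page table entry, marks the directory entry valid, and the lookup is then retried.

```python
TABLE_BITS = 10                          # assumed split of a 20-bit page number
TABLE_MASK = (1 << TABLE_BITS) - 1


class Miss(Exception):
    """An unmarked valid/invalid bit was encountered."""


def new_entries(n):
    # Every entry is [valid bit, payload]; all valid bits start unmarked.
    return [[False, None] for _ in range(n)]


directory = new_entries(1 << 10)


def walk(page_number):
    valid, table = directory[page_number >> TABLE_BITS]
    if not valid:
        raise Miss(page_number)
    valid, ppn = table[page_number & TABLE_MASK]
    if not valid:
        raise Miss(page_number)
    return ppn


def fault_handler(page_number, find_translation):
    """Initialize the missing entries using the hypothetical callback."""
    dir_entry = directory[page_number >> TABLE_BITS]
    if not dir_entry[0]:
        dir_entry[1] = new_entries(1 << TABLE_BITS)   # build the page table
    pte = dir_entry[1][page_number & TABLE_MASK]
    pte[1] = find_translation(page_number)            # load the translation
    pte[0] = True                                     # mark valid/invalid bit
    dir_entry[0] = True                               # mark directory entry valid


def translate(page_number, find_translation):
    try:
        return walk(page_number)
    except Miss:
        fault_handler(page_number, find_translation)
        return walk(page_number)   # the retry succeeds after the handler runs


calls = []


def os_lookup(page_number):        # hypothetical: a fixed-offset mapping
    calls.append(page_number)
    return page_number + 0x100


first = translate(0x12345, os_lookup)    # miss: handler fills entries, retry
second = translate(0x12345, os_lookup)   # hit: handler is not invoked
```

After the first call, the directory entry and page table entry are both valid, so the second call resolves without invoking the fault handler.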
The TLB and page table schemes described above are useful and desirable in most situations. One disadvantage, however, is the large amount of memory required for page tables. In some situations, memory is so scarce that it would be worth sacrificing some performance, if possible, to avoid the memory requirements of multiple page tables.
In other situations, it is desirable to bypass page tables in order to emulate microprocessors that do not support or require hardware page tables. This also allows operating system code to be more portable between different microprocessors.
Avoiding page tables might also be desirable in order to implement a different or more flexible memory management architecture. That is, it might be desirable to use another type of data structure in place of page tables, such as a "victim" cache, a fully associative cache, software page tables, etc.
Finally, eliminating multiple page tables might allow for more in-depth performance analysis and testing than might otherwise be possible. For example, fine grain working set measurements are possible in a system that implements the invention. The invention will make it possible to determine the TLB miss rate in a microprocessor that otherwise would not allow such a measurement. As another example, bounds checking software can be implemented in conjunction with the page table schemes described herein. The invention could also be useful for emulating the memory system of one microprocessor on another processor.
While there are potential advantages of avoiding multiple page tables in microprocessor systems, there is no easy way to accomplish this in many such systems. Many processors are configured to use the particular table schemes described above, and there is no apparent way to disable their reliance on page directories and associated page tables.