1. Field of the Invention
The present invention relates generally to the design of a virtual memory management system and, more particularly, to a software assisted hardware Translation Lookaside Buffer (TLB) miss handler which reduces the TLB miss penalty associated with access to a memory system.
2. Discussion of Related Art
Conventional computer processing systems use a technique called virtual memory which simulates more memory than actually exists and allows the computer to run several programs concurrently regardless of their size. Concurrent user programs access main memory addresses via virtual addresses assigned by the operating system. The mapping of the virtual addresses to the main memory or the physical addresses is a process known as virtual memory translation. Virtual memory translation can be accomplished by any number of techniques so that the processor can access the desired information in the main memory.
Addresses, physical or virtual, consist of a page number and a byte position within the page. For main memory access, the page number needs to be translated from virtual to physical (real) address space; the position of the byte within the page is the same for both virtual and physical addresses.
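The split of an address into a page number and a byte position can be illustrated with a minimal sketch. The 4096-byte page size, the function names, and the `page_map` dictionary are assumptions made for illustration only; none of them comes from the text.

```python
PAGE_SIZE = 4096                          # assumed page size in bytes (hypothetical)
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 offset bits for a 4096-byte page


def split_address(addr):
    """Return (page_number, byte_position) for a virtual or physical address."""
    return addr >> OFFSET_BITS, addr & (PAGE_SIZE - 1)


def join_address(page_number, byte_position):
    """Recombine a page number and a byte position into a full address."""
    return (page_number << OFFSET_BITS) | byte_position


def translate(virtual_addr, page_map):
    """Translate only the page number; the byte position passes through unchanged."""
    vpn, byte_position = split_address(virtual_addr)
    ppn = page_map[vpn]  # hypothetical mapping: virtual page -> physical page
    return join_address(ppn, byte_position)
```

For example, with virtual page 0x12 mapped to physical page 0x7, `translate(0x12345, {0x12: 0x7})` keeps the low-order byte position 0x345 and replaces only the page number.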
Translations for all pages in memory are often stored in a memory structure called a page directory (PDIR) or page table. Page tables can be organized in a number of structures. "Forward-mapped" tables are most easily accessed using the virtual page number as a pointer to the table entry containing the translation. "Reverse-mapped" tables are most easily accessed using the physical page number as a pointer to the table entry containing the translation. Since there are many more possible virtual page numbers than physical page numbers, forward-mapped tables can be very large and sparse, but fairly easily searched given the virtual page number. A reverse-mapped table contains one entry for each page of physical memory. Since there are a limited number of physical pages, compared to virtual pages, reverse-mapped tables tend to be more efficient storage structures, but more difficult to access given only the virtual page number.
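The contrast between the two organizations can be sketched as follows. This is a simplified illustration under stated assumptions: the table size, the function names, and the linear scan of the reverse-mapped table are hypothetical, and practical reverse-mapped tables generally use some indexing scheme such as hashing rather than a scan.

```python
NUM_PHYSICAL_PAGES = 8  # assumed, kept tiny for illustration

# Forward-mapped: keyed by virtual page number. A dict stands in for what
# would otherwise be a very large, sparse array.
forward_table = {}

# Reverse-mapped: exactly one slot per physical page, each slot holding the
# virtual page number currently mapped to that physical page (or None).
reverse_table = [None] * NUM_PHYSICAL_PAGES


def map_page(vpn, ppn):
    """Record the same translation in both table organizations."""
    forward_table[vpn] = ppn
    reverse_table[ppn] = vpn


def lookup_forward(vpn):
    """Direct access: the virtual page number points at the entry."""
    return forward_table.get(vpn)


def lookup_reverse(vpn):
    """Search needed: the table is ordered by physical page, not virtual page."""
    for ppn, owner in enumerate(reverse_table):
        if owner == vpn:
            return ppn
    return None
```

The reverse-mapped table never grows beyond `NUM_PHYSICAL_PAGES` entries, while finding a translation from only the virtual page number requires a search.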
Each page table entry typically contains the virtual address and/or the physical address, and protection and status information concerning the page. Status typically includes information about the type of accesses the page has undergone. Examples are a reference bit, which identifies the first access to data in the page, and a dirty bit which identifies the first modification to data in the page.
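A page table entry of this kind might be modeled as in the sketch below. The field names, the single `writable` protection flag, and the `record_access` helper are hypothetical; actual entry layouts vary by architecture and operating system.

```python
from dataclasses import dataclass


@dataclass
class PageTableEntry:
    virtual_page: int
    physical_page: int
    writable: bool = False    # protection information (assumed single flag)
    referenced: bool = False  # status: page has been accessed
    dirty: bool = False       # status: page has been modified


def record_access(entry, is_write):
    """Check protection, then update the status bits for one access."""
    if is_write and not entry.writable:
        raise PermissionError("write to a read-only page")
    entry.referenced = True   # set on the first access, stays set afterwards
    if is_write:
        entry.dirty = True    # set on the first modification, stays set afterwards
```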
Page tables are usually so large that they are stored in the main memory. Thus, each regular memory access can actually require two accesses, one to obtain the translation and a second to access the memory location.
Many computer systems that support virtual memory translation use a hardware structure called a translation lookaside buffer (TLB). The TLB is a small, fast, associative memory which is usually situated on or in close proximity to the processor unit and stores recently used pairs of virtual and physical addresses. The TLB contains a subset of the translations in the page table and can be accessed much more quickly. When the processing unit needs information from main memory, it sends the virtual address to the TLB. The TLB accepts the virtual page address and returns a physical page address. The physical page address is recombined with the byte position and used to access main memory. However, since access to the main memory is often quite time consuming, many computer systems employ a cache memory for interfacing the main memory to the processor.
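The hit path described above can be sketched as a software model. The class name, the 12-bit byte position, and the use of a dictionary to stand in for associative lookup are assumptions for illustration; a hardware TLB compares the virtual page address against its entries in parallel.

```python
OFFSET_BITS = 12                      # assumed 4096-byte pages
OFFSET_MASK = (1 << OFFSET_BITS) - 1


class TLB:
    """Toy model: a dict keyed by virtual page number stands in for the
    associative memory holding recently used translations."""

    def __init__(self):
        self.entries = {}  # virtual page number -> physical page number

    def insert(self, vpn, ppn):
        self.entries[vpn] = ppn

    def translate(self, virtual_addr):
        """Return the physical address on a hit, or None on a TLB miss."""
        vpn = virtual_addr >> OFFSET_BITS
        byte_position = virtual_addr & OFFSET_MASK
        ppn = self.entries.get(vpn)
        if ppn is None:
            return None  # translation not held in the TLB
        # Recombine the physical page address with the unchanged byte position.
        return (ppn << OFFSET_BITS) | byte_position
```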
Cache memories are high-speed memories that are placed between microprocessors and main memories. They store copies of main memory contents that are currently in use in order to speed microprocessor access to requested data and instructions. Caches appear today in every class of computer and in some computers more than once. Cache systems typically include a data cache (D-cache) and an instruction cache (I-cache). In order to achieve the speed necessary to aid in microprocessor performance, cache memories are typically built using fast static random access memory circuits (SRAMs). Cache memories provide rapid access to frequently used instructions and data. When properly implemented, a cache memory can typically have an access time which is three to twenty times faster than that of main memory, thus reducing the overall access time. The main advantage of using a cache is that a larger, relatively slow main memory can be made to emulate the high speeds of a cache. For a more in-depth discussion of cache memory design and operation see Alan J. Smith, Cache Memory Design: An Evolving Art, IEEE Spectrum, pp. 40-44 (December 1987) and Hennessy et al., Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers (1990), both of which are incorporated by reference in their entirety.
Cache memories may be organized for access using either virtual or physical addresses. Many physical addressed caches depend upon the TLB to supply the physical address translation before beginning the data access. Another approach is to access the cache with the byte position address in parallel with the TLB translation and compare the physical page address from the TLB with a physical page address tag stored with the cache data.
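The second approach, in which the cache is indexed by the byte position while the TLB translates the page address, can be sketched as below. The direct-mapped organization, the 32-byte line size, and all names are assumptions for illustration. Python runs the two steps one after the other; in hardware they would proceed at the same time.

```python
OFFSET_BITS = 12                            # assumed 4096-byte pages
OFFSET_MASK = (1 << OFFSET_BITS) - 1
LINE_BITS = 5                               # assumed 32-byte cache lines
NUM_LINES = 1 << (OFFSET_BITS - LINE_BITS)  # index fits within the byte position


class PhysicallyTaggedCache:
    """Direct-mapped toy cache indexed by byte position bits and tagged
    with the physical page address."""

    def __init__(self):
        self.lines = [None] * NUM_LINES  # each slot: (physical_page_tag, data)

    def fill(self, physical_addr, data):
        index = (physical_addr & OFFSET_MASK) >> LINE_BITS
        self.lines[index] = (physical_addr >> OFFSET_BITS, data)

    def read(self, virtual_addr, tlb_entries):
        # Step A: select the line using only the byte position, which needs
        # no translation.
        index = (virtual_addr & OFFSET_MASK) >> LINE_BITS
        line = self.lines[index]
        # Step B (concurrent with step A in hardware): translate the page address.
        ppn = tlb_entries.get(virtual_addr >> OFFSET_BITS)
        if ppn is None:
            return "tlb_miss", None
        # Compare the physical page address from the TLB with the stored tag.
        if line is not None and line[0] == ppn:
            return "hit", line[1]
        return "cache_miss", None
```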
Since the TLB has limited capacity--typically anywhere from 4 to 512 entries--it may not hold the match for a given virtual address. When a virtual page address translation is not found in the TLB, a TLB miss occurs. When this happens, the TLB refers to the page table. The real address from the page table is sent to the TLB, which retains a copy of it for possible reuse and forwards the real address to the cache. When the TLB is full it discards an old address translation to make room for the new one. Accessing the page table is much slower than accessing the TLB, and using it adds time to the information retrieval process.
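The effect of limited capacity can be sketched with a small model that refills itself from the page table on a miss. The text says only that an old translation is discarded; the least-recently-used choice below is an assumption, as are the class name and the dictionary used as a page table.

```python
from collections import OrderedDict


class SmallTLB:
    """Fixed-capacity toy TLB. Which old entry is discarded is an assumption
    here: the least recently used one."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # hypothetical dict: vpn -> ppn
        self.entries = OrderedDict()  # ordered from least to most recently used
        self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)     # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        ppn = self.page_table[vpn]            # slow path: consult the page table
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # discard the oldest translation
        self.entries[vpn] = ppn               # retain a copy for possible reuse
        return ppn
```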
Servicing a TLB miss involves calculating if and where the correct translation lies in the page table. The page table is searched and if the translation is found, it must be inserted into the TLB along with other information associated with the page before normal program execution may continue. Often, the page table entry must be modified to update status information concerning the page. Page table entries may be organized to reflect the most recent page access patterns, thus lessening the search time associated with the TLB miss penalties that occur later in time. If the translation is not found, then the page may be absent from memory (called a `page fault`). Virtual memory management software must step in to recover from a page fault. When the missing page is brought into memory from disk, the page table entry corresponding to the new physical page must be updated with the new translation, protection, and status information.
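The servicing steps above can be sketched in simplified form. The dictionary-based page table and TLB, the `PageFault` exception, and the `load_page` helper are all hypothetical; the sketch omits protection checks and any reordering of page table entries.

```python
class PageFault(Exception):
    """Raised when no translation exists; the page may be absent from memory."""


def service_tlb_miss(vpn, page_table, tlb, is_write=False):
    """Search the page table, update status information, and refill the TLB."""
    entry = page_table.get(vpn)
    if entry is None:
        raise PageFault(vpn)      # virtual memory management software takes over
    entry["referenced"] = True    # update status information in the entry
    if is_write:
        entry["dirty"] = True
    tlb[vpn] = entry["ppn"]       # insert the translation before execution resumes
    return entry["ppn"]


def load_page(vpn, ppn, page_table):
    """Hypothetical recovery step: after the page is brought into memory,
    record its new translation and fresh status information."""
    page_table[vpn] = {"ppn": ppn, "referenced": False, "dirty": False}
```

A caller would typically catch `PageFault`, call `load_page`, and then retry `service_tlb_miss` for the same virtual page number.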
The time required to service the TLB miss is called the TLB miss penalty, because normal program execution is suspended while searching for the virtual address translation. If the TLB miss penalty is lengthy and TLB misses are frequent, user programs suffer degraded performance.
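The interaction between miss frequency and miss penalty can be made concrete with the usual average-cost relation: average cost per access equals the hit cost plus the miss rate times the miss penalty. The function name and any cycle counts passed to it are illustrative assumptions, not measurements from the text.

```python
def average_translation_cost(hit_cycles, miss_rate, miss_penalty_cycles):
    """Average translation cost per memory access, in cycles.

    hit_cycles:          cost paid on every access (assumed figure)
    miss_rate:           fraction of accesses that miss the TLB, 0.0 to 1.0
    miss_penalty_cycles: extra cycles spent servicing one TLB miss
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0.0 and 1.0")
    return hit_cycles + miss_rate * miss_penalty_cycles
```

Under this relation, halving either the miss rate or the miss penalty removes the same number of cycles from the average, which is why a lengthy penalty combined with frequent misses is costly.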
Sophisticated software routines called software (SW) TLB miss handlers are typically used to service TLB misses and manage the page table. Many operating systems have their own specific virtual memory management schemes tuned to a set of expected user applications. Each may organize and manage the page tables differently. Software provides the flexibility to meet these requirements, but often at the expense of increasing the TLB miss penalty and decreasing application performance. In particular, SW TLB miss handlers are not very efficient at performing the most frequent task of servicing simple TLB misses, that is, TLB misses that do not require a complete traversal of the page table or any page table management.
Most conventional computer architectures use only a single level TLB. However, some architectures have increased performance by implementing a second level TLB in the cache data array. One such design is the MIPS RC6280 CPU. Second level TLBs are hardware units which typically store many more entries than the primary TLB, but have slower access time (though not as slow as main memory). The second level TLB is usually implemented outside of the CPU chip either in memory units separate from the cache or in a reserved portion of the cache memory not used for cache data lines or tags. If the primary TLB does not contain the virtual address translation, then the secondary TLB is checked. If neither TLB contains the translation, a TLB miss is signaled and software retrieves the translation from a physical page directory in memory.
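The two-level lookup order described above can be sketched as follows. The dictionaries standing in for the two TLBs and the page directory, the `stats` counters, and the choice to copy a second level hit into the primary TLB are assumptions for illustration; capacity limits and replacement are omitted.

```python
def translate_two_level(vpn, primary_tlb, secondary_tlb, page_directory, stats):
    """Check the primary TLB, then the secondary TLB, then fall back to the
    software path that reads the page directory in memory."""
    if vpn in primary_tlb:
        stats["primary_hits"] += 1
        return primary_tlb[vpn]
    if vpn in secondary_tlb:
        stats["secondary_hits"] += 1
        primary_tlb[vpn] = secondary_tlb[vpn]  # assumed: promote to the primary TLB
        return secondary_tlb[vpn]
    stats["tlb_misses"] += 1                   # neither TLB holds the translation
    ppn = page_directory[vpn]                  # software retrieves the translation
    secondary_tlb[vpn] = ppn
    primary_tlb[vpn] = ppn
    return ppn
```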
Second level TLB implementations which require additional memory or reserved memory in the cache or tag RAMs to hold translations increase the cost of the cache system as a whole. Accessing this reserved memory requires additional address pads/pins on the processor chip, which generally increases the cost of the chip and/or precludes the pads/pins from use for other functions. Moreover, additional processor control functions have to be implemented to manage the second level TLB accesses.