1. Field of the Invention
The invention relates to memory management systems for computers and more particularly to translation lookaside buffers.
2. Description of the Relevant Art
Many modern computer architectures have operating systems that allow multiple processes to execute simultaneously. Dedicating a full address space's worth of memory to each process would be too expensive, especially since many processes use only a small part of their address space. There must therefore be a means of sharing a smaller amount of physical memory among many processes. One way to do this, virtual memory, divides physical memory into blocks and allocates them to different processes.
Although virtual memory is common in current computers, sharing memory among programs is not the reason virtual memory was invented. In former days, if a program became too large for physical memory, it was up to the programmer to make it fit. Programmers divided programs into pieces and then identified the pieces that were mutually exclusive. These pieces, or "overlays", were loaded or unloaded under user program control during execution. The programmer had to ensure that the program never attempted to access more physical main memory than the machine contained. This responsibility eroded programmer productivity. Virtual memory was invented to relieve programmers of this burden by automatically managing the two levels of the memory hierarchy represented by main memory and secondary storage.
The allocation of main memory to multiple processes is performed by a combination of hardware and software. To efficiently support multiple processes, a virtual memory system includes a memory management scheme that allows several processes to share main memory at the same time. The memory management scheme maps processes into separate areas of main memory and protects processes from interfering with one another.
In many systems, the memory management scheme is implemented by a part of the operating system kernel called memory management software and by a block of hardware called the memory management unit (MMU). The memory management software decides where to place a process in main memory and the MMU actually maps the process into the area of memory allocated to it.
Commonly, the memory management software allocates memory to processes in pages of memory, not byte by byte. Both virtual address space and physical address space are divided into pages. The use of pages limits memory management overhead and provides an efficient use of main memory. For example, a process's main memory allocation need not be contiguous; processes in main memory can be interleaved. In addition, although few processes completely fill the last page allocated to them, the amount of memory wasted is small compared to the overall size of physical memory.
FIG. 1 shows an example of how this scheme is used in a computer system to map the virtual address space of two processes into the physical address space of main memory. In the figure, the page numbers for Processes 1 and 2 are virtual page numbers, and the page numbers in physical address space are physical page numbers. Note that contiguous virtual pages of each process are not necessarily contiguous once they are mapped into physical address space.
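The interleaving in FIG. 1 can be illustrated with a short sketch. It assumes a hypothetical allocator, `allocate_pages`, that hands out physical pages from a single free list in the order pages are requested. The page counts and request order are illustrative only and are not taken from FIG. 1.

```python
def allocate_pages(requests, num_physical_pages):
    """Map virtual pages to physical pages in the order they are requested.

    `requests` is a sequence of (process, virtual_page_number) pairs.
    Returns one table per process: {virtual page number: physical page number}.
    """
    free = list(range(num_physical_pages))   # free physical page numbers
    tables = {}
    for process, vpn in requests:
        if not free:
            raise MemoryError("no free physical pages")
        tables.setdefault(process, {})[vpn] = free.pop(0)
    return tables


# Two processes request their virtual pages 0, 1, 2 in alternating order.
requests = [(p, vpn) for vpn in range(3) for p in ("P1", "P2")]
tables = allocate_pages(requests, num_physical_pages=8)
# tables["P1"] == {0: 0, 1: 2, 2: 4}; tables["P2"] == {0: 1, 1: 3, 2: 5}
```

Contiguous virtual pages 0 through 2 of each process land in non-adjacent physical pages. Each process's table still describes its mapping completely.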
The MMU of the computer system maps a process into main memory by translating the virtual addresses generated by the process into the physical addresses that correspond to the pages of main memory allocated to the process. Since translation is done on a page basis, it is useful to divide both virtual and physical addresses into a page offset and a page number. For example, as shown in FIG. 2, the low-order n+1 bits of each address comprise the page offset, which specifies a particular byte within a page. The high-order m-n bits of the virtual address comprise the virtual page number. The virtual address comprises a total of m+1 bits. The high-order p-n bits of the physical address comprise the physical page number. The physical address comprises a total of p+1 bits. Translating a virtual address to a physical address is simply a matter of replacing the virtual page number with the physical page number.
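The sketch below shows the address split and page-number substitution described above. It uses the same bit numbering, in which the page offset occupies bits 0 through n. The function names and the single-entry mapping are hypothetical. The value n = 11 (a 12-bit offset, i.e. 4-kilobyte pages) is chosen only for illustration.

```python
from __future__ import annotations


def split_address(addr: int, n: int) -> tuple[int, int]:
    """Return (page number, page offset) for an address.

    The page offset is the low-order n+1 bits (bits 0 through n);
    the page number is every bit above bit n.
    """
    offset_mask = (1 << (n + 1)) - 1
    return addr >> (n + 1), addr & offset_mask


def join_address(page_number: int, offset: int, n: int) -> int:
    """Place a page number above an (n+1)-bit page offset."""
    return (page_number << (n + 1)) | offset


def translate(vaddr: int, page_map: dict[int, int], n: int) -> int:
    """Replace the virtual page number with the physical page number.

    `page_map` is a hypothetical stand-in for the translation tables;
    the page offset passes through unchanged.
    """
    vpn, offset = split_address(vaddr, n)
    return join_address(page_map[vpn], offset, n)


# n = 11 gives a 12-bit offset, i.e. 4-kilobyte pages.
paddr = translate(0x00403ABC, {0x403: 0x1F}, n=11)   # 0x1FABC
```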
When memory management software allocates main memory to a process, the software determines the physical page number for each of the process's virtual pages and stores the physical page numbers in tables in main memory. When an address is presented to the MMU for translation, the MMU uses the virtual address to fetch the appropriate physical page number out of the tables in main memory. Having fetched the physical page number, the MMU replaces the virtual page number with the physical page number. The MMU does not change the page offset since the addressed byte is in the same relative position in both the virtual and physical page.
Memory management software stores the physical page number for each virtual page in a translation table in main memory. A single table could map the system's entire virtual address space into main memory, but in many situations such a table would be excessively large. For example, if the virtual address space is 32 bits wide and pages are 4 kilobytes, the table would have over one million entries (2^20 = 1,048,576). In addition, every active process must have its own table. Since most processes do not need the entire virtual address space available to them, most of the entries in each process's table would be marked as unused. Maintaining a one-million-entry table for each active process would impose a significant overhead burden on both memory management software and hardware resources.
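The overhead argument above reduces to simple arithmetic. The sketch assumes 4-kilobyte pages, a hypothetical 4-byte table entry, and a hypothetical count of 50 active processes. None of these entry or process figures comes from the source.

```python
def flat_table_entries(va_bits: int, page_size: int) -> int:
    """Entries in a single-level table: one per virtual page."""
    return (1 << va_bits) // page_size


def flat_table_bytes(va_bits: int, page_size: int, entry_bytes: int, processes: int) -> int:
    """Memory consumed when every active process has its own flat table."""
    return flat_table_entries(va_bits, page_size) * entry_bytes * processes


entries = flat_table_entries(32, 4096)      # 1,048,576 entries per process
total = flat_table_bytes(32, 4096, 4, 50)   # 50 processes, 4-byte entries: 200 MiB
```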
This memory management problem can be solved by dividing the virtual memory for each process into a hierarchy of memory parcels of various sizes. For example, FIG. 3 illustrates one way the virtual memory can be divided into a hierarchy of memory parcels. In this example, memory parcels of four different sizes are possible. The largest memory parcels are labelled contexts, the second largest are labelled regions, the third largest are labelled segments, and the smallest are labelled pages. The memory parcels are hierarchical because a process's virtual address space (its context) contains multiple regions, a region contains multiple segments, and a segment contains multiple pages. Taken together, these areas comprise the context in which the process runs. There is a context for each active process, and the operating system software maps each process to a particular context.
This hierarchical memory scheme greatly simplifies memory management by reducing the size of translation tables and, therefore, the overhead required to set them up and maintain them. For example, since a process rarely requires all the available virtual address space, memory management software can simply flag entire unused regions and segments as invalid; thus the software does not have to mark a million translation table entries individually (one per page) as invalid.
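The savings from flagging whole regions and segments as invalid can be estimated with a sketch. It assumes the table geometry of FIG. 4: 256-entry Region Tables and 64-entry Segment and Page Tables. It also assumes a 20-bit virtual page number whose top 8 bits select the region and whose next 6 bits select the segment. The process's page usage is hypothetical, and the shared Context Table is not counted.

```python
REGION_TABLE_ENTRIES = 256   # one Region Table per context
SEGMENT_TABLE_ENTRIES = 64   # one Segment Table per valid region
PAGE_TABLE_ENTRIES = 64      # one Page Table per valid segment


def hierarchical_entries(used_vpns) -> int:
    """Table entries allocated for one context, given the VPNs it uses.

    Assumes a 20-bit virtual page number: bits 19..12 select the region,
    bits 11..6 the segment, bits 5..0 the page. Regions and segments that
    contain no used page are marked invalid one level up, so no
    lower-level table is allocated for them.
    """
    regions = {vpn >> 12 for vpn in used_vpns}
    segments = {vpn >> 6 for vpn in used_vpns}   # unique (region, segment) pairs
    return (REGION_TABLE_ENTRIES
            + len(regions) * SEGMENT_TABLE_ENTRIES
            + len(segments) * PAGE_TABLE_ENTRIES)


# Hypothetical process: 128 pages of code/data at the bottom of the address
# space and 64 pages of stack at the top.
used = list(range(128)) + list(range(0xFFFC0, 0x100000))
count = hierarchical_entries(used)   # 256 + 2*64 + 3*64 = 576 entries
```

Under these assumptions, 576 entries describe the process, compared with 1,048,576 entries for a flat table covering the same 32-bit space.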
This hierarchical scheme has another valuable benefit. If a process executes more efficiently using a larger page size, a larger memory parcel (i.e., a region or segment) can be mapped as the smallest unit of allocation, since it spans a wider range of addresses. This effectively enlarges the page size.
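The effective page sizes available in this scheme follow from the table geometry. The sketch assumes 4-kilobyte pages, consistent with the entry counts of FIG. 4. It also assumes that a segment or region mapped as a single unit uses all of the address bits below it as its offset.

```python
PAGE_SIZE = 4 * 1024              # 4 KB page
SEGMENT_SIZE = 64 * PAGE_SIZE     # 64 pages per segment    -> 256 KB
REGION_SIZE = 64 * SEGMENT_SIZE   # 64 segments per region  -> 16 MB
CONTEXT_SIZE = 256 * REGION_SIZE  # 256 regions per context -> 4 GB


def offset_bits(parcel_size: int) -> int:
    """Offset width when a power-of-two parcel is mapped as one unit."""
    return parcel_size.bit_length() - 1


for name, size in (("page", PAGE_SIZE), ("segment", SEGMENT_SIZE), ("region", REGION_SIZE)):
    print(f"{name:8} {size:>10} bytes  offset = {offset_bits(size)} bits")
```

Mapping at the segment or region level widens the offset from 12 bits to 18 or 24 bits, respectively. This is the "effectively enlarged" page size.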
The page size determines both the size of the translation table and the amount of wasted storage space. The size of the translation table is inversely proportional to the page size, so memory can be saved by making pages larger. In addition, transferring larger pages to or from secondary storage is more efficient than transferring smaller pages.
On the other hand, a small page size results in less wasted storage when a contiguous region of virtual memory is not an integer multiple of the page size. The portion of unused memory in a page is called internal fragmentation. Small pages also benefit small processes, because larger page sizes lengthen the time needed to invoke them.
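The trade-off between table size and internal fragmentation can be tabulated with a sketch. It assumes a 32-bit virtual address space and one hypothetical 10,000-byte contiguous region. The page sizes tried are illustrative.

```python
def page_size_tradeoff(region_bytes: int, page_size: int, va_bits: int = 32):
    """Return (flat translation-table entries, bytes of internal fragmentation)."""
    entries = (1 << va_bits) // page_size          # shrinks as pages grow
    pages = -(-region_bytes // page_size)          # ceiling division
    wasted = pages * page_size - region_bytes      # grows (roughly) as pages grow
    return entries, wasted


for size in (1024, 4096, 16384):
    entries, wasted = page_size_tradeoff(10_000, size)
    print(f"page {size:>6} B: {entries:>8} entries, {wasted:>5} B wasted")
```

Quadrupling the page size cuts the table to a quarter of its size. It also increases the space lost at the end of this particular region from 240 to 2,288 to 6,384 bytes.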
To implement the hierarchical approach for dividing virtual memory, the memory management scheme employs, for example, four levels of tables: the Context Table, Region Tables, Segment Tables, and Page Tables. FIG. 4 illustrates the four levels of tables implemented in a system having a 32-bit wide virtual address, although the approach may be similarly implemented for systems having larger or smaller virtual addresses. There is a single Context Table, which contains 256 entries. Each entry represents a potential context. Active entries (ones that are being used) point to the Region Table for that context. Each Region Table likewise contains 256 entries. Each active entry represents one active region in the context and points to the Segment Table for that region. Similarly, there is a Segment Table for each region.
Each Segment Table holds 64 entries, one per segment, and each entry points to the Page Table for that segment. Finally, each Page Table contains 64 entries. However, each active Page Table entry contains the physical page number of the addressed page, so the entry points to that page in physical memory rather than to another table. Translating virtual addresses to physical addresses by traversing such a hierarchy of tables is referred to as "table walking".
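A table walk over the four levels can be sketched as follows. The field layout is an assumption: the source gives only the entry counts, and 8 + 6 + 6 index bits leave a 12-bit page offset in a 32-bit address. Tables are modelled as nested dictionaries in which a missing key is an invalid entry. Each dictionary lookup stands in for one main-memory reference. `TranslationFault` is a hypothetical name.

```python
class TranslationFault(Exception):
    """Raised when the walk reaches an invalid (absent) table entry."""


def table_walk(context_table: dict, context: int, vaddr: int) -> int:
    """Translate a 32-bit virtual address by walking four table levels.

    Assumed field layout (hypothetical, chosen to match FIG. 4's entry counts):
      bits 31..24  Region Table index   (256 entries)
      bits 23..18  Segment Table index  (64 entries)
      bits 17..12  Page Table index     (64 entries)
      bits 11..0   page offset          (4-kilobyte pages)
    """
    indices = (
        ("Context", context),
        ("Region", (vaddr >> 24) & 0xFF),
        ("Segment", (vaddr >> 18) & 0x3F),
        ("Page", (vaddr >> 12) & 0x3F),
    )
    entry = context_table
    for level, index in indices:        # one main-memory reference per level
        try:
            entry = entry[index]
        except KeyError:
            raise TranslationFault(f"invalid {level} Table entry {index}") from None
    physical_page_number = entry        # a Page Table entry holds the PPN
    return (physical_page_number << 12) | (vaddr & 0xFFF)


# Context 3 maps region 0x12, segment 0x05, page 0x2A to physical page 0x7F.
tables = {3: {0x12: {0x05: {0x2A: 0x7F}}}}
paddr = table_walk(tables, 3, 0x1216A345)   # 0x7F345
```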
The Context, Region, Segment, and Page Tables are all stored in main memory. A disadvantage of using the hierarchical tables is therefore the length of time required to table walk, since each level of the hierarchy requires a separate access to main memory.
To increase translation speed, therefore, computers use a cache dedicated to virtual address translations, called a translation lookaside buffer (TLB) or simply a translation buffer. A TLB is a high-speed cache that contains a small portion of the virtual-to-physical address translations found in the hierarchical tables. When the processor requests access to a virtual address, the bits constituting the virtual page number are received by the TLB. The TLB contains circuitry to determine whether a virtual page entry corresponding to the received virtual page number is stored in the TLB. If the TLB contains the virtual page entry, the physical page number corresponding to that virtual page is provided at an output bus of the TLB. The physical page number is combined with the bits constituting the page offset, forming the physical address to be accessed. This case, in which the TLB already contains the requested virtual page entry, is referred to as a TLB "hit".
If the TLB does not contain the virtual page entry corresponding to the virtual page number requested by the processor, the virtual page entry is loaded into the TLB from the translation tables in main memory by table walking, which increases the translation time. The physical page number corresponding to the virtual page entry is output from main memory and loaded into a portion of memory within the TLB. The bits constituting the physical page number are combined with the page offset bits, forming the physical address to be accessed. This case, in which the TLB initially does not contain the requested virtual page entry, is referred to as a TLB "miss".
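The hit and miss paths described above can be modelled in software. The sketch assumes a fully associative TLB, 4-kilobyte pages, and a caller-supplied, hypothetical `walk` function standing in for the table walk. The first-in, first-out replacement is only a placeholder, since the passage does not specify a replacement policy. `average_translation_time` is a simple cost model that assumes every access probes the TLB and every miss adds one full walk.

```python
from typing import Callable, Dict


class SimpleTLB:
    """Fully associative TLB sketch that caches VPN -> PPN translations."""

    def __init__(self, walk: Callable[[int], int], capacity: int = 8):
        self.walk = walk                  # hypothetical table-walk routine, used on a miss
        self.capacity = capacity
        self.entries: Dict[int, int] = {}
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr: int) -> int:
        vpn, offset = vaddr >> 12, vaddr & 0xFFF   # assumes 4-kilobyte pages
        ppn = self.entries.get(vpn)
        if ppn is not None:
            self.hits += 1                # hit: no main-memory access needed
        else:
            self.misses += 1              # miss: fetch the entry by table walking
            ppn = self.walk(vpn)
            if len(self.entries) >= self.capacity:
                # Placeholder replacement policy: drop the earliest-inserted entry.
                del self.entries[next(iter(self.entries))]
            self.entries[vpn] = ppn
        return (ppn << 12) | offset      # combine the PPN with the page offset


def average_translation_time(hit_rate: float, t_tlb: float, t_walk: float) -> float:
    """Mean cost if every access probes the TLB and each miss adds one full walk."""
    return t_tlb + (1.0 - hit_rate) * t_walk


tlb = SimpleTLB(walk=lambda vpn: vpn + 0x100)   # hypothetical page mapping
tlb.translate(0x5123)   # miss: walks the tables, then caches VPN 0x5
tlb.translate(0x5FFF)   # hit: same virtual page, served from the TLB
```

Consider a hypothetical 1-cycle TLB probe and a 200-cycle walk (four main-memory references at 50 cycles each). A 98% hit rate then averages about 5 cycles per translation, and a 90% hit rate averages about 21. This is why a high hit rate matters.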
To further increase translation speed, the translation lookaside buffer may be designed to implement a least-recently-used (LRU) replacement algorithm, so that recently used translation entries are retained in the TLB. Recently used translations are likely to be needed again within a short period of time. The MMU can therefore translate those virtual addresses without table walking, and translation time is reduced. When the MMU does not find a particular translation entry in the TLB, the MMU fetches the virtual-to-physical translation entry from the appropriate table in main memory by table walking. This entry is, in turn, stored within the TLB. Under an LRU algorithm, the MMU places the new entry in an unused (flushed) TLB location if one is available. Otherwise, it discards the least recently used translation entry and replaces it with the new one.
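The replacement behavior described above can be sketched with Python's `collections.OrderedDict`, which keeps entries ordered from least to most recently used. The class name, the `walk` callback, and the tiny capacity are hypothetical. Real TLBs implement LRU (or an approximation of it) in hardware, not with a software dictionary.

```python
from collections import OrderedDict
from typing import Callable


class LRUTLB:
    """TLB sketch with least-recently-used replacement of VPN -> PPN entries."""

    def __init__(self, walk: Callable[[int], int], capacity: int = 4):
        self.walk = walk              # hypothetical table-walk routine
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered from least to most recently used

    def lookup(self, vpn: int) -> int:
        if vpn in self.entries:
            self.entries.move_to_end(vpn)       # hit: now the most recently used
            return self.entries[vpn]
        ppn = self.walk(vpn)                    # miss: table walk in main memory
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)    # discard the least recently used
        self.entries[vpn] = ppn
        return ppn

    def flush(self, vpn: int) -> None:
        """Invalidate one entry; its slot is reused before anything is evicted."""
        self.entries.pop(vpn, None)


walks = []


def walk(vpn: int) -> int:
    walks.append(vpn)      # record each table walk
    return vpn * 10        # hypothetical VPN -> PPN mapping


tlb = LRUTLB(walk, capacity=2)
for vpn in (1, 2, 1, 3):   # VPN 2 is least recently used when VPN 3 arrives
    tlb.lookup(vpn)
# list(tlb.entries) == [1, 3]; walks == [1, 2, 3]
```

After flushing an entry, the next miss fills the freed slot without evicting a valid translation.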
It is desirable to provide a translation lookaside buffer that, on average, has a high hit rate. It is further desirable to provide a translation lookaside buffer requiring minimal space, having low power consumption, and capable of operating at high speed.