1. Field of the Invention
The present invention is generally directed to virtual memory systems in computer systems.
2. Background Art
The ever-increasing capability of computer systems drives a demand for increased memory size and speed. The physical size of memory cannot be unlimited, however, due to several constraints including cost and form factor. In order to achieve the best possible performance with a given amount of memory, systems and methods have been developed for managing available memory. One example of such a system or method is virtual addressing, which allows a computer program to behave as though the computer's memory were larger than the actual physical random access memory (RAM) available. Excess data is stored on hard disk and copied to RAM as required.
Virtual memory is usually much larger than physical memory, making it possible to run application programs for which the total code plus data size is greater than the amount of RAM available. This is known as “demand paged virtual memory”. A page is copied from disk to RAM (“paged in”) when an attempt is made to access it and it is not already present. This paging is performed automatically, typically by collaboration between the central processing unit (CPU), the memory management unit (MMU), and the operating system (OS) kernel. The application program is unaware of virtual memory; it just sees a large address space, only part of which corresponds to physical memory at any instant.
The virtual address space is divided into pages. Each virtual address output by the CPU is split into a (virtual) page number (the most significant bits) and an offset within the page (the N least significant bits). Each page thus contains 2^N bytes. The offset is left unchanged and the MMU maps the virtual page number to a physical page number. This is recombined with the offset to give a physical address that indicates a location in physical memory (RAM).
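By way of illustration only, the address split and recombination described above may be sketched as follows. The value N = 12 (4096-byte pages), the function names, and the use of a dictionary to stand in for the MMU mapping are assumptions made for this example and are not dictated by any particular system.

```python
PAGE_OFFSET_BITS = 12  # assumed value of N; gives 2**12 = 4096-byte pages


def split_virtual_address(vaddr, offset_bits=PAGE_OFFSET_BITS):
    """Return (virtual page number, offset within page) for a virtual address."""
    page_number = vaddr >> offset_bits            # most significant bits
    offset = vaddr & ((1 << offset_bits) - 1)     # N least significant bits
    return page_number, offset


def translate(vaddr, page_map, offset_bits=PAGE_OFFSET_BITS):
    """Map a virtual address to a physical address.

    page_map is a hypothetical dict from virtual page number to physical
    page number, standing in for the mapping performed by the MMU.
    """
    vpn, offset = split_virtual_address(vaddr, offset_bits)
    ppn = page_map[vpn]
    # The offset is unchanged; only the page number is replaced.
    return (ppn << offset_bits) | offset
```

For example, with virtual page 0x12 mapped to physical page 0x7, the virtual address 0x12345 translates to the physical address 0x7345.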
The performance of an application program depends dramatically on how its memory access pattern interacts with the paging scheme. If accesses exhibit a lot of locality of reference (i.e., each access tends to be close to previous accesses) the performance will be better than if accesses are randomly distributed over the program's address space, thus requiring more paging. In a multitasking system, physical memory may contain pages belonging to several programs. Without demand paging, an OS would need to allocate physical memory for the whole of every active program and its data, which would not be very efficient.
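The effect of locality of reference on paging may be illustrated with a small simulation. The sketch below counts page faults for a sequence of addresses given a fixed number of physical page frames. The least-recently-used replacement policy, the 4096-byte page size, and all names are assumptions chosen for the example; actual operating systems may use other policies.

```python
from collections import OrderedDict


def count_page_faults(addresses, num_frames, page_size=4096):
    """Count page faults for an access sequence under an assumed LRU policy."""
    resident = OrderedDict()  # resident page numbers, least recently used first
    faults = 0
    for addr in addresses:
        page = addr // page_size
        if page in resident:
            resident.move_to_end(page)        # mark as most recently used
        else:
            faults += 1                       # page must be "paged in"
            if len(resident) >= num_frames:
                resident.popitem(last=False)  # evict least recently used page
            resident[page] = None
    return faults


# 4096 accesses that stay within 4 consecutive pages (high locality).
local = [i * 4 for i in range(4096)]
# 4096 accesses that cycle over 16 different pages (low locality).
scattered = [(i % 16) * 4096 for i in range(4096)]
```

With four page frames, the first sequence faults only once per page (4 faults in total), whereas the second faults on every access (4096 faults), because each page is evicted before it is referenced again.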
Current computer systems, even relatively small scale personal computer systems, include multiple subsystems and/or coprocessors working with the CPU and OS to perform specialized functions. For example, graphics coprocessors (or graphics processing units (GPUs)), floating point coprocessors, networking processors, and other types of coprocessors are used to process large amounts of data with as much speed as possible and include large amounts of memory. A consistent set of rules governs access to the physical memory for all of the system elements or subsystems requesting such access. For example, the OS may dictate a page size and page table format to which each subsystem must interface for virtual memory accesses.
A page table in a virtual memory system is an array that contains an entry for each current virtual-to-physical address translation. A page table entry (PTE) in the page table typically contains a physical page number and flag bits. Pages are of a uniform size and the smaller the page size, the less likely a reference to a particular page will result in a cache hit. Accessing the page table to perform a virtual memory to physical memory translation can be slow, and may result in latency in the performance of the application program.
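As a minimal sketch, a page table entry holding a physical page number and flag bits may be modeled as a packed integer. The layout shown (12 low-order flag bits, a valid flag in bit 0, a writable flag in bit 1) is purely hypothetical; in practice the entry format is dictated by the memory architecture or OS.

```python
PTE_VALID = 1 << 0      # hypothetical flag: translation is present
PTE_WRITABLE = 1 << 1   # hypothetical flag: page may be written
PTE_FLAG_BITS = 12      # assumed number of low-order bits reserved for flags


def make_pte(ppn, flags):
    """Pack a physical page number and flag bits into one entry."""
    return (ppn << PTE_FLAG_BITS) | flags


def lookup(page_table, vpn):
    """Return the physical page number for vpn, or None if no valid translation."""
    if vpn >= len(page_table):
        return None
    pte = page_table[vpn]
    if not pte & PTE_VALID:
        return None
    return pte >> PTE_FLAG_BITS


# A three-entry table indexed by virtual page number; entry 1 is not valid.
page_table = [
    make_pte(5, PTE_VALID),
    0,
    make_pte(9, PTE_VALID | PTE_WRITABLE),
]
```

Here the page table is an array indexed by virtual page number, so `lookup(page_table, 2)` yields physical page 9, while `lookup(page_table, 1)` yields no translation.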
To reduce such latencies, many virtual memory systems include a translation lookaside buffer (TLB) and a cache. In general, performance of a virtual memory / page table translation system is based on the hit rate in the TLB. A TLB is a table that lists the physical address page number associated with each virtual address page number. A TLB is typically used as a cache whose tags are based on virtual addresses. The virtual address is presented simultaneously to the TLB and to the cache so that cache access and the virtual-to-physical address translation can proceed in parallel (the translation is done “on the side”). If the requested address is not cached, the physical address is used to locate the data in memory that is outside of the cache. This is termed a cache “miss.” If the address is cached, this is termed a cache “hit.”
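The role of the TLB as a cache of translations, and the notion of its hit rate, may be sketched as follows. This is a software model only, assuming a small fully associative TLB with least-recently-used replacement; the class name, capacity, and replacement policy are assumptions for the example, and the parallel cache access described above is not modeled.

```python
from collections import OrderedDict


class TLB:
    """Toy translation lookaside buffer keyed by virtual page number."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> ppn, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vpn, page_table):
        """Return the physical page number, consulting page_table on a miss."""
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)
            return self.entries[vpn]
        self.misses += 1
        ppn = page_table[vpn]                 # slow path: page table access
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used entry
        self.entries[vpn] = ppn
        return ppn
```

The hit rate is then `hits / (hits + misses)`; repeated references to the same few pages are served from the TLB without a page table access.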
Some virtual memory systems include multi-level cache systems. A multi-level cache system can reduce latencies while achieving a relatively high cache hit rate. Such a multi-level cache system may, for example, include a level one (L1) cache and a level two (L2) cache. The L1 cache provides a small cache that may be checked quickly to determine whether there is a cache hit. Due to its small size, however, the L1 cache typically has a relatively low cache hit rate, but otherwise performs well when there is locality of reference. In contrast, the L2 cache provides a large cache. Due to its large size, the L2 cache typically has a relatively high cache hit rate but may take a relatively long time to determine whether there is a cache hit. In response to an address request, the L1 cache is checked first for the requested address. If there is a cache miss in the L1 cache, the L2 cache is checked for the requested address. In this way, the L1 cache provides for reduced latency (by enabling fast cache access) and the L2 cache provides for a high cache hit rate (by enabling storage of many page table entries).
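The L1-then-L2 lookup order described above may be sketched as follows. The capacities, the least-recently-used replacement in each level, the choice to fill L1 on an L2 hit, and all names are assumptions made for illustration; they do not describe any particular hardware.

```python
from collections import OrderedDict


def _insert(cache, capacity, key, value):
    """Insert into an LRU-ordered cache, evicting the oldest entry if full."""
    if key in cache:
        cache.move_to_end(key)
    elif len(cache) >= capacity:
        cache.popitem(last=False)
    cache[key] = value


class TwoLevelCache:
    """Toy two-level cache in front of a slower backing store."""

    def __init__(self, l1_capacity, l2_capacity, backing):
        self.l1 = OrderedDict()
        self.l2 = OrderedDict()
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity
        self.backing = backing  # e.g. a dict standing in for the page table

    def lookup(self, key):
        """Return (value, outcome), checking L1 first, then L2, then backing."""
        if key in self.l1:
            self.l1.move_to_end(key)
            return self.l1[key], "L1 hit"
        if key in self.l2:
            self.l2.move_to_end(key)
            value = self.l2[key]
            _insert(self.l1, self.l1_capacity, key, value)  # promote to L1
            return value, "L2 hit"
        value = self.backing[key]                           # miss in both levels
        _insert(self.l2, self.l2_capacity, key, value)
        _insert(self.l1, self.l1_capacity, key, value)
        return value, "miss"
```

With a one-entry L1 and a four-entry L2, an entry displaced from L1 by a later request is still found in L2, so it is returned without going to the backing store.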
However, it is desirable for virtual memory systems accessing a physical memory to employ techniques that increase hit rates. Challenges encountered in the design of such virtual memory systems include the constraints imposed by the memory architecture to which the virtual memory system must interface, including a fixed page size and a dictated page table entry format. It is also desirable for such techniques to be implemented in a multi-level cache system.