The ever-increasing capability of computer systems drives a demand for increased memory size and speed. The physical size of memory cannot be unlimited, however, due to several constraints including cost and form factor. In order to achieve the best possible performance with a given amount of memory, systems and methods have been developed for managing available memory. One example of such a system or method is virtual addressing, which allows a computer program to behave as though the computer's memory were larger than the actual physical random access memory (RAM) available. Excess data is stored on hard disk and copied to RAM as required.
Virtual memory is usually much larger than physical memory, making it possible to run application programs for which the total code plus data size is greater than the amount of RAM available. This is known as “demand paged virtual memory”. A page is copied from disk to RAM (“paged in”) when an attempt is made to access it and it is not already present. This paging is performed automatically, typically by collaboration between the central processing unit (CPU), the memory management unit (MMU), and the operating system (OS) kernel. The application program is unaware of virtual memory; it just sees a large address space, only part of which corresponds to physical memory at any instant.
The virtual address space is divided into pages. Each virtual address output by the CPU is split into a (virtual) page number (the most significant bits) and an offset within the page (the N least significant bits). Each page thus contains 2^N bytes. The offset is left unchanged and the MMU maps the virtual page number to a physical page number. This is recombined with the offset to give a physical address that indicates a location in physical memory (RAM).
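The address split described above can be illustrated with a minimal sketch. The value N = 12 (4 KiB pages), the function names, and the dictionary standing in for the MMU mapping are assumptions made only for illustration; they are not taken from any particular architecture.

```python
PAGE_BITS = 12  # assumption: N = 12, so each page holds 2**12 = 4096 bytes


def split_virtual_address(vaddr, page_bits=PAGE_BITS):
    """Split a virtual address into (virtual page number, offset)."""
    page_number = vaddr >> page_bits              # most significant bits
    offset = vaddr & ((1 << page_bits) - 1)       # N least significant bits
    return page_number, offset


def translate(vaddr, page_map, page_bits=PAGE_BITS):
    """Map a virtual address to a physical address.

    page_map is a hypothetical dict from virtual page number to physical
    page number. A missing key raises KeyError, which here stands in for
    a page fault.
    """
    vpn, offset = split_virtual_address(vaddr, page_bits)
    ppn = page_map[vpn]
    # The offset is left unchanged and recombined with the physical page number.
    return (ppn << page_bits) | offset
```

For example, with virtual page 0x12 mapped to physical page 0x7A, the virtual address 0x12345 translates to the physical address 0x7A345; only the page number changes.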
The performance of an application program depends dramatically on how its memory access pattern interacts with the paging scheme. If accesses exhibit a lot of locality of reference, i.e. each access tends to be close to previous accesses, the performance will be better than if accesses are randomly distributed over the program's address space, thus requiring more paging. In a multitasking system, physical memory may contain pages belonging to several programs. Without demand paging, an OS would need to allocate physical memory for the whole of every active program and its data, which would not be very efficient.
Current computer systems, even relatively small scale personal computer systems, include multiple subsystems and/or coprocessors working with the CPU and OS to perform specialized functions. For example, graphics coprocessors (or graphics processing units (GPUs)), floating point coprocessors, networking processors, and other types of coprocessors are required to process very large amounts of data with as much speed as possible and require large amounts of memory. A consistent set of rules necessarily governs access to the physical memory for all of the system elements or subsystems requiring such access. For example, the OS may dictate a page size and page table format to which each subsystem must interface for virtual memory accesses.
In general, the overall performance of a virtual memory/page table translation system is governed by the hit rate in the translation lookaside buffers (TLBs). A TLB is a table that lists the physical address page number associated with each virtual address page number. A TLB is typically used as a level 1 (L1) cache whose tags are based on virtual addresses. The virtual address is presented simultaneously to the TLB and to the cache so that cache access and the virtual-to-physical address translation can proceed in parallel (the translation is done “on the side”). If the requested address is cached, this is termed a cache “hit”. If the requested address is not cached, this is termed a cache “miss”, and the physical address is used to locate the data in memory that is outside of the cache.
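The role of the hit rate can be illustrated with a small software model of a TLB placed in front of a page table. This is a sketch under stated assumptions: the class name, the dictionary page table, and the least-recently-used replacement policy are hypothetical choices for illustration, and real TLBs are hardware structures whose organization varies.

```python
from collections import OrderedDict


class TinyTLB:
    """Toy TLB: a small cache of virtual-to-physical page translations."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table   # hypothetical dict: vpn -> ppn
        self.entries = OrderedDict()   # ordered oldest to most recently used
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn):
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)      # mark as most recently used
            return self.entries[vpn]
        # Miss: fall back to the (slower) page table and cache the result.
        self.misses += 1
        ppn = self.page_table[vpn]
        self.entries[vpn] = ppn
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used (assumed policy)
        return ppn

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In this model, repeated accesses to a few pages are served from the TLB, while accesses spread over more pages than the TLB holds force repeated page table lookups.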
A page table in a virtual memory system is an array that contains an entry for each current virtual-to-physical address translation.
A page table entry (PTE) in the page table typically contains a physical page number and flag bits. Pages are of a uniform size, and the smaller the page size, the less likely a reference to a particular page will result in a cache hit. Pages can also be combined into fragments, where a fragment is a contiguous series of physical pages. 100% fragmentation of memory implies one page per fragment. As average fragment size increases, or fragmentation decreases, the hit rate increases markedly.
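The notion of a fragment can be made concrete with a short sketch that groups physical page numbers into contiguous runs. The function names and the input representation (a list of the physical page numbers backing consecutive virtual pages) are assumptions for illustration only.

```python
def fragment_sizes(physical_pages):
    """Return the length of each run of consecutive physical page numbers.

    physical_pages is a hypothetical list giving, for consecutive virtual
    pages, the physical page number each one maps to.
    """
    sizes = []
    for i, ppn in enumerate(physical_pages):
        if i > 0 and ppn == physical_pages[i - 1] + 1:
            sizes[-1] += 1      # extends the current fragment
        else:
            sizes.append(1)     # starts a new fragment
    return sizes


def average_fragment_size(physical_pages):
    """Average number of pages per fragment (0.0 for an empty mapping)."""
    sizes = fragment_sizes(physical_pages)
    return sum(sizes) / len(sizes) if sizes else 0.0
```

For the mapping [10, 11, 12, 40, 41, 7] the fragments have sizes 3, 2, and 1, for an average of 2 pages per fragment. A mapping such as [5, 9, 2] corresponds to 100% fragmentation, with one page per fragment.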
It is desirable for virtual memory systems accessing a physical memory to employ techniques that increase hit rates. Challenges encountered in the design of such virtual memory systems include the constraints imposed by the memory architecture to which the virtual memory system must interface, including a fixed page size and a dictated page table entry format. It is also desirable for the techniques to result in minimum increased overhead, for example in terms of size and speed. It is desirable for the techniques to work within all of the constraints presented by a given memory architecture and to be transparent to memory clients accessing physical memory through the virtual memory system.