The present invention relates in general to memory management systems and in particular to an address translation system with caching of variable-range translation clusters.
Most modern computer systems implement some form of virtual memory, in which processes reference system memory locations using a “virtual” address rather than an address of a specific location in the memory. When a process makes a memory request using a virtual address, the system uses a page table to translate the virtual address to a specific physical location and accesses that location. The page table is typically implemented in a block of memory that includes an entry for each page (e.g., 4 kilobytes) of the virtual address space; each entry stores the physical address of a corresponding page of a physical address space. Page tables can also be structured hierarchically, so that only a portion of the page table needs to be resident in system memory at all times; the portion held in system memory can be used to locate other portions that may have been swapped out.
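By way of illustration only, the page-table translation described above can be sketched as follows. This is a minimal sketch that assumes a flat (non-hierarchical) table and 4 KB pages; the names `translate` and `page_table` and the sample mappings are hypothetical and are not taken from any particular system.

```python
PAGE_SIZE = 4096   # 4 KB pages, matching the example page size above
PAGE_SHIFT = 12    # log2(PAGE_SIZE)


def translate(page_table, virtual_address):
    """Translate a virtual address using a flat page table.

    page_table is a hypothetical dict that maps a virtual page number
    to the physical base address of the corresponding page.
    """
    vpn = virtual_address >> PAGE_SHIFT          # virtual page number
    offset = virtual_address & (PAGE_SIZE - 1)   # byte offset within the page
    if vpn not in page_table:
        raise KeyError(f"page fault: virtual page {vpn} is not mapped")
    return page_table[vpn] + offset


# Hypothetical mapping: consecutive virtual pages are not physically contiguous.
page_table = {0: 0x7000, 1: 0x2000, 2: 0x9000}
physical = translate(page_table, 0x1234)  # page 1, offset 0x234
```

In this sketch every translation consults the table; a hierarchical table would add one lookup per level before the final entry is reached.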
Virtual memory has a number of common uses. For example, general-purpose computer systems generally cannot guarantee that a process will receive memory resources in any particular arrangement or at any particular location. Virtual addressing enables processes to treat the memory as if it were arranged in a convenient manner, regardless of how it is actually arranged. As another example, systems that support parallel execution of multiple processes can provide a different virtual memory space for each process. This helps to avoid address conflicts between processes. Virtual addressing can also be used to map storage other than system memory, thereby enabling the system to swap data in and out of the system memory, address data in storage devices other than memory storage (e.g., image files resident on a system disk), and so on.
Within graphics processing subsystems, use of virtual memory has been relatively uncommon. Typically, a graphics processing subsystem is implemented on a plug-in printed circuit card that connects to a system bus, such as a PCI (Peripheral Component Interconnect) or AGP (Accelerated Graphics Port) bus. The card usually includes a graphics processing unit (GPU) that implements graphics functionality (e.g., rasterization, texture blending, etc.) and dedicated graphics memory. This memory is generally managed by the GPU or by a graphics driver program executing on the system central processing unit (CPU). The GPU can address graphics memory using either physical addresses or offset values that can be converted to physical addresses by the addition of a constant base address. The GPU (or graphics driver program) can also control the arrangement of physical memory allocations. For instance, a pixel buffer that is to be scanned out to a display device can be arranged to occupy a contiguous block of the graphics memory address space. Elements of graphics processing subsystems, including scanout control logic (or display drivers), graphics driver programs, GPUs, and the like, are generally designed to use physical addressing and to rely on particular arrangements and allocations of memory.
As the amount of data (e.g., texture data) needed for graphics processing increases, graphics processing subsystems are beginning to rely on system memory for at least some storage of data (and in some instances command lists, etc.). Such subsystems generally use virtual addressing for system memory, with the required address translation being performed by a component external to the graphics processing subsystem. For instance, the AGP bus includes a Graphics Address Relocation Table (GART) implemented in the host-side chipset. Emerging high-speed bus technologies, such as PCI Express (PCI-E), do not provide GART or any other address translation functionality. As a result, graphics cards configured for such protocols will need to implement their own address translation systems if they are to access system memory.
An alternative to the graphics card is an integrated graphics processor (IGP). An IGP is a graphics processor that is integrated with one or more other system bus components, such as a conventional “north bridge” chip that manages the bus connecting the CPU and the system memory. IGPs are appealing as an inexpensive alternative to graphics cards. Unlike conventional graphics cards, an IGP system usually does not include much (or in some cases any) dedicated graphics memory; instead the IGP relies on system memory, which the IGP can generally access at high speed. The IGP, however, generally does not control the physical arrangement or address mapping of the system memory allocated to it. For example, it is not guaranteed that the pixel buffer will occupy a single contiguous block in the physical address space. Thus, designers of IGPs are faced with the choice of redesigning the co-processor and the associated driver programs to use physical addresses provided by the system or relying on virtual addressing.
Given the level of complexity and sophistication of modern graphics processing, redesigning around (unpredictable) physical addresses is a daunting task, which makes a virtual addressing solution desirable. Unfortunately, in many computer systems, virtual addressing can introduce a significant degree of memory overhead, making this option too slow or resource intensive for graphics processing components such as display systems. For example, a typical display system provides a screen's worth of pixel data (e.g., 1280×1024 pixels at four bytes per pixel, for a total of over 5 MB per screen) from the pixel buffer to a display device at a constant screen refresh rate of about 70 Hz. Virtual address translation for this much data would introduce an additional latency that is potentially long and may be highly variable. Such long or variable delays in receiving pixel data from memory could result in incorrect (or black) pixels, or other undesirable artifacts. In addition, if address translation for scanout or other purposes requires a large number of page table accesses, performance of other system components may be adversely affected (e.g., due to congestion on the bus or in the system memory). Conventional address caching and translation lookaside buffer techniques do not alleviate the problem because it is difficult and expensive to provide an on-chip cache large enough to hold all the page addresses needed for scanout.
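The scale of the scanout example above can be checked with simple arithmetic. The following sketch uses the figures given in the text (1280×1024 pixels, four bytes per pixel, 70 Hz) and additionally assumes 4 KB pages and a page-aligned pixel buffer; the variable names are illustrative only.

```python
WIDTH, HEIGHT = 1280, 1024   # screen resolution from the example above
BYTES_PER_PIXEL = 4
REFRESH_HZ = 70              # approximate refresh rate from the example above
PAGE_SIZE = 4096             # assumption: 4 KB pages, page-aligned pixel buffer

bytes_per_screen = WIDTH * HEIGHT * BYTES_PER_PIXEL        # 5,242,880 bytes
pages_per_screen = bytes_per_screen // PAGE_SIZE           # 1,280 pages
translations_per_second = pages_per_screen * REFRESH_HZ    # 89,600 page translations
bytes_per_second = bytes_per_screen * REFRESH_HZ           # 367,001,600 bytes
```

Under these assumptions, scanout alone touches 1,280 distinct pages per screen and needs at least 89,600 page translations per second if each page address is looked up once per refresh, which suggests why a cache holding every page address needed for scanout is costly to provide on chip.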
Another solution is to maintain a complete page table on the graphics chip, thereby allowing faster access times and/or less variability in latency. This solution, however, becomes impractical for large page table sizes. Still another solution divides the virtual address space into “large” and “small” sections, depending on whether the section is mapped to blocks of contiguous physical addresses that exceed a “large size” threshold (e.g., 32 or 64 KB). Pointers to the physical address blocks for “large” sections are stored on chip, while for “small” sections, a lookup in the complete page table is required to complete the translation. In some cases, the result of the most recent page table lookup for each of some number of translation clients can be stored and re-used until the client requests a virtual address on a different page. Such systems can reduce the number of page table accesses in some situations, but the ability to store only one result per client and the inability to share results between clients can still lead to a large number of page table accesses.
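The limitation of storing only the most recent lookup result per client can be illustrated with a small simulation. This is a hypothetical sketch, not a description of any particular prior system; the class name `LastResultCache`, the client names, and the sample page table are assumptions made for illustration, and 4 KB pages are assumed.

```python
PAGE_SHIFT = 12  # assumption: 4 KB pages


class LastResultCache:
    """Keeps one translation result per client (hypothetical sketch)."""

    def __init__(self, page_table):
        self.page_table = page_table      # virtual page number -> physical page base
        self.last = {}                    # client -> (virtual page number, physical base)
        self.page_table_accesses = 0      # counts lookups that reach the page table

    def translate(self, client, virtual_address):
        vpn = virtual_address >> PAGE_SHIFT
        offset = virtual_address & ((1 << PAGE_SHIFT) - 1)
        cached = self.last.get(client)
        if cached is None or cached[0] != vpn:
            # Miss: this client's single stored result is for another page.
            self.page_table_accesses += 1
            cached = (vpn, self.page_table[vpn])
            self.last[client] = cached
        return cached[1] + offset


cache = LastResultCache({0: 0x5000, 1: 0x3000})

# One client alternating between two pages misses on every request.
for address in (0x0000, 0x1000, 0x0004, 0x1004):
    cache.translate("scanout", address)
accesses_after_scanout = cache.page_table_accesses

# A second client cannot reuse the first client's result for the same page.
cache.translate("texture", 0x1008)
accesses_after_texture = cache.page_table_accesses
```

In this sketch the four alternating requests from one client cause four page table accesses, and the second client's request to an already-translated page causes a fifth, reflecting the two shortcomings noted above: one result per client and no sharing of results.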
Thus, an improved virtual memory system that reduces the number of page table accesses required to translate a group of virtual addresses would be desirable.