The invention relates generally to computer graphics apparatus and processing and more specifically to memory management for computer graphics apparatus and processing.
Traditional computing architectures had only one client, a central processing unit (CPU), accessing memory through a translation lookaside buffer (TLB). The TLB provides quick translations of virtual addresses used by the CPU to physical addresses used by memory for a limited number of such addresses. However, it is desirable to allow more than one client to access memory through a TLB, since this provides all clients with the same view of the virtual address space. For example, it is desirable to allow a CPU and an accelerated graphics port (AGP) device to access memory through a TLB. The AGP device may perform graphics processing while the CPU performs other processing. The multiple clients may include separate devices or a CPU switching from one process to another, for example, among a video process, an audio process, and a user application process.
One problem that arises is that least-recently-used (LRU) replacement strategies cause important information in the TLB to be replaced as different processes or devices access different areas in virtual address space for which an address translation has not been cached in the TLB. Thus, for example, when audio processing begins, information in the TLB that is important to video processing may be replaced with information important to audio processing, requiring the information important to video processing to be reloaded into the TLB when video processing resumes. Consequently, memory bandwidth is wasted while replacing and reloading information.
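The replacement behavior described above can be illustrated with a minimal sketch. It is a toy model and not an implementation of any particular TLB. The page size, the page-table mapping, the class name LruTlb, the helper run, and the page ranges assigned to the video and audio clients are all assumptions introduced here for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size, not taken from the text


class LruTlb:
    """Toy fully associative TLB with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # virtual page number -> physical frame
        self.misses = 0

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1  # translation must be fetched from memory
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[vpn] = page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset


def run(capacity, bursts):
    """Alternate two hypothetical clients and return the total miss count."""
    page_table = {vpn: vpn + 100 for vpn in range(16)}  # hypothetical mapping
    tlb = LruTlb(capacity)
    video_pages = range(0, 4)   # pages touched by the assumed video client
    audio_pages = range(8, 12)  # pages touched by the assumed audio client
    for _ in range(bursts):
        for pages in (video_pages, audio_pages):
            for vpn in pages:
                tlb.translate(vpn * PAGE_SIZE, page_table)
    return tlb.misses
```

Under these assumptions, a four-entry TLB misses on every access, because each client evicts the translations of the other before they are reused. An eight-entry TLB misses only on the first pass. The difference corresponds to the memory bandwidth spent replacing and reloading translations.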
Several attempts have been made to avoid this problem. One approach has been to use a TLB large enough to accommodate the information needed by all of the different devices or processes. However, increasing the size of the TLB increases its complexity and cost while reducing its efficiency.
Another approach has been to use pipelining to attempt to solve the problem. Pipelining involves dividing a task, such as the processing of an instruction, into a number of sequential steps and processing different steps of different tasks concurrently. However, pipelining increases complexity and can introduce delays in processing under certain circumstances.
Another approach has been to give each of the multiple clients direct access to system memory. However, since the memory space is not contiguous, each of the multiple clients requires its own TLB structure. Thus, complexity is increased by duplication of the TLB structure.
As noted above, large TLBs typically suffer from complexity, cost, and efficiency problems. However, if such problems could be avoided, a larger TLB would be beneficial.
Thus, a technique to allow multiple clients to access memory through a TLB and to increase performance of an AGP device is needed.