Field of the Invention
The present invention generally relates to computer science and, more specifically, to selective fault-stalling for a GPU memory pipeline in a unified virtual memory system.
Description of the Related Art
A typical computer system includes a central processing unit (CPU) and a parallel processing unit (PPU). Some PPUs are capable of very high performance using a relatively large number of small, parallel execution threads on dedicated programmable hardware processing units. The specialized design of such PPUs usually allows these PPUs to perform certain tasks, such as rendering 3-D scenes, much faster than a CPU. However, the specialized design of these PPUs also limits the types of tasks that the PPU can perform. By contrast, the CPU is typically a more general-purpose processing unit and therefore can perform most tasks. Consequently, the CPU usually executes the overall structure of a software application and then configures the PPU to implement tasks that are amenable to parallel processing.
As software applications execute on the computer system, the CPU and the PPU perform memory operations to store and retrieve data in physical memory locations. Some advanced computer systems implement a unified virtual memory (UVM) architecture common to both the CPU and the PPU. Among other things, the architecture enables the CPU and the PPU to access a physical memory location using a common (e.g., the same) virtual memory address, regardless of whether the physical memory location is within system memory or memory local to the PPU (PPU memory).
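By way of illustration only, the common-address property described above may be sketched as a single translation table consulted by both processing units. The sketch below is a simplified model and not an actual implementation: the page size, the table contents, and the names PAGE_SIZE, page_table, and translate are hypothetical and introduced here solely for the example.

```python
PAGE_SIZE = 4096  # assumed page size, chosen only for this example

# Hypothetical shared page table: virtual page number -> (memory, frame).
# "system" stands for system memory and "ppu" for memory local to the PPU.
page_table = {
    0: ("system", 7),
    1: ("ppu", 3),
}


def translate(vaddr):
    """Resolve a virtual address to (memory, physical address).

    The same function serves a CPU access and a PPU access alike,
    which is the point of a common virtual address space.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    memory, frame = page_table[vpn]
    return memory, frame * PAGE_SIZE + offset


# Either processing unit presenting these addresses gets the same answer.
cpu_view = translate(4100)
ppu_view = translate(4100)
```

In this model, the requesting unit does not need to know in advance whether the page resides in system memory or in PPU memory; the table entry alone determines the physical location.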
Computer systems typically include memory management functions to facilitate virtual memory and paging operations. During the course of normal operation, an instruction may request access to a virtual address associated with a page of data that is paged out, resulting in an access fault. In response to the access fault, conventional processing units may complete instructions preceding the faulting instruction, and cancel the faulting instruction along with all instructions that began execution subsequent to the faulting instruction. At this point, an access fault handler pages in the requested page of data and restarts execution beginning with the faulting instruction.
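The conventional cancel-and-restart sequence described above may be sketched, under simplifying assumptions, as the following toy model. Each instruction is reduced to the virtual page it touches, a fixed number of instructions is assumed to be in flight at once, and the access fault handler is modeled as simply marking the page resident. The function name run_with_cancel_and_replay and its parameters are hypothetical and do not correspond to any actual hardware or driver interface.

```python
def run_with_cancel_and_replay(program, resident, in_flight=4):
    """Toy model of instruction-cancellation fault handling.

    program   -- list of virtual page numbers, one per instruction
    resident  -- pages that are initially paged in
    in_flight -- assumed number of instructions executing at once
    Returns (completion order, instructions cancelled, faults taken).
    """
    resident = set(resident)
    completed = []   # instruction indices, in completion order
    cancelled = 0    # instructions thrown away and later re-executed
    faults = 0
    pc = 0
    while pc < len(program):
        window = range(pc, min(pc + in_flight, len(program)))
        # Oldest in-flight instruction whose page is paged out, if any.
        fault_at = next(
            (i for i in window if program[i] not in resident), None
        )
        if fault_at is None:
            completed.extend(window)
            pc = window.stop
            continue
        # Instructions preceding the faulting instruction complete.
        completed.extend(range(pc, fault_at))
        # The faulting instruction and every younger in-flight
        # instruction are cancelled.
        cancelled += window.stop - fault_at
        faults += 1
        # The fault handler pages in the requested page ...
        resident.add(program[fault_at])
        # ... and execution restarts at the faulting instruction.
        pc = fault_at
    return completed, cancelled, faults


result = run_with_cancel_and_replay([0, 1, 2, 1, 3, 0], resident={0, 1})
```

For the six-instruction program shown, the model takes two faults and cancels four instructions in total, even though only two instructions actually touched a paged-out page. The cancelled count grows with the assumed in_flight width, which is the cost the cancellation approach imposes.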
In operation, the access fault handler may require a significant amount of time to complete relative to typical instruction execution time. Notably, if the computer system implements a unified virtual memory architecture, then the access fault handler may perform lengthy faulting procedures that migrate memory pages between system memory and memory local to the PPU. Since CPUs are configured to generate a very limited number of outstanding memory access requests, access faults are relatively rare. Thus, in CPUs, this instruction-cancellation approach to access faults typically results in a relatively small average impact on overall computer system performance and may be acceptable.
By contrast, in a highly parallel, multithreaded, advanced PPU, hundreds or even many thousands of access requests may be outstanding at any moment, and numerous memory access faults may be active at any moment. Therefore, if a PPU were to implement a conventional instruction-cancellation fault handling technique, the PPU would frequently cancel thousands of instructions over all execution units. Further, the PPU would wait for lengthy access fault handling procedures to load paged-out data for each faulting instruction within each executing thread. Such waits would significantly, and often unacceptably, degrade overall computer system performance.
As the foregoing illustrates, what is needed in the art is a more effective approach to handling access faults in a unified virtual memory architecture.