Field of the Invention
The present invention generally relates to computer science and, more specifically, to replaying memory transactions while resolving memory access faults.
Description of the Related Art
A typical computer system includes a central processing unit (CPU) and a parallel processing unit (PPU). As software applications execute on the computer system, the CPU and the PPU perform memory operations to store and retrieve data in physical memory locations. Some advanced computer systems implement a unified virtual memory architecture (UVM) common to both the CPU and the PPU. Among other things, the architecture enables the CPU and the PPU to access a physical memory location using a common (e.g., the same) virtual memory address, regardless of whether the physical memory location is within system memory or memory local to the PPU (PPU memory).
Computer systems typically include memory management functions to facilitate virtual memory and paging operations. During the course of normal operation, an instruction may request access to a virtual address associated with a page of data that is paged out, resulting in an access fault. In response to the access fault, conventional processing units may complete instructions preceding the faulting instruction, and cancel the faulting instruction along with all instructions that began execution subsequent to the faulting instruction. At this point, an access fault handler pages in the requested page of data and restarts execution beginning with the faulting instruction. In some cases, the access fault handler may require a significant amount of time to complete relative to typical instruction execution time. In particular, if the computer system implements a unified virtual memory architecture, then the access fault handler may perform lengthy fault-handling procedures that migrate memory pages between system memory and memory local to the PPU.
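By way of illustration only, the conventional cancel-and-restart sequence described above may be modeled as follows. This is a simplified sketch, not an implementation of any particular processing unit: the names `Instruction`, `run_with_cancellation`, and `window`, as well as the event log format, are hypothetical and are introduced solely for this example. The model assumes in-order execution and treats the page-in step as instantaneous.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instruction:
    name: str
    page: Optional[int] = None  # virtual page accessed; None for non-memory instructions


def run_with_cancellation(program, resident_pages, window=3):
    """Hypothetical model of conventional cancel-and-restart fault handling.

    `window` is an assumed number of instructions in flight at once,
    counting the faulting instruction itself.
    """
    resident = set(resident_pages)
    log = []
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if instr.page is not None and instr.page not in resident:
            # Access fault. Instructions before pc have already completed.
            log.append(("fault", instr.name))
            # Cancel the faulting instruction and the in-flight instructions after it.
            cancelled = tuple(i.name for i in program[pc:pc + window])
            log.append(("cancel", cancelled))
            # The fault handler pages in the requested page (modeled as instantaneous).
            resident.add(instr.page)
            log.append(("page_in", instr.page))
            # Restart at the faulting instruction: pc is left unchanged.
            continue
        log.append(("complete", instr.name))
        pc += 1
    return log


program = [Instruction("a"), Instruction("load", page=7), Instruction("c")]
log = run_with_cancellation(program, resident_pages=[])
```

In this model, every fault discards the work of up to `window` instructions and stalls execution until the page-in completes, which is the cost that grows when the fault handler must migrate pages between memories.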
In a highly parallel, multithreaded, advanced PPU, hundreds or even thousands of memory transactions, and therefore many address translations, may be outstanding at any moment. Consequently, numerous memory access faults may be active at any moment. If a PPU were to implement a conventional instruction-cancellation fault handling technique, then the PPU would frequently cancel thousands of instructions over all execution units. Further, the PPU would wait for lengthy access fault handling procedures to load paged-out data for each faulting instruction within each executing thread. Such latencies would significantly, and often unacceptably, degrade overall system performance.
As the foregoing illustrates, what is needed in the art is a more effective approach to handling access faults involving a multithreaded processing unit.