Field of the Invention
The present invention generally relates to computer science and, more specifically, to a microcontroller for a memory management unit.
Description of the Related Art
A typical computer system includes a central processing unit (CPU) and one or more parallel processing units, such as graphics processing units (GPUs). Some advanced computer systems implement a unified virtual memory architecture common to both the CPU and the GPUs. Among other things, the architecture enables the CPU and the GPUs to access a physical memory location using a common (e.g., the same) virtual memory address, regardless of whether the physical memory location is within system memory or memory local to the GPU.
In operation, a software process executing on a GPU accesses data stored in physical memory via a virtual memory address. To execute the memory access, the GPU memory management unit (MMU) attempts to translate the virtual memory address to a physical memory address. If the translation is successful, then the GPU uses the physical memory address to access the data stored in physical memory. However, in some cases, the translation is not successful. For example, the GPU may not have the necessary mapping or permissions to access the physical memory. In such scenarios, the GPU MMU generates a page fault. A page fault may be fatal or non-fatal. If a page fault is non-fatal, then actions may be taken to map the virtual memory address to an appropriate location in physical memory, thereby remedying the page fault. Notably, the efficiency with which a system remedies page faults may impact the execution speed of software processes.
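By way of illustration only, the translation step described above may be sketched in Python as follows. The sketch models a lookup that either returns a physical memory address or raises a page fault. The page size, the `PageTableEntry` layout, and every name used are assumptions chosen for illustration and do not describe any actual GPU MMU.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size, for illustration only


class PageFault(Exception):
    """Raised when a virtual address cannot be translated."""

    def __init__(self, virtual_address, access, reason):
        super().__init__(f"{reason} fault on {access} at {virtual_address:#x}")
        self.virtual_address = virtual_address
        self.access = access
        self.reason = reason  # "unmapped" or "permission"


@dataclass
class PageTableEntry:
    """Hypothetical page table entry: one physical page plus access bits."""
    physical_page: int
    readable: bool = True
    writable: bool = False


def translate(page_table, virtual_address, access):
    """Translate a virtual address, or raise PageFault if that fails."""
    virtual_page, offset = divmod(virtual_address, PAGE_SIZE)
    entry = page_table.get(virtual_page)
    if entry is None:
        # No mapping exists for this virtual page.
        raise PageFault(virtual_address, access, "unmapped")
    allowed = entry.writable if access == "write" else entry.readable
    if not allowed:
        # A mapping exists, but this type of access is not permitted.
        raise PageFault(virtual_address, access, "permission")
    return entry.physical_page * PAGE_SIZE + offset
```

In this sketch, a missing mapping and a disallowed access type are reported as distinct fault reasons, mirroring the two failure cases mentioned above.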
In one approach to remedying a page fault, the GPU interrupts the CPU, and the CPU executes an appropriate “page fault sequence” designed to make the requested memory page available to the GPU. The page fault sequence generally maps the memory page associated with the requested virtual memory address or changes the types of accesses permitted (e.g., read access or write access). One drawback to this approach is that the interrupt response time may be many microseconds, resulting in long stalls in the faulting GPU process. These stalls may increase the execution time of the GPU process and thus reduce overall system efficiency. In addition, the CPU has limited resources, and handling page faults generated by the GPU reduces the resources that the CPU may use to perform other operations. This further contributes to inefficiencies in system operation and therefore undermines overall system performance.
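By way of illustration only, the two remedies performed by a page fault sequence (mapping the page, or changing the permitted access types) may be sketched in Python as follows. The function name `run_page_fault_sequence`, the dictionary-based page table, the `free_pages` list, and the policy of always granting the requested access are assumptions made for this sketch rather than a description of any actual driver.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def run_page_fault_sequence(page_table, free_pages, virtual_address, access):
    """Hypothetical handler: map the page or widen its permitted accesses.

    Returns True if the fault was remedied (non-fatal) and False if it
    could not be remedied (treated here as fatal).
    """
    virtual_page = virtual_address // PAGE_SIZE
    entry = page_table.get(virtual_page)
    if entry is None:
        if not free_pages:
            return False  # no physical page left to map: treat as fatal
        # Remedy 1: create the missing mapping.
        page_table[virtual_page] = {
            "physical_page": free_pages.pop(),
            "access": {access},
        }
        return True
    # Remedy 2: the page is mapped, so permit the requested access type.
    entry["access"].add(access)
    return True
```

A real system would typically consult a policy before granting an access type; the sketch grants it unconditionally only to keep the two remedies visible.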
In another approach to remedying page faults, the CPU polls for page faults generated by the GPU. Upon detecting a page fault, the CPU executes an appropriate page fault sequence to make the requested memory page available to the GPU. While this approach may reduce the response time of the CPU to page faults generated by the GPU, this approach does not necessarily remove undesirable fault handling latency. Further, this approach does not address the reduced efficiency of the CPU attributable to handling GPU page faults instead of performing other operations.
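By way of illustration only, one polling pass of the kind described above may be sketched in Python as follows. The shared fault buffer is modeled as a `collections.deque`, and the names `poll_fault_buffer`, `handle_fault`, and `max_faults` are hypothetical. The sketch also makes the stated drawback visible: every pass consumes CPU time, whether or not any faults are pending.

```python
from collections import deque


def poll_fault_buffer(fault_buffer, handle_fault, max_faults=None):
    """Drain pending faults from a (hypothetical) shared fault buffer.

    Returns the number of faults handled in this polling pass.
    """
    handled = 0
    while fault_buffer and (max_faults is None or handled < max_faults):
        fault = fault_buffer.popleft()  # oldest pending fault first
        handle_fault(fault)             # e.g., run the page fault sequence
        handled += 1
    return handled


# Example usage with faulting virtual addresses as buffer entries.
pending = deque([0x1000, 0x2000, 0x3000])
serviced = []
count = poll_fault_buffer(pending, serviced.append, max_faults=2)
```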
As the foregoing illustrates, what is needed in the art is a more efficient approach to remedying page faults in a unified virtual memory architecture.