Shared virtual memory (SVM) across a central processing unit (CPU) and a graphics processing unit (GPU) is a feature of accelerated processing units. An input/output memory management unit (IOMMU) enables the GPU to share virtual memory with the CPU by servicing the address translation requests of the GPU. The IOMMU accesses the same page table structures utilized by processes running on the CPU. Therefore, the GPU and CPU are able to share the same set of page tables, and thereby the same virtual address space, via the IOMMU.
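By way of illustration only, the sharing of one set of page tables between the CPU and the GPU can be sketched as follows. This is a minimal toy model, not an actual IOMMU implementation: the names PageTable, Translator, cpu_mmu, and iommu, the single-level table, and the 4 KiB page size are all assumptions introduced for the example.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages, for illustration only


class PageTable:
    """Toy single-level page table: virtual page number -> physical frame number."""

    def __init__(self):
        self.entries = {}

    def map(self, vpn, pfn):
        self.entries[vpn] = pfn

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        pfn = self.entries[vpn]  # a KeyError here stands in for a page fault
        return pfn * PAGE_SIZE + offset


class Translator:
    """Hypothetical stand-in for either the CPU MMU or the IOMMU."""

    def __init__(self, name, page_table):
        self.name = name
        self.page_table = page_table  # shared reference, not a private copy

    def translate(self, vaddr):
        return self.page_table.translate(vaddr)


# One page table belonging to a process running on the CPU.
process_page_table = PageTable()
process_page_table.map(vpn=0x10, pfn=0x2A)

# Both translators point at the same structure.
cpu_mmu = Translator("cpu_mmu", process_page_table)
iommu = Translator("iommu", process_page_table)

vaddr = 0x10 * PAGE_SIZE + 0x123
cpu_paddr = cpu_mmu.translate(vaddr)
gpu_paddr = iommu.translate(vaddr)  # the GPU's request is serviced by the IOMMU

# A mapping added later is visible on both sides without any copying.
process_page_table.map(vpn=0x11, pfn=0x2B)
late_cpu_paddr = cpu_mmu.translate(0x11 * PAGE_SIZE)
late_gpu_paddr = iommu.translate(0x11 * PAGE_SIZE)
```

Because both translators hold a reference to the same table, a given virtual address resolves to the same physical address on either side, which is the property that gives the CPU and GPU a common virtual address space in this sketch.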
The GPU executes single instruction multiple data (SIMD) instructions, and the IOMMU receives the address translation requests that those instructions generate. Each memory access (load/store) instruction has the potential to issue several concurrent memory requests, up to the number of threads in a wavefront, where a wavefront is a batch of threads that execute in lockstep and together execute a SIMD instruction. For example, each work item (or thread) of the wavefront can request an address that belongs to a different operating system (OS) page. When the translation for a request is not found in the GPU translation lookaside buffers (TLBs), the translation request is forwarded to the IOMMU. The IOMMU performs a translation lookup in its own TLBs, and on a miss, a page table walk is required. Thus, because a single SIMD instruction may have many threads that access different memory pages, each potentially requiring its own page table walk, timely completion of the instruction is difficult.
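The effect of divergent page accesses on the translation path described above can be illustrated with a short sketch. It is a simplified model under stated assumptions: a 64-thread wavefront, 4 KiB pages, dictionaries standing in for the GPU TLB, the IOMMU TLB, and the page table, and lanes processed one after another rather than concurrently. The function name translate_wavefront is hypothetical.

```python
PAGE_SIZE = 4096      # assumed 4 KiB pages
WAVEFRONT_SIZE = 64   # assumed wavefront width


def translate_wavefront(addresses, gpu_tlb, iommu_tlb, page_table):
    """Translate one address per lane: GPU TLB, then IOMMU TLB, then a page walk."""
    stats = {"gpu_tlb_hits": 0, "iommu_tlb_hits": 0, "page_walks": 0}
    translated = []
    for vaddr in addresses:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in gpu_tlb:
            stats["gpu_tlb_hits"] += 1
        elif vpn in iommu_tlb:
            # GPU TLB miss: the request is forwarded to the IOMMU, whose TLB hits.
            stats["iommu_tlb_hits"] += 1
            gpu_tlb[vpn] = iommu_tlb[vpn]
        else:
            # Miss in both TLBs: the dictionary lookup stands in for a page table walk.
            stats["page_walks"] += 1
            iommu_tlb[vpn] = page_table[vpn]
            gpu_tlb[vpn] = iommu_tlb[vpn]
        translated.append(gpu_tlb[vpn] * PAGE_SIZE + offset)
    return translated, stats


# Toy page table covering enough pages for the divergent case.
page_table = {vpn: vpn + 1000 for vpn in range(WAVEFRONT_SIZE)}

# Case 1: every lane touches the same page (4-byte stride within page 0).
coalesced = [lane * 4 for lane in range(WAVEFRONT_SIZE)]
_, coalesced_stats = translate_wavefront(coalesced, {}, {}, page_table)

# Case 2: every lane touches a different page.
divergent = [lane * PAGE_SIZE for lane in range(WAVEFRONT_SIZE)]
_, divergent_stats = translate_wavefront(divergent, {}, {}, page_table)

print(coalesced_stats)
print(divergent_stats)
```

Starting from empty TLBs, the single-page case needs one page table walk for the whole wavefront, whereas the divergent case needs one walk per lane. The instruction cannot complete until its slowest translation returns, which is why the divergent case is the difficult one.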
In the following description, the use of the same reference numerals in different drawings indicates similar or identical items. Unless otherwise noted, the word “coupled” and its associated verb forms include both direct connection and indirect electrical connection by means known in the art, and unless otherwise noted any description of direct connection implies alternate embodiments using suitable forms of indirect electrical connection as well. Additionally, the terms “remap” and “migrate,” and variations thereof, are used interchangeably as descriptive terms for relocating.