In various computer systems, peripheral devices communicate over a network fabric such as a PCI or PCI-Express (PCIe) bus. Such peripheral devices may include, for example, a solid state drive (SSD) and various accelerator modules such as a graphics processing unit (GPU). Methods for directly accessing a local memory of a device are known in the art. For example, U.S. Pat. No. 7,623,134, whose disclosure is incorporated herein by reference, describes a technique for processing address page requests in a GPU system that is implementing a virtual memory model. A hardware-based page fault manager included in the GPU system intercepts page faults otherwise processed by a software-based page fault manager executing on a host CPU. The hardware-based page fault manager in the GPU includes a DMA engine capable of reading and writing pages between the system memory and a frame buffer memory in the GPU without involving the CPU or operating system.
As another example, U.S. Patent Application Publication 2014/0055467, whose disclosure is incorporated herein by reference, describes a system that may include a Graphics Processing Unit (GPU) and a Field Programmable Gate Array (FPGA). The system may further include a bus interface that is external to the FPGA, and that is configured to transfer data directly between the GPU and the FPGA without storing the data in a memory of a central processing unit (CPU) as an intermediary operation.
U.S. Patent Application Publication 2014/0075060, whose disclosure is incorporated herein by reference, proposes techniques for demand paging for an IO device (e.g., a GPU) that utilize pre-fetch and pre-back notification event signaling to reduce latency associated with demand paging. Page faults are limited by performing the demand paging operations prior to the IO device actually requesting unbacked memory.
Technologies that enable direct communication between remote GPUs include, for example, PeerDirect® and GPUDirect® RDMA, as described in a presentation titled “Interconnect Your Future,” by Gilad Shainer, at the 2nd Annual MVAPICH User Group (MUG) Meeting, August 2014, whose disclosure is incorporated herein by reference.
In some applications, a PCIe device operates in accordance with an internal address space that is different from the PCIe address space, and therefore requires address translation between the two address spaces. A protocol that defines transactions for address translation services is specified, for example, in an extension to the PCIe specifications, titled “Address Translation Services,” revision 1.1, Jan. 26, 2009, whose disclosure is incorporated herein by reference.
Methods for memory management including address translation are known in the art. For example, U.S. Pat. No. 7,225,287, whose disclosure is incorporated herein by reference, describes a system for addressing bus components comprising a bus controller component that controls access between a CPU and a memory address space. A plurality of bus components, connected to the bus controller over a bus, are addressable via a memory-mapped address within the address space. An address translation table is stored on at least one of the plurality of bus components. The address translation table stores a translation between a virtual address and a real address.
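By way of illustration only, a table that stores translations between virtual and real addresses can be sketched as follows. The sketch is not taken from any of the cited references. The class name `TranslationTable`, its methods, the 4 KB page size, and the example addresses are all hypothetical and chosen only to show a page-granular lookup in which the offset within a page is preserved.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical)


class TranslationTable:
    """Hypothetical page-granular map from virtual to real addresses."""

    def __init__(self):
        # Maps a virtual page number to a real page number.
        self._pages = {}

    def map_page(self, virt_addr, real_addr):
        """Record a translation for one page; both addresses must be page-aligned."""
        if virt_addr % PAGE_SIZE or real_addr % PAGE_SIZE:
            raise ValueError("addresses must be page-aligned")
        self._pages[virt_addr // PAGE_SIZE] = real_addr // PAGE_SIZE

    def translate(self, virt_addr):
        """Return the real address for virt_addr, keeping the in-page offset."""
        vpn, offset = divmod(virt_addr, PAGE_SIZE)
        try:
            rpn = self._pages[vpn]
        except KeyError:
            raise LookupError(f"no translation for address {virt_addr:#x}") from None
        return rpn * PAGE_SIZE + offset


table = TranslationTable()
table.map_page(0x1000, 0x8000_0000)   # example addresses, chosen arbitrarily
print(hex(table.translate(0x1234)))   # prints 0x80000234
```

In this sketch, an access to an unmapped page raises `LookupError`, which loosely stands in for the case in which no translation exists and a fault or translation request would be needed.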
As another example, U.S. Pat. No. 8,650,342, whose disclosure is incorporated herein by reference, describes virtualization of I/O devices to support operation of plural virtual machines on a host information handling system. The virtualization is managed with distributed translation agents that translate addresses generated by I/O devices according to a mapping defined by a virtual machine monitor. The translation agents reside in the host I/O subsystem, such as at I/O hubs or at I/O devices. A discovery module discovers and configures plural translation agents to coordinate I/O device communications with translation of physical memory addresses and virtual I/O addresses.