As processor-based systems advance, the availability of programmable accelerators connected to the system via a high-speed peripheral interconnect, such as a Peripheral Component Interconnect Express (PCIe™) interconnect with links in accordance with the PCI Express™ Base Specification version 2.0 (published Jan. 17, 2007) (hereafter the PCIe™ Specification) or another such protocol, allows system integrators to pack more computational horsepower into a system. However, challenges exist in ensuring that an application can transparently utilize the additional compute horsepower without significant changes to the application to manually split the computation between a main processor (e.g., a multicore central processing unit (CPU)) and the accelerator(s) and to manage the movement of data back and forth. Traditionally, only the main system memory that is managed by the operating system (OS) is allocated for applications to use. The physical memory that is local to any accelerator coupled via a peripheral interconnect is managed separately. In particular, such local memory on the accelerator is not visible as part of the system memory recognizable by the OS running on the main processor. Instead, device driver software is responsible for explicitly managing data movement between local memory and remote memory.
The physical memory that is accessed by the processor is managed by the operating system, which virtualizes access to this physical memory to create the illusion of a large contiguous virtual address space. The OS uses underlying processor support for virtual memory management, as the processor allows the software to set up a mapping table to map virtual pages to physical pages. The processor supports virtual memory address translation by consulting the mapping table every time a memory access needs to be made. Frequently accessed translations can be cached by the processor to speed up this process. These mapping tables, commonly referred to as page tables, also contain attribute bits, such as read/write and user/supervisor privilege bits, that control access to a given virtual page. While the OS manages the physical memory available on the motherboard (the system memory), it does not manage or allocate memory that is local to and available on an accelerator. Thus, current solutions create a shared memory model as seen by the programmer and depend on memory protection mechanisms to fault and move the pages back and forth between different memories.
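The translation flow described above can be illustrated with a minimal software sketch. It is only a toy model, not how any particular processor or OS implements paging. The 4 KiB page size, the single-level table, and the names `ToyMMU`, `PageTableEntry`, `PageFault`, `map_page`, and `translate` are all assumptions introduced for illustration.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size (4 KiB); real systems vary


@dataclass
class PageTableEntry:
    frame: int       # physical page (frame) number
    writable: bool   # read/write attribute bit
    user: bool       # user/supervisor attribute bit


class PageFault(Exception):
    """Raised when a mapping is missing or an attribute check fails."""


class ToyMMU:
    """Hypothetical single-level page table with a translation cache."""

    def __init__(self):
        self.page_table = {}  # virtual page number -> PageTableEntry
        self.tlb = {}         # cache of recently used translations
        self.tlb_hits = 0

    def map_page(self, vpn, frame, writable=True, user=True):
        # Software (the OS in the text above) installs a mapping.
        self.page_table[vpn] = PageTableEntry(frame, writable, user)
        self.tlb.pop(vpn, None)  # drop any stale cached translation

    def translate(self, vaddr, write=False, user_mode=True):
        # Split the virtual address into page number and page offset.
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.tlb.get(vpn)
        if entry is not None:
            self.tlb_hits += 1
        else:
            # Cache miss: consult the mapping table.
            entry = self.page_table.get(vpn)
            if entry is None:
                raise PageFault(f"no mapping for virtual page {vpn}")
            self.tlb[vpn] = entry
        # Attribute bits control access to the virtual page.
        if write and not entry.writable:
            raise PageFault(f"write to read-only virtual page {vpn}")
        if user_mode and not entry.user:
            raise PageFault(f"user access to supervisor page {vpn}")
        return entry.frame * PAGE_SIZE + offset


mmu = ToyMMU()
mmu.map_page(vpn=2, frame=7, writable=False)
physical = mmu.translate(2 * PAGE_SIZE + 16)  # read through the mapping
```

In this sketch, a fault raised by `translate` is the hook that a fault-and-move scheme of the kind mentioned above would rely on: the handler would migrate the page and install a new mapping before the access is retried.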