Embodiments presented herein generally relate to FPGA-based coherent accelerators, and more specifically, to efficient translation reloads for page faults occurring on computing systems with FPGA-based coherent accelerators.
Some FPGA-based hardware accelerators provide an application with direct access to the hardware accelerator. For example, an FPGA-based coherent accelerator allows an application to execute SCSI commands directly in the application's memory (i.e., user space). In contrast to conventional hardware accelerators that need physical addresses to execute bus commands, coherent accelerators use effective addresses to issue bus commands to an attached storage device. As a result, an operating system does not need to perform setup actions that are typical of (and computationally expensive for) a conventional hardware accelerator, such as translating effective addresses to physical addresses, which requires steps such as pinning memory pages to ensure the physical pages are not deleted and reassigned to another virtual address. Instead, a coherent accelerator translates effective addresses to real addresses while accelerating a function. Therefore, the operating system, via the coherent accelerator, allows page faults to occur and handles the page faults such that the accelerator may continue to access application memory. This approach greatly reduces the number of instructions required to set up a DMA path for data transfer. Further, coherent accelerators allow developers to customize applications to use the FPGA more efficiently.
To access the coherent accelerator, an application attaches application memory to a hardware context of the coherent accelerator. A hardware context may include a page table that maps application memory to physical pages. Further, for processors that have a segmented architecture, a hardware context may include a segment table, which specifies which virtual pages belong to a given segment.
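By way of illustration only, the relationship between a hardware context, its segment table, and its page table may be sketched as follows. The class name `HardwareContext`, its methods, and the 4 KB page size are hypothetical and chosen for this example; they do not describe any particular accelerator's layout.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

PAGE_SIZE = 4096  # assumed page size, for illustration only


@dataclass
class HardwareContext:
    """Hypothetical model of one hardware context of a coherent accelerator."""
    # Segment table: segment id -> range of virtual page numbers in that segment.
    segment_table: Dict[int, range] = field(default_factory=dict)
    # Page table: virtual page number -> physical page number.
    page_table: Dict[int, int] = field(default_factory=dict)

    def attach(self, segment_id: int, first_vpn: int, num_pages: int) -> None:
        """Attach a run of application pages to this context as one segment."""
        self.segment_table[segment_id] = range(first_vpn, first_vpn + num_pages)

    def map_page(self, vpn: int, ppn: int) -> None:
        """Record a page translation entry."""
        self.page_table[vpn] = ppn

    def segment_of(self, vpn: int) -> Optional[int]:
        """Return the segment that owns a virtual page, or None."""
        for segment_id, pages in self.segment_table.items():
            if vpn in pages:
                return segment_id
        return None

    def translate(self, address: int) -> Optional[int]:
        """Translate an address; None signals a missing translation."""
        vpn, offset = divmod(address, PAGE_SIZE)
        if self.segment_of(vpn) is None:
            return None
        ppn = self.page_table.get(vpn)
        if ppn is None:
            return None
        return ppn * PAGE_SIZE + offset
```

In this sketch, a `None` result from `translate` stands in for the case where no translation exists for the requested address.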
Coherent accelerators typically cannot create page translation entries. Thus, if a page translation entry for a requested address is not found, the coherent accelerator must request that the operating system create one, which the accelerator does by sending an external interrupt to one of the system's processors. On conventional systems, page faults generate synchronous exceptions that arrive in the context of a process as a result of a memory access instruction (e.g., a load or a store). Therefore, the interrupts generated on such systems responsive to page faults are synchronous interrupts. However, coherent accelerators may generate asynchronous interrupts, as the system processor may receive an interrupt from the coherent accelerator related to a process that is not currently executing on that processor.
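The following is a minimal sketch of the fault path described above, under simplifying assumptions. The names `Accelerator`, `Cpu`, and `handle_interrupt`, and the mapping from context identifiers to process identifiers, are hypothetical. The sketch shows only that the accelerator raises an interrupt rather than creating the entry itself, and that the interrupt is asynchronous whenever the receiving processor is running a different process.

```python
from collections import deque


class Accelerator:
    """Hypothetical accelerator that cannot create translation entries itself."""

    def __init__(self):
        self.translations = {}     # (context id, page) -> physical page
        self.interrupts = deque()  # pending external interrupts

    def access(self, context_id, page):
        key = (context_id, page)
        if key in self.translations:
            return self.translations[key]
        # Translation miss: ask the operating system via an external interrupt.
        self.interrupts.append(key)
        return None


class Cpu:
    """Hypothetical processor, tracking only the process it is running."""

    def __init__(self, current_pid):
        self.current_pid = current_pid


def handle_interrupt(cpu, accel, context_owner, physical_page):
    """Operating system side: create the missing entry.

    Returns True if the interrupt was asynchronous, i.e. the faulting
    process is not the one currently executing on this processor.
    """
    context_id, page = accel.interrupts.popleft()
    faulting_pid = context_owner[context_id]
    asynchronous = faulting_pid != cpu.current_pid
    accel.translations[(context_id, page)] = physical_page
    return asynchronous
```

For example, if context 1 belongs to process 100 and the interrupt is delivered to a processor running process 200, `handle_interrupt` reports the interrupt as asynchronous, and a retried `access` then finds the new entry.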
Furthermore, the page fault interrupt handling environment is a very restrictive context that has limited access to system memory and no access to the data structures of the process that caused the page fault, which would be needed to determine whether the physical address is still valid. Further complicating matters, address space shrinkage may be performed in parallel on other CPUs (i.e., by the owning process that caused the page fault), which may invalidate the translation the coherent accelerator is requesting.
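The hazard described above may be illustrated with the following sketch, which replays one possible interleaving in a fixed order rather than using real concurrency. The names `AddressSpace` and `naive_reload` are hypothetical, and the handler shown is deliberately naive: it illustrates the problem and is not a proposed solution.

```python
class AddressSpace:
    """Hypothetical address space of the process that owns the context."""

    def __init__(self, num_pages):
        self.valid_pages = set(range(num_pages))

    def shrink(self, new_num_pages):
        """Drop every page at or above the new size."""
        self.valid_pages = {p for p in self.valid_pages if p < new_num_pages}


def naive_reload(page, accel_translations, physical_page):
    """Handler that installs a translation without consulting the owning
    process's data structures, which it cannot reach in this context."""
    accel_translations[page] = physical_page


space = AddressSpace(num_pages=8)
accel_translations = {}

pending_page = 6   # the accelerator faults on page 6
space.shrink(4)    # the owning process shrinks its address space on another CPU
naive_reload(pending_page, accel_translations, physical_page=42)

# Translations the accelerator now holds for pages that are no longer valid.
stale = [p for p in accel_translations if p not in space.valid_pages]
```

After this sequence, `stale` contains page 6: the accelerator holds a translation for a page that the owning process has already released.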