Embodiments presented herein generally relate to FPGA-based coherent accelerators, and more specifically, to sharing a kernel context of an FPGA-based coherent accelerator.
Traditional hardware accelerators (e.g., PCI-based accelerators) perform operations requiring direct memory access (DMA) via a stack that includes a number of layers, which provide user applications with access to the hardware accelerator. The hardware accelerator directs a call from an application to a physical memory address in a storage device attached to the hardware accelerator. The hardware accelerator sets up DMA to corresponding pages of physical memory for the application (i.e., a buffer for the application). Doing so allows the hardware accelerator to arbitrate on a connected bus (e.g., a PCI bus) to transfer I/O operations and system calls to the storage device via a series of SCSI commands executed in the hardware accelerator.
Some FPGA-based hardware accelerators provide an application with direct access to the hardware accelerator. For example, an FPGA-based coherent accelerator allows an application to execute SCSI commands directly within application memory. In contrast to traditional hardware accelerators that need physical addresses to execute bus commands, coherent accelerators use effective addresses to issue bus commands to an attached storage device. As a result, an operating system does not need to perform actions that are typical of (and computationally expensive for) a traditional hardware accelerator, such as translating effective addresses to physical addresses, which requires steps such as pinning memory pages to prevent page faults. Instead, a coherent accelerator translates effective addresses to real addresses while accelerating a function. Therefore, the operating system, via the coherent accelerator, allows page faults to occur and handles the page faults such that the accelerator may continue to access application memory. This approach greatly reduces the number of instructions required to set up a DMA path for data transfer. Further, coherent accelerators allow developers to customize applications to more efficiently use the FPGA.
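By way of illustration only, the fault-tolerant translation described above may be sketched as follows. The sketch is a simplified software model, not the accelerator's actual logic: the class `FaultingTranslator`, the function `demo_fault_handler`, the 4 KB page size, and the mapping of effective page n to real page n + 100 are all hypothetical names and values chosen for the example. It shows only that a missing translation is resolved on demand by a handler, rather than being prevented in advance by pinning pages.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class FaultingTranslator:
    """Toy model: translate effective addresses, resolving misses on demand."""

    def __init__(self, fault_handler):
        self.page_table = {}          # effective page number -> real page number
        self.fault_handler = fault_handler
        self.faults = 0               # number of page faults handled so far

    def translate(self, effective_addr):
        page, offset = divmod(effective_addr, PAGE_SIZE)
        if page not in self.page_table:
            # No pinned mapping exists; the fault is allowed to occur and is
            # resolved by the (hypothetical) operating-system handler.
            self.faults += 1
            self.page_table[page] = self.fault_handler(page)
        return self.page_table[page] * PAGE_SIZE + offset


def demo_fault_handler(effective_page):
    # Hypothetical policy: effective page n is backed by real page n + 100.
    return effective_page + 100


translator = FaultingTranslator(demo_fault_handler)
first = translator.translate(0x2010)   # faults once, then translates
second = translator.translate(0x2020)  # same page, no further fault
```

In this model the second access to the same page reuses the mapping installed by the first fault, so the accelerator continues to access application memory without any page having been pinned beforehand.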
To access the coherent accelerator, an application attaches application memory to a hardware context of the coherent accelerator. A hardware context may include a page table that maps application memory to pages of the page table. Further, a hardware context may include a segment table for processors that have a segmented architecture, which specifies which virtual pages belong to a given segment. Generally, a coherent accelerator has a limited number of hardware contexts. To address this limitation, the coherent accelerator may allow processes to share contexts with one another in either user space or kernel space.
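As a non-limiting sketch, a limited pool of hardware contexts with a sharing fallback may be modeled as shown below. The names `HardwareContext` and `ContextPool`, and the policy of reserving context 0 as the shared context once dedicated contexts run out, are assumptions made for the example and are not drawn from any particular accelerator or driver.

```python
from dataclasses import dataclass, field


@dataclass
class HardwareContext:
    context_id: int
    page_table: dict = field(default_factory=dict)     # virtual page -> real page
    segment_table: dict = field(default_factory=dict)  # segment -> virtual pages
    attached_pids: set = field(default_factory=set)    # processes using this context


class ContextPool:
    """Toy model of a fixed number of contexts with one shared fallback."""

    def __init__(self, max_contexts):
        if max_contexts < 1:
            raise ValueError("at least one hardware context is required")
        # Assumed policy: context 0 is reserved for sharing.
        self.shared = HardwareContext(0)
        self.free_ids = list(range(1, max_contexts))

    def attach(self, pid):
        if self.free_ids:
            # A dedicated context is still available.
            ctx = HardwareContext(self.free_ids.pop(0))
        else:
            # Dedicated contexts are exhausted; share the reserved context.
            ctx = self.shared
        ctx.attached_pids.add(pid)
        return ctx


pool = ContextPool(max_contexts=3)
a = pool.attach(101)
b = pool.attach(102)
c = pool.attach(103)  # no dedicated context left, so this one is shared
d = pool.attach(104)  # shares the same context as process 103
```

With three contexts, the first two processes each receive a dedicated context, and every later process attaches to the single shared context, which is one way the limited number of contexts can be stretched across more processes.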