Graphics processing involves the rapid performance of mathematical calculations for image rendering. Such graphics workloads may be performed at a general-purpose microprocessor or at a graphics processing unit (GPU), a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. Often there is a need to accelerate graphics workloads in order to execute performance-critical operating system (OS) kernels.
Various acceleration techniques (e.g., software and fixed-function units) are currently implemented at microprocessors and GPUs. However, such techniques have limitations and/or disadvantages. An improvement on these techniques implements a field-programmable gate array (FPGA) to accelerate the execution of performance-critical loops, freeing up processing core resources. Such an implementation is an improvement because FPGAs are more performance- and power-efficient in executing the loops (e.g., they are not subject to a core's data access, instruction set architecture (ISA), and microarchitecture limitations). In this design, a single FPGA is shared by all microprocessor or GPU cores. The shared design continues to have performance deficiencies, however, since the single FPGA is a contended resource that must service different cores needing access for synchronization.