Conventionally, system processing functionalities are written in software for execution on some type of processor, so as to accommodate future modifications and updates. However, a system functionality executed in software by one or more processors is typically slower than the same functionality implemented and executed using accelerators, whether special-purpose processors or application-specific hardware dedicated to the particular function. Accelerators can increase the performance, decrease the processing latency, and decrease the power consumption of computer systems.
Because accelerators are customized to process only a particular portion of an application, they are often paired with one or more processors in a system so that the entire application can be executed. The part of the application that is compatible with the accelerator is executed by the accelerator, and the remaining part is executed by the processor. Traditionally, the accelerator is a slave component for a processor that functions as a master component. The application runs on the processor and, for the part of the application that is amenable to acceleration, the processor transfers control to the accelerator. After finishing the accelerated part of the application, the accelerator returns control to the processor.
The conventional acceleration method described above entails a high overhead. First, the input data elements from an input interface must be copied to the processor and then stored in the accelerator. Next, the output data elements (if any) from the accelerator must be copied to the processor and then stored in an output interface. Each data element is thus moved twice on its way into the accelerator and twice on its way out, with the processor acting as an intermediary in both directions. There therefore remains a need for a method and system of implementing an accelerator in conjunction with a processor that overcomes these challenges.
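The data path of the conventional method described above can be illustrated with a minimal sketch. It is a software model only and is not taken from any particular system: the names `CopyCounter`, `accelerate`, and `conventional_offload` are hypothetical, and the doubling operation merely stands in for whatever function the accelerator would perform. The sketch counts how many times the data is copied when the processor mediates every transfer.

```python
from dataclasses import dataclass


@dataclass
class CopyCounter:
    """Hypothetical helper that counts data transfers between components."""
    copies: int = 0

    def copy(self, data):
        # Each call models one transfer of the data elements between components.
        self.copies += 1
        return list(data)


def accelerate(data):
    # Stand-in for the accelerated function (assumed here: double each element).
    return [2 * x for x in data]


def conventional_offload(input_interface, counter):
    """Model of the processor-mediated (master/slave) data path."""
    cpu_in = counter.copy(input_interface)     # input interface -> processor
    acc_in = counter.copy(cpu_in)              # processor -> accelerator
    acc_out = accelerate(acc_in)               # control handed to the accelerator
    cpu_out = counter.copy(acc_out)            # accelerator -> processor
    output_interface = counter.copy(cpu_out)   # processor -> output interface
    return output_interface


counter = CopyCounter()
result = conventional_offload([1, 2, 3], counter)
print(result, counter.copies)  # prints: [2, 4, 6] 4
```

In this model, a single accelerated operation incurs four transfers, two of which exist only because the processor sits between the interfaces and the accelerator.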