A processor implemented in an integrated circuit (IC) may include a processor core having a native architecture. The processor core has an instruction set, and each instruction in the set has a predefined latency that determines how many stages of the processor core's pipeline are involved in the execution of the instruction.
If the instruction set does not include an instruction for a particular operation, then performing that operation requires executing two or more instructions. Consequently, software that uses that operation will run slower than it would if that operation were included in the native architecture.
In order to “accelerate” the particular operation, or to execute functions that the processor core cannot perform, a dedicated hardware accelerator may be designed and implemented in the integrated circuit. In general, however, using the accelerator involves some overhead. For example, using the accelerator may involve setting modes, configuring the accelerator, and creating triggers. In another example, the accelerator typically writes its results either to its own internal registers or to shared memory, and the processor core must then read the results of the operations performed by the accelerator from that location. This overhead may diminish the benefits of using the accelerator.
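The trade-off between the cost of emulating an operation in software and the overhead of invoking an accelerator can be illustrated with a simple cycle-count model. The sketch below is a hypothetical illustration only: the names `AcceleratorCost`, `software_cycles`, `accelerator_pays_off`, and `speedup`, and all cycle counts used, are assumptions chosen for illustration and are not taken from any particular processor core or accelerator. The accelerator cost is modeled as setup (setting modes, configuring, creating triggers) plus compute plus readback of results by the processor core.

```python
from dataclasses import dataclass


@dataclass
class AcceleratorCost:
    """Hypothetical per-invocation cycle costs of using an accelerator."""
    setup: int     # assumed cycles for setting modes, configuring, creating triggers
    compute: int   # assumed cycles the accelerator spends on the operation itself
    readback: int  # assumed cycles for the processor core to read back the results

    def total(self) -> int:
        # Total cost of one invocation, overhead included.
        return self.setup + self.compute + self.readback


def software_cycles(num_instructions: int, cycles_per_instruction: int) -> int:
    """Cycles to perform the operation as a sequence of native instructions."""
    return num_instructions * cycles_per_instruction


def accelerator_pays_off(sw_cycles: int, accel: AcceleratorCost) -> bool:
    """True only if the accelerator, overhead included, beats the software path."""
    return accel.total() < sw_cycles


def speedup(sw_cycles: int, accel: AcceleratorCost) -> float:
    """Ratio of software cycles to accelerator cycles (>1 means the accelerator wins)."""
    return sw_cycles / accel.total()


if __name__ == "__main__":
    # Hypothetical numbers: an 8-instruction software sequence at 1 cycle each.
    sw = software_cycles(num_instructions=8, cycles_per_instruction=1)
    # Hypothetical accelerator: 1 cycle of work, but 9 cycles of overhead.
    accel = AcceleratorCost(setup=6, compute=1, readback=3)
    print(sw, accel.total(), accelerator_pays_off(sw, accel))  # prints "8 10 False"
    print(f"speedup: {speedup(sw, accel):.2f}")                # prints "speedup: 0.80"
```

With these assumed numbers the accelerator itself needs only one cycle, yet the setup and readback overhead make it slower than the eight-instruction software sequence; in this model the accelerator only wins once the combined overhead drops below seven cycles.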
It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity.