1. Field of the Invention
This invention is related to processor-based systems having hardware accelerators.
2. Description of the Related Art
Hardware accelerators are often included in processor-based systems such as computer systems to perform specific, predefined tasks in hardware rather than in software. Work is offloaded from the processors to the hardware accelerators, permitting the processors to work in parallel on other tasks. Even if no other task is available for the processors to work on, the higher performance of the hardware accelerator performing the defined tasks can still result in a performance increase. For example, if the software execution of the task requires X cycles and the hardware accelerator execution of the task requires Y cycles, where Y is less than X and often much less than X, the performance gain is X/Y (without accounting for software overhead in dispatching the task to the accelerator). Additionally, in some cases, hardware acceleration can be more power-efficient than performing the same tasks in software. Power efficiency can be even greater if the hardware accelerators are incorporated on the same semiconductor substrate (“on-chip”) as the processors. In particular, integrating hardware accelerators onto multi-core chips such as chip multiprocessors (CMP) and/or chip multithreaded (CMT) processors can be efficient, because the accelerator can be shared among the cores/threads.
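The X/Y relationship described above, and the effect of the dispatch overhead that it sets aside, can be illustrated with a minimal sketch. The function name, the `overhead_cycles` parameter, and all cycle counts are illustrative assumptions and are not taken from any particular system.

```python
def speedup(sw_cycles: float, hw_cycles: float, overhead_cycles: float = 0.0) -> float:
    """Return the performance gain of the accelerator over software execution.

    sw_cycles is X (software execution) and hw_cycles is Y (accelerator
    execution). overhead_cycles is an assumed software dispatch cost that is
    added to the accelerator path; with the default of 0 the result is X/Y.
    """
    if sw_cycles <= 0 or hw_cycles <= 0 or overhead_cycles < 0:
        raise ValueError("cycle counts must be positive and overhead non-negative")
    return sw_cycles / (hw_cycles + overhead_cycles)


# Hypothetical figures, chosen only for illustration.
X = 100_000  # cycles to run the task in software
Y = 1_000    # cycles to run the task on the accelerator

ideal_gain = speedup(X, Y)                    # X / Y, ignoring dispatch overhead
gain_with_overhead = speedup(X, Y, 30_000)    # X / (Y + overhead)
```

With these assumed numbers, the gain drops sharply once dispatch overhead of the same order as X is added to the accelerator path.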
Currently, there is a large amount of software overhead associated with dispatching a task to a shared hardware accelerator (e.g., on the order of tens of thousands of processor clock cycles). Access to the hardware accelerator is typically managed by the lowest-level and most-privileged layer of software in the system. Managing access in this fashion helps ensure that the hardware accelerator is shared in a secure fashion (preventing one thread/core from disrupting, and particularly corrupting, the task issued by another thread/core to the hardware accelerator), and also in a fair fashion so that various threads/cores have the opportunity to take advantage of the hardware accelerator. In a non-virtualized environment, the operating system (OS) can implement the fairness and security. In a virtualized environment, the Hypervisor implements the fairness and security. Typically, the overhead incurred is as follows: the application transmits a task request to the OS; the OS copies the data to be processed from user space to kernel space; the OS forwards the request to the Hypervisor; the Hypervisor programs the accelerator to perform the task and awaits completion; the Hypervisor passes the completion to the OS; the OS copies the results from kernel space to user space; and the OS informs the application that the task is complete. Accordingly, much of the overhead is consumed in copying the data back and forth between user space and kernel space, as well as in the communications between the OS and the Hypervisor.
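The dispatch sequence described above can be laid out as a simple cost model. This is a sketch only: the `Step` type, the helper functions, and every cycle count are hypothetical placeholders, and the counts cover software overhead only, not the accelerator's own execution time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    description: str
    cycles: int            # hypothetical software cost, for illustration only
    is_copy: bool = False  # True for user space <-> kernel space copies


# The seven steps in the order given in the text; costs are made-up values.
DISPATCH_PATH = [
    Step("application transmits task request to OS", 2_000),
    Step("OS copies input data from user space to kernel space", 8_000, is_copy=True),
    Step("OS forwards request to Hypervisor", 3_000),
    Step("Hypervisor programs accelerator and awaits completion", 2_000),
    Step("Hypervisor passes completion to OS", 3_000),
    Step("OS copies results from kernel space to user space", 8_000, is_copy=True),
    Step("OS informs application that the task is complete", 2_000),
]


def total_overhead(path: list[Step]) -> int:
    """Sum the software overhead of every step on the dispatch path."""
    return sum(step.cycles for step in path)


def copy_overhead(path: list[Step]) -> int:
    """Sum only the overhead spent copying between user and kernel space."""
    return sum(step.cycles for step in path if step.is_copy)


total = total_overhead(DISPATCH_PATH)
copies = copy_overhead(DISPATCH_PATH)
```

Under these assumed costs the two copy steps account for more than half of the total, which mirrors the observation that copying dominates the overhead.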
The size of the software overhead limits the usefulness of the hardware accelerator to those tasks for which the performance gain of using the hardware accelerator is in the tens of thousands of clock cycles or greater. Since the software overhead is experienced for every task issued to the hardware accelerator, the size of each individual task must be large enough to compensate for the software overhead. Not all tasks meet these requirements. For example, bulk encryption in web servers can be expected to be a large task overall, which could greatly benefit from hardware acceleration. However, each packet to be encrypted is relatively small, and the software overhead incurred for each packet would be prohibitive.
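The break-even condition implied above can be sketched as follows: offloading pays off only when the cycles saved by the accelerator exceed the per-task software overhead. The function names and all cycle counts are assumptions chosen for illustration.

```python
def net_cycles_saved(sw_cycles: int, hw_cycles: int, overhead_cycles: int) -> int:
    """Cycles saved by offloading one task; negative means offloading is slower."""
    return sw_cycles - (hw_cycles + overhead_cycles)


def worth_offloading(sw_cycles: int, hw_cycles: int, overhead_cycles: int) -> bool:
    """True when the accelerator path, including overhead, beats software."""
    return net_cycles_saved(sw_cycles, hw_cycles, overhead_cycles) > 0


OVERHEAD = 30_000  # hypothetical per-task dispatch overhead, in cycles

# Hypothetical small task (e.g., encrypting one packet): overhead dominates.
small_task = net_cycles_saved(sw_cycles=5_000, hw_cycles=500, overhead_cycles=OVERHEAD)

# Hypothetical large task: the overhead is amortized over much more work.
large_task = net_cycles_saved(sw_cycles=1_000_000, hw_cycles=50_000, overhead_cycles=OVERHEAD)

# Many small tasks each pay the overhead, so the loss accumulates.
loss_over_200_packets = 200 * small_task
```

In this sketch the small task loses cycles on every dispatch even though the accelerator itself is ten times faster, while the large task still shows a substantial net saving.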