In the central processing units (CPUs) of modern computing devices, it is common to improve processing efficiency by means of multi-thread architectures. In such architectures, accelerators, which take over tasks offloaded from threads, execute them, and return the results to the threads, are often adopted in order to further improve task processing speed. Since accelerators process such tasks far faster than the hardware of the threads, processing efficiency can be further improved accordingly.
In most multi-thread architectures, an accelerator is shared by multiple hardware threads because the number of threads far exceeds the number of accelerators.
FIG. 1 shows an architecture composed of a common hardware accelerator 103 and multiple hardware threads 101. The hardware accelerator 103 comprises a bus interface 1031 and a task accelerating unit 1032. A hardware thread 101 issues a task to the hardware accelerator 103 in the form of a request as shown in FIG. 3. After the task accelerating unit 1032 has accelerated the task, the result is returned to a target data cache 1011 associated with the hardware thread.
FIG. 2 shows a particular structure of the hardware accelerator 103. A task loader 10321, in response to a new task, queues the task into a task queuing unit 10322. An accelerator engine 10323 fetches tasks for processing from the task queuing unit 10322 according to their queuing order. A result outputting unit 10324 returns a processing result to the target data cache 1011 within the requesting hardware thread.
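The flow through the units of FIG. 2 can be sketched in software as a simple first-in, first-out model. This is only an illustrative sketch, not the hardware design: the `Task` fields, the `caches` dictionary standing in for the per-thread target data caches 1011, and the placeholder computation are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    thread_id: int  # hypothetical: identifies the requesting hardware thread 101
    payload: int    # hypothetical: the data to be processed


@dataclass
class HardwareAccelerator:
    # Models the task queuing unit 10322 as a FIFO.
    queue: deque = field(default_factory=deque)
    # Models the target data caches 1011, keyed by thread (an assumption).
    caches: dict = field(default_factory=dict)

    def load(self, task: Task) -> None:
        """Task loader 10321: queue a newly arrived task."""
        self.queue.append(task)

    def step(self) -> bool:
        """Accelerator engine 10323 plus result outputting unit 10324.

        Processes the oldest queued task and writes its result to the
        requesting thread's cache. Returns False if the queue is empty.
        """
        if not self.queue:
            return False
        task = self.queue.popleft()        # fetch in queuing order
        result = task.payload * 2          # placeholder for the real work
        self.caches[task.thread_id] = result
        return True
```

In this model, a task issued by one thread is only processed after every task queued ahead of it, which is why the waiting time seen by a thread depends on requests it cannot observe.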
FIG. 3 shows the frame format of a request issued to the hardware accelerator 103 by the hardware thread 101. An operation code 301 is used to identify the present operation. An accelerator ID 302 is used to identify which hardware accelerator 103 is to be used for acceleration. A requesting address 303 is used to identify the address of the requesting hardware thread 101.
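As an illustration of the three fields of the request frame, the following sketch packs and unpacks them as a byte string. The field widths (8 bits for the operation code 301, 8 bits for the accelerator ID 302, 32 bits for the requesting address 303) and the big-endian byte order are assumptions chosen for the example; the text does not specify them.

```python
import struct
from typing import NamedTuple


class Request(NamedTuple):
    opcode: int              # operation code 301
    accelerator_id: int      # accelerator ID 302
    requesting_address: int  # requesting address 303


# Assumed layout: 1-byte opcode, 1-byte accelerator ID, 4-byte address,
# big-endian, no padding (6 bytes in total).
_FRAME = struct.Struct(">BBI")


def encode_request(req: Request) -> bytes:
    """Serialize a request into the assumed 6-byte frame."""
    return _FRAME.pack(req.opcode, req.accelerator_id, req.requesting_address)


def decode_request(frame: bytes) -> Request:
    """Parse a 6-byte frame back into its three fields."""
    return Request(*_FRAME.unpack(frame))
```

Note that none of the three fields carries any information about the accelerator's queue state, so a frame of this shape gives the requesting thread no basis for estimating when its result will arrive.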
Since a thread having tasks to be executed in a hardware accelerator has no information about tasks issued by other threads, or about the status of the task queue in the hardware accelerator, synchronous or asynchronous mechanisms are applied in hardware threads to wait for task completion.
Synchronous mechanisms generally either poll the status of the target data cache 1011 continuously using a polling instruction, or watch the target data cache 1011 by means of a monitoring mechanism on a bus 102, which gives notice as soon as updated data for the target data cache 1011 is detected. The hardware thread 101 does nothing else during the period from sending a request to the hardware accelerator 103 to obtaining a result from the target data cache 1011. This lowers resource utilization and raises processing overhead, but it simplifies processing delay control for time-sensitive applications.
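The cost of synchronous waiting can be illustrated with a small cycle-level simulation. This is a sketch under stated assumptions: the function name `run_synchronous`, the fixed `latency_cycles`, the one-poll-per-cycle model, and the result value 42 are all hypothetical and chosen only to make the behavior visible.

```python
def run_synchronous(latency_cycles):
    """Simulate a thread that polls its target data cache every cycle.

    latency_cycles is the (assumed, fixed) number of cycles until the
    accelerator writes the result; it must be at least 1.
    Returns (result, polls), where polls counts the polling instructions
    executed, including the final successful one.
    """
    cache = None  # models the target data cache 1011, initially empty
    polls = 0
    for cycle in range(1, latency_cycles + 1):
        if cycle == latency_cycles:
            cache = 42  # accelerator writes the result in this cycle
        polls += 1      # the thread spends this cycle polling
        if cache is not None:
            break       # result observed; waiting ends
    return cache, polls
```

In this model the thread performs no useful work in any cycle of the wait, but it observes the result in the same cycle it is written, which reflects the simple delay control described above.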
In asynchronous mechanisms, the hardware thread 101 executes other tasks during the period from sending a request to the hardware accelerator 103 to obtaining a result from the target data cache 1011. The hardware thread 101 is woken up by an interrupt at a later point in time and then obtains the result from the target data cache 1011. Thus, resource utilization can be improved and processing overhead can be lowered, at the cost of complicating the processing delay control of time-sensitive applications.
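The asynchronous case can be sketched with the same kind of cycle-level simulation. Again, the function name `run_asynchronous`, the fixed `latency_cycles`, the modeling of the interrupt as a flag, and the result value 42 are assumptions made purely for illustration.

```python
def run_asynchronous(latency_cycles):
    """Simulate a thread that does other work until an interrupt arrives.

    latency_cycles is the (assumed, fixed) number of cycles until the
    accelerator writes the result and raises an interrupt; it must be
    at least 1.
    Returns (result, other_work_done), where other_work_done counts the
    cycles the thread spent on unrelated tasks while waiting.
    """
    cache = None          # models the target data cache 1011
    interrupted = False   # models the wake-up interrupt
    other_work_done = 0
    for cycle in range(1, latency_cycles + 1):
        if cycle == latency_cycles:
            cache = 42          # accelerator writes the result
            interrupted = True  # and interrupts the thread
        if interrupted:
            break               # thread switches back to read the cache
        other_work_done += 1    # thread runs another task this cycle
    return cache, other_work_done
```

Here the waiting cycles are spent on other work rather than polling. In real hardware the interrupt handling and the switch back to the waiting task add a delay that this sketch does not model, which is the source of the more complicated delay control mentioned above.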
Regardless of whether synchronous or asynchronous mechanisms are used, there is always some inevitable overhead needed to monitor a target data cache (the overhead refers to the resources required to monitor the target data cache and the additional actions to be executed). Furthermore, hardware threads are not aware of when they will get their results, which is disadvantageous for their task planning.