1. Field of the Invention
Exemplary embodiments of the present invention relate to hardware accelerators, and more particularly, to control structures for configurable hardware acceleration engines.
2. Description of Background
Conventional central processing units (CPUs) execute instructions sequentially, one at a time. Hardware acceleration, which involves the use of dedicated hardware components to perform one or more functions more efficiently than software running on a general-purpose processing system, is one of various techniques used to improve performance. A key difference between hardware and software is concurrency: dedicated hardware can perform multiple operations in parallel, which allows it to complete certain functions faster than software. When the hardware that performs the acceleration resides in a unit that is separate from, and supplements, the CPU, that unit is referred to as a hardware accelerator.
Hardware accelerators are typically designed for computationally intensive operations such as, for example, floating point arithmetic, graphics, signal processing, string processing, or encryption. By offloading processor-intensive tasks from the main processor, overall system performance can be improved. Examples of hardware accelerators include cryptographic engines, vector and graphics processors, Discrete Cosine Transform (DCT) and Inverse Discrete Cosine Transform (IDCT) functions for MPEG encoders and decoders, and other components that accelerate the execution of instructions for complex operations on processing systems. Depending upon granularity, hardware accelerators can vary from small special-purpose units to large functional blocks such as, for example, those designed for motion estimation in MPEG-2.
A common configuration for high-volume, embedded hardware systems is the system on a chip (SoC) design, which integrates all components of a processing system into a single circuit or chip to increase data throughput for computationally intensive operations, thereby increasing overall system performance. Typically, SoCs are developed from pre-qualified blocks of hardware elements, together with software drivers that control their operation. In order to simplify the SoC design task, application specific integrated circuit (ASIC) vendors typically offer a menu of pre-defined functional building blocks called “cores” that can be embedded within an SoC. A core is a reusable unit of logic, cell, or chip layout design that is dedicated to a single function, or a limited range of functions, and may be owned and used by a single party alone or licensed to other parties.
Typically, a designer will have a number of available approaches in designing ASIC cores that involve tradeoffs among four characteristics: physical size (usually expressed as the area or cell count consumed on the SoC); performance (usually expressed as the clock rate, in cycles per second, at which a processor performs its most basic operations); latency (usually expressed as the number of clock cycles, or the total time elapsed, between when a data input is provided to the accelerator and when the corresponding processed data output is available from the accelerator); and overall throughput (usually expressed as the number of bits or bytes processed per second). The “optimal” design approach in terms of these tradeoffs is determined by the requirements for a given application. For example, in cost-sensitive consumer applications (such as PDAs, cell phones, etc.), minimizing physical size is typically a primary concern, while for high performance applications (such as transaction processing for banking, real-time video and image processing for aerospace and defense systems, etc.), minimizing latency and maximizing overall throughput typically take priority.
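The relationship among clock rate, latency, and throughput described above can be illustrated with a short calculation. The following is a minimal sketch only, assuming a non-pipelined accelerator that accepts a new input block only after the previous block has been fully processed; the function names and the example figures (a 200 MHz clock, a 128-bit block, 10 cycles per block) are hypothetical and are not taken from any particular design.

```python
def latency_seconds(latency_cycles: int, clock_hz: float) -> float:
    """Convert a latency expressed in clock cycles into elapsed time."""
    return latency_cycles / clock_hz


def throughput_bits_per_second(
    bits_per_block: int, cycles_per_block: int, clock_hz: float
) -> float:
    """Throughput of a non-pipelined engine (assumption): one block of
    bits_per_block bits completes every cycles_per_block clock cycles."""
    return bits_per_block * clock_hz / cycles_per_block


# Hypothetical example figures, chosen only for illustration.
clock_hz = 200e6          # 200 MHz clock rate
cycles_per_block = 10     # latency of one block, in clock cycles
bits_per_block = 128      # size of one data input, in bits

print("latency (s):", latency_seconds(cycles_per_block, clock_hz))
print(
    "throughput (bit/s):",
    throughput_bits_per_second(bits_per_block, cycles_per_block, clock_hz),
)
```

Under these assumptions, halving the number of cycles per block at a fixed clock rate halves the latency and doubles the throughput. A pipelined engine would decouple the two figures, which this sketch does not model.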
Special-purpose hardware accelerators are often integrated within ASIC cores in SoC designs. A primary objective in developing an ASIC core is to make it as usable as possible across a broad range of applications and technologies. For ASIC cores that incorporate hardware accelerators, however, achieving this objective becomes complex due to the often-conflicting design-tradeoff priorities across applications as outlined above. One approach to facilitate broad usage for such cores is to make their designs configurable. For example, in designing cores that integrate configurable hardware accelerators that perform the same calculations repeatedly on each data input to generate the corresponding data output (such as cryptographic engines), it is feasible to define configuration options that can be invoked at hardware compile time to optimize design characteristics such as physical size, performance, latency, and overall throughput based on particular application requirements. Accordingly, it is desirable to implement a reusable control structure within a core integrating a configurable hardware acceleration engine that allows the available acceleration engine configuration options to be leveraged in a flexible and optimized manner.
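By way of illustration only, the effect of one possible compile-time configuration option on an engine that repeats the same calculation on each data input can be sketched as follows. The option modeled here, the number of calculation rounds evaluated per clock cycle, is a hypothetical example and is not asserted to be an option of any particular core; the class name, field names, and the assumption that area grows in proportion to the number of rounds evaluated per cycle are likewise assumptions made for the sketch.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class EngineConfig:
    """Hypothetical compile-time options for an iterative acceleration engine."""

    total_rounds: int      # rounds applied to every data input (assumed)
    rounds_per_cycle: int  # assumed option: rounds evaluated in one clock cycle

    def cycles_per_block(self) -> int:
        """Latency in clock cycles for one data input."""
        return ceil(self.total_rounds / self.rounds_per_cycle)

    def relative_area(self) -> int:
        """Crude size estimate (assumption): one copy of the round logic
        is instantiated for each round evaluated per cycle."""
        return self.rounds_per_cycle


# Compare several hypothetical configurations of a 10-round engine.
for rounds_per_cycle in (1, 2, 5, 10):
    cfg = EngineConfig(total_rounds=10, rounds_per_cycle=rounds_per_cycle)
    print(
        f"rounds/cycle={cfg.rounds_per_cycle:2d}  "
        f"cycles/block={cfg.cycles_per_block():2d}  "
        f"relative area={cfg.relative_area():2d}"
    )
```

In this simplified model, a size-sensitive application would select the smallest configuration, while a latency-sensitive application would select a larger one. The model ignores the reduction in achievable clock rate that evaluating more rounds per cycle would typically cause.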