The electronics industry has become increasingly driven to meet the demands of high-volume consumer applications, which comprise a majority of the embedded systems market. Examples of consumer applications where embedded systems are employed include handheld devices such as cell phones, personal digital assistants (PDAs), global positioning system (GPS) receivers, and digital cameras. By their nature, these devices are required to be small, low-power, light-weight, and feature-rich. Thus, embedded systems face the challenge of delivering performance with minimal delay, minimal power consumption, and minimal cost. As the numbers and types of consumer applications where embedded systems are employed increase, these challenges become even more pressing.
Each of these applications typically comprises a plurality of algorithms which perform the specific function for a particular application. An algorithm typically includes multiple smaller elements called algorithmic elements which when performed produce a work product. An example of an algorithm is the QCELP (QUALCOMM code excited linear prediction) voice compression/decompression algorithm which is used in cell phones to compress and decompress voice in order to save wireless spectrum.
Conventional hardware architectures typically provide a specific hardware accelerator for only one or two algorithmic elements. This has typically sufficed in the past, since most hardware acceleration has been performed in the realm of infrastructure base stations. There, many channels are processed (typically 64 or more), and having one or two hardware accelerators, each of which accelerates a single algorithmic element, can be justified. Best current practice is to place a digital signal processing IC alongside the specific hardware acceleration circuitry and then array many of these together in order to process the workload. Since any gain in performance or power dissipation is multiplied by the number of channels (64 in this example), this approach is currently favored.
For example, in a base station implementation of the QCELP algorithm, accelerating the pitch computation will result in a 20% performance/power savings per channel. A 20% saving on the processing done across 64 channels amounts to a significantly large performance/power savings.
The shortcomings of this approach are revealed when attempts are made to accelerate an algorithmic element in a mobile terminal. There, typically only a single channel is processed, and for significant performance and power savings to be realized, many algorithmic elements must be accelerated. The problem, however, is that the size of the silicon is bounded by cost constraints, and a designer cannot justify adding specific acceleration circuitry for every algorithmic element. Yet the QCELP algorithm itself consists of many individual algorithmic elements. The 17 most frequently used algorithmic elements are:
    1. Pitch Search Recursive Convolution
    2. Pitch Search Autocorrelation of Exx
    3. Pitch Search Correlation of Exy
    4. Pitch Search Autocorrelation of Eyy
    5. Pitch Search Pitch Lag and Minimum Error
    6. Pitch Search Sinc Interpolation of Exy
    7. Pitch Search Interpolation of Eyy
    8. Codebook Search Recursive Convolution
    9. Codebook Search Autocorrelation of Eyy
    10. Codebook Search Correlation of Exy
    11. Codebook Search Codebook Index and Minimum Error
    12. Pole Filter
    13. Zero Filter
    14. Pole 1 Tap Filter
    15. Cosine
    16. Line Spectral Pair Zero Search
    17. Divider
For example, in a mobile terminal implementation of the QCELP algorithm, if the pitch computation is accelerated, the result is a 20% performance/power savings at the cost of increased silicon area. By itself, the gain is not economically justifiable for the cost. However, if, for the silicon area cost of a single accelerator, there were an IC that could adapt itself in time to become the accelerator for each of the 17 algorithmic elements, it would cost 80% of the cost for a single adaptable accelerator.
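The contrast between the base station and mobile terminal examples above can be made concrete with a small arithmetic sketch. It assumes, purely for illustration, that the 20% per-channel saving scales linearly with the number of channels and expresses the total in "channel-equivalents" of processing; the function name `aggregate_saving` and that unit are hypothetical and do not come from the text.

```python
# Illustrative arithmetic only. Assumption (not from the source): the
# per-channel saving scales linearly with the number of channels.

def aggregate_saving(per_channel_fraction: float, channels: int) -> float:
    """Return the total saving in channel-equivalents of processing."""
    return per_channel_fraction * channels

PITCH_SAVING = 0.20  # 20% per channel from accelerating the pitch computation

base_station = aggregate_saving(PITCH_SAVING, 64)  # 64-channel base station
mobile = aggregate_saving(PITCH_SAVING, 1)         # single-channel mobile terminal

print(f"base station: {base_station:.1f} channel-equivalents saved")
print(f"mobile terminal: {mobile:.1f} channel-equivalents saved")
```

Under this assumption, the same accelerator yields 64 times the absolute benefit in the base station, which is consistent with the text's point that one fixed-function accelerator is hard to justify in a single-channel device.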
Normal design approaches for embedded systems tend to fall into one of three categories: an ASIC (application specific integrated circuit) approach; a microprocessor/DSP (digital signal processor) approach; and an FPGA (field programmable gate array) approach. Unfortunately, each of these approaches has drawbacks. In the ASIC approach, the design tools have limited ability to describe the algorithm of the system. Also, the hardware is fixed, and the algorithms are frozen in hardware. For the microprocessor/DSP approach, the general-purpose hardware is fixed and inefficient. The algorithms may be changed, but they have to be artificially partitioned and constrained to match the hardware. With the FPGA approach, use of the same design tools as for the ASIC approach results in the same problem of limited ability to describe the algorithm. Further, FPGAs consume significant power and are too difficult to reconfigure to meet changing product requirements as future generations are produced.
An alternative is to attempt to overcome the disadvantages of each of these approaches while utilizing their advantages. Accordingly, what is desired is a system in which more efficient consumer applications can be created and programmed than when utilizing conventional approaches.