Many new applications being planned for mobile devices (multimedia, graphics, image compression/decompression, etc.) involve a high percentage of streaming vector computations. The computation rate these applications require often exceeds what the best general-purpose CPUs can deliver. It is therefore desirable to find a means of improving the performance of the computation engine in such devices so that it meets the computational requirements of these new applications.
At the same time, the standards governing these new applications, and the best algorithms for complying with those standards, are constantly changing, so a solution must be programmable and easy to program. Moreover, time-to-market pressures are increasing. One way to address this pressure is to increase the reuse of previous investments in software and hardware. Reuse of hardware across multiple products is best promoted through programmability. Reuse of software is promoted through a consistent programming model across multiple implementations of a device, so that binary compatibility is maintained.
One attempt to satisfy this need is the use of hardware accelerators. These fall short of solving the problem because they have limited reprogramming capability. Hardware accelerators that are not fixed in function allow changes only to the parameters of the functions they execute, not to the type or ordering of those functions.
Programmable solutions exist in the form of vector processors, digital signal processors, SIMD processors, and VLIW processors. These solutions fall short because of limitations in their programming models, which make them difficult to program and make it hard to maintain a consistent programming model across hardware generations. These limitations include programmer visibility of the data-path pipeline, memory width and latency, data alignment in memory, and explicit resource dependencies.