Previous and current video processing architectures have been only partially successful when applied to current video processing algorithms because of significant control and addressing overhead, high clock rate requirements, and high power consumption. These limitations arise because the architectures were designed to operate on data objects different from those typical of current video processing algorithms. Examples of such architectures include pure vector, array, VLIW (Very Long Instruction Word), and DSP (Digital Signal Processor) architectures, as well as general purpose processors with micro-SIMD (single-instruction multiple-data) extensions.
A parallel single-instruction multiple-data (SIMD) array architecture, having a two-dimensional rectangular array of processing elements (PEs), each operating on its own set of data, is one architecture used for high-performance video processing applications. Programmable array architectures, which use processing elements of varied complexity, are often referred to as memory-oriented. The processing elements in such memory-oriented architectures operate on video streams supplied from memory or from adjacent processing elements, and write their results back to memory. While the peak processing capability of such programmable array architectures can be quite high, their poor reuse of data leads to intensive memory traffic. As a result, performance suffers from the limited memory bandwidth available in such systems, which significantly limits the video standard complexity, frame rate, and frame size achievable with such programmable array architectures.
For extremely high-performance mobile applications that require a relatively low clock rate and low power consumption compared to general purpose processors with micro-SIMD extensions, hard-wired array implementations of video algorithms have been found to be efficient. Such application-specific integrated circuits (ASICs) can reach high performance and low power consumption by providing a set of specialized units and an interconnect structure tuned to the video algorithm and its data characteristics. ASICs efficiently reuse both data already fetched from memory into the PEs and data created by the PEs, using delay/buffer registers that hold intermediate results, thus significantly decreasing memory traffic. Unfortunately, ASICs have little or no programmability and incur high development and verification costs. With the cost of such ASICs currently reaching 60% or more of the cost of consumer video products, it is desirable to develop a solution that combines the advantages of programmable SIMD array architectures with the efficiency and performance of video ASICs.
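The difference in memory traffic between a memory-oriented approach and one that reuses buffered data can be illustrated with a minimal sketch. The example below is illustrative only and is not taken from any particular architecture: the 3x3 window sum, the function names `sum3x3_refetch` and `sum3x3_buffered`, and the read counters are all assumptions chosen for demonstration. The first function re-reads every pixel of each window from memory, while the second reads each pixel once and holds the three most recent rows in line buffers, roughly in the manner of delay/buffer registers.

```python
def sum3x3_refetch(frame):
    """Memory-oriented style: every output re-reads its full 3x3 window."""
    h, w = len(frame), len(frame[0])
    reads = 0  # number of pixel reads issued to (simulated) memory
    out = []
    for y in range(h - 2):
        row = []
        for x in range(w - 2):
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += frame[y + dy][x + dx]
                    reads += 1
            row.append(acc)
        out.append(row)
    return out, reads


def sum3x3_buffered(frame):
    """Reuse style: each pixel is read once and kept in a 3-row line buffer."""
    h, w = len(frame), len(frame[0])
    reads = 0
    lines = []  # holds at most the three most recently fetched rows
    out = []
    for y in range(h):
        line = []
        for x in range(w):
            line.append(frame[y][x])  # the only read of this pixel
            reads += 1
        lines.append(line)
        if len(lines) > 3:
            lines.pop(0)  # discard the oldest buffered row
        if len(lines) == 3:
            # Vertical sums are intermediate results shared by adjacent windows.
            col = [lines[0][x] + lines[1][x] + lines[2][x] for x in range(w)]
            out.append([col[x] + col[x + 1] + col[x + 2] for x in range(w - 2)])
    return out, reads


# Hypothetical 6-row by 8-column test frame.
frame = [[y * 8 + x for x in range(8)] for y in range(6)]
out_a, reads_a = sum3x3_refetch(frame)
out_b, reads_b = sum3x3_buffered(frame)
print(out_a == out_b, reads_a, reads_b)  # True 216 48
```

For this small frame, both functions produce identical results, but the refetching version issues 216 reads against 48 for the buffered version. The gap widens with larger windows, which suggests why poor data reuse makes memory bandwidth the limiting factor.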