Increasing demand for high-definition (HD) TV products, including interactive TV in an HD format and HD video compression encoding and decoding, requires increasing sophistication, flexibility, and performance in the supporting electronics. The sophistication, flexibility, and performance requirements for HD TV exceed the capabilities of current generations of processor architectures, in many cases by orders of magnitude.
The demands of video encoding for HD formats are both memory intensive and data-processing intensive, requiring efficient, high-bandwidth memory organizations coupled with compute-intensive capabilities. In addition, a video encoding product must be capable of supporting multiple standards, each of which includes multiple optional features that can be supported to improve image quality and to further reduce compression bandwidth. Due to these multiple demands, a flexible parallel processing approach must be found to meet the demands in a cost-effective manner.
A number of algorithmic capabilities are common to multiple video encoding standards, such as MPEG-2, H.264, and SMPTE-VC-1. Motion estimation/compensation and deblocking filtering are two examples of general algorithms that are required for video encoding. To efficiently support motion estimation algorithms and other complex programmable functions, whose requirements may vary across the multiple standards, a processor by itself would require significant parallelism and very high clock rates. A processor of this capability would be difficult to develop in a cost-effective manner for commercial products.
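To give a sense of why motion estimation is so demanding, the following is a minimal sketch of exhaustive block matching using a sum of absolute differences (SAD) cost. The SAD metric, the full-search strategy, and the names `sad`, `full_search`, and `search_range` are illustrative assumptions; they are not taken from this description or from any particular standard, and practical encoders generally use faster search strategies.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy).
    Frames are lists of rows of integer pixel values (assumed layout)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Return (best_sad, dx, dy) over every displacement within
    +/- search_range that keeps the candidate block inside `ref`.
    Returns None if no candidate fits inside the reference frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

In this sketch each block evaluates up to (2 × search_range + 1)² candidate positions, each costing n² pixel differences. Repeating that for every block of every HD frame suggests why a single conventional processor would need substantial parallelism and a very high clock rate.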
An array processor typically requires short pipelines to minimize the complexity of having a large number of processor elements on a single chip. The short pipelines will typically have a minimum number of execution stages, such as a single execution stage or two to four execution stages, since each pipeline stage adds complexity to the processor element and the array processor. As a consequence, simple execution functions are typically defined in the array processor instruction set architecture.
In addition to pipeline control, there are other complexities in an array processor. For example, to meet performance requirements, the array processor may need to have a large number of processor elements on a single chip. A large number of processor elements typically limits the operational clock rate due to chip size and wire length constraints. Even when more complex instruction execution functions are defined, such as a two-cycle execution function added alongside a single-cycle execution function, the complex instructions are defined within the constraints of the processor architecture. The more complex functions will typically utilize architectural features in the same manner as the simple execution functions. For example, source operands for the more complex function will be fetched in the same manner as for the simpler functions. In a reduced instruction set computer (RISC) processor, the source operands are provided from a central register file, and the more complex function will use this same access method so that the newly added instructions preserve the programming model. For memory-intensive functions and functions of greater complexity, these standard approaches are inadequate.