A. Field of the Invention
The present invention relates generally to an integrated, loosely-pipelined video codec, and more particularly to a multi-bus architecture within the video codec that improves encoding performance and reduces power consumption.
B. Background of the Invention
The importance of digital video technology is well understood by one of skill in the art. Over the past few years, the digital video market has exploded in response to improvements in video compression and applications that allow a user to record, manipulate, store and transmit digital video over a network. The ability to transmit and display high-quality digital video has significantly improved as compression techniques have evolved. Additionally, the video market has seen a meaningful reduction in the size and power consumption of video devices that record, transmit, receive and display digital video content.
The representation of video within a digital framework requires large amounts of binary data to be generated, transmitted, stored and processed. Video compression reduces this amount of data by using both spatial image and temporal motion compression techniques within a digital video stream. Numerous compression algorithms have been developed that compress and encode digital video data in both the spatial and temporal domains. One example is the H.264 standard, which defines processes and parameters by which digital video may be encoded and decoded.
FIG. 1 generally illustrates a video encoding architecture that may be used in compliance with the H.264 standard. Video frames are received from an external source and divided into video component macroblocks including both luma and chroma blocks. These macroblocks are processed to determine a preferred encoding or prediction mode. The identification of a prediction mode for a macroblock represents a time consuming, computationally intensive process in which a diverse set of pixel data is processed, manipulated, fetched from and stored in memory. Additionally, the delivery of this pixel data to a processing device within the codec oftentimes requires formatting procedures, such as deserialization or demultiplexing of an incoming data signal, so that processing elements can properly operate on the data. To further complicate the procedure, the identification of an appropriate prediction mode is extremely time sensitive in that the mode must be determined within a very limited time window. All of these factors typically result in a design that sacrifices the quality of compression in order to meet the timing constraints, power consumption criteria, or footprint requirements of a video codec.
During inter mode prediction, a current macroblock is provided to a motion estimation module 170 and reference frames, temporally located from the current macroblock, are fetched from the memory store 190. The motion estimation module 170 iteratively analyzes a plurality of reference blocks relative to the current macroblock to identify an appropriate motion vector. If such an appropriate motion vector is identified, then an inter prediction module 160 may finely adjust the motion vector by performing half and/or quarter pel operations on the temporally located reference block. These half and quarter pel operations can be extremely computationally intensive due to the very large number of arithmetic operations performed as well as the number of read and write operations performed in memory.
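The iterative comparison of reference blocks against the current macroblock can be sketched as an exhaustive block-matching search. The following is a minimal illustration only, assuming a sum-of-absolute-differences (SAD) cost and a square search window. The function names `sad` and `full_search`, the cost metric, and the `search_range` parameter are hypothetical and are not taken from the motion estimation module 170. Half and quarter pel refinement is omitted.

```python
def sad(cur, ref, rx, ry, n):
    """Sum of absolute differences between the n x n block `cur` and the
    n x n block of `ref` whose top-left corner is at column rx, row ry."""
    return sum(
        abs(cur[y][x] - ref[ry + y][rx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, cx, cy, n, search_range):
    """Test every integer displacement within +/- search_range of (cx, cy)
    and return the (dx, dy) motion vector with the lowest SAD and its cost.

    The SAD metric and the exhaustive search are assumptions made for
    illustration, not a description of any particular encoder.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall partly outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Each candidate costs n x n subtractions and memory reads, and the number of candidates grows with the square of the search range, which suggests why this stage is both arithmetic and memory intensive.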
During intra mode prediction, the intra prediction module 150 analyzes a macroblock within a frame relative to spatially located reference blocks within the same frame. This analysis attempts to identify a reference block and corresponding intra prediction mode for the macroblock.
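As a rough illustration of how candidate intra prediction modes might be compared, the sketch below builds three predictions for a small block from its neighboring pixels and keeps the one with the lowest SAD. The vertical, horizontal and DC modes shown are three of the modes commonly associated with H.264 intra prediction. The block size, the SAD cost and all function names are assumptions made for this example and are not taken from the intra prediction module 150.

```python
def predict_vertical(above, n=4):
    # Every row repeats the row of pixels directly above the block.
    return [list(above) for _ in range(n)]


def predict_horizontal(left, n=4):
    # Every column repeats the column of pixels directly left of the block.
    return [[left[y]] * n for y in range(n)]


def predict_dc(above, left, n=4):
    # Every pixel is the rounded mean of the above and left neighbors.
    dc = (sum(above) + sum(left) + n) // (2 * n)
    return [[dc] * n for _ in range(n)]


def block_sad(a, b):
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_intra_mode(block, above, left):
    """Return (mode_name, cost) for the candidate with the lowest SAD.
    Comparing modes by SAD is an assumption made for illustration."""
    n = len(block)
    candidates = {
        "vertical": predict_vertical(above, n),
        "horizontal": predict_horizontal(left, n),
        "dc": predict_dc(above, left, n),
    }
    costs = {name: block_sad(block, pred) for name, pred in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]
```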
This prediction analysis requires that a residual be generated and compressed for each reference block under test. This residual represents the difference between the current macroblock and the reference block, and is provided to a direct integer transformation module 110. The residual is transformed using an integer transformation into a set of spatial frequency coefficients. This transformation is analogous to a transformation from a time domain signal into a frequency domain signal.
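The residual computation and integer transformation can be sketched as follows. This example assumes the 4x4 forward core transform matrix commonly associated with H.264, applied as Y = C X C^T. The text above does not specify the transform used by module 110, so the matrix `CF` and the helper names are assumptions made for illustration.

```python
# 4x4 forward core transform matrix commonly associated with H.264.
# Its use here is an assumption; the source text does not name a matrix.
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def residual(current, prediction):
    # Element-wise difference between the current block and its prediction.
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current, prediction)
    ]


def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def forward_transform(res):
    # Y = CF * X * CF^T, using integer arithmetic only.
    return matmul(matmul(CF, res), transpose(CF))
```

For a flat residual, all of the energy lands in the top-left (lowest frequency) coefficient and the remaining coefficients are zero. In H.264 the normalization of this transform is generally folded into the subsequent scaling and quantization stage, so the values produced here are unnormalized.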
The frequency coefficients are provided to a scaling & quantization module 120 which then generates a quantized and scaled signal. In effect, the quantization process divides the frequency coefficients by an integer scaling factor, thereafter truncating the signal. Because truncation discards information, this process usually introduces an error in the compressed block that requires compensation when the block is later regenerated.
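The divide-and-truncate behavior described above, and the error it introduces, can be illustrated with a simplified sketch. A single integer `step` is assumed for all coefficients. A practical H.264 quantizer typically uses position-dependent multipliers and shifts instead, so the functions below are illustrative assumptions and do not describe the scaling & quantization module 120.

```python
def quantize(coeffs, step):
    """Divide each coefficient by `step` and truncate toward zero."""
    def q(c):
        magnitude = abs(c) // step
        return magnitude if c >= 0 else -magnitude
    return [[q(c) for c in row] for row in coeffs]


def dequantize(levels, step):
    """Rescale quantized levels; the truncated remainder is not recovered."""
    return [[level * step for level in row] for row in levels]
```

With a step of 10, a coefficient of 48 is quantized to 4 and rescaled to 40, so the remainder of 8 is lost. A coefficient of 5 is quantized to 0 and disappears entirely.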
The amount of error introduced into the video signal by the encoding process may be determined by reconstructing the encoded frame. Reconstruction is performed by a dequantization & descaling module and an inverse integer transformation module 140. The quantized signal is first dequantized, resulting in a rescaled signal. This rescaled signal is then inversely transformed to produce a reconstructed macroblock.
This reconstructed macroblock may be compared to the original macroblock to identify the error introduced by the compression process. As a result, the effectiveness of the different prediction modes may be compared to identify a preferred mode for a particular block. Once the preferred mode has been identified, an entropy coder 130 encodes the macroblock for transmission.
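The comparison of reconstructed macroblocks against the original can be sketched as a simple selection over candidate modes. The example assumes a sum-of-squared-differences (SSD) error measure and considers distortion only. The function names and the mode labels are hypothetical, and a practical encoder may also weigh the number of bits each mode requires.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def select_mode(original, reconstructions):
    """Given a dict mapping a mode label to its reconstructed block,
    return the label with the lowest error and the full error table.
    The SSD criterion is an assumption made for illustration."""
    errors = {
        mode: ssd(original, recon) for mode, recon in reconstructions.items()
    }
    best = min(errors, key=errors.get)
    return best, errors
```

A usage example with two hypothetical candidates:

```python
original = [[10, 20], [30, 40]]
candidates = {
    "intra": [[10, 21], [29, 40]],
    "inter": [[12, 20], [30, 37]],
}
best, errors = select_mode(original, candidates)
```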
As digital video becomes increasingly prevalent and digital video markets continue to expand rapidly, the importance of optimizing the compression and encoding of digital video is apparent. One significant factor in this optimization is the need to reduce computational latency. This is especially important for real-time video applications including video conferencing, security and monitoring, interactive gaming and others. Another significant factor in this optimization is the ability to more efficiently manage the transportation of diverse sets of data within video codec architectures.