Digital video is generally processed in sets of video frames. Each frame is a still image representing an instant in time of the video being processed. Each frame can further be broken down into blocks. The blocks are individually transmitted and then recombined to form a frame. The amount of data needed to represent the image blocks can become large. Motion compensation can be used to reduce the amount of data needed to represent the image blocks.
Using motion compensation, image blocks can be represented by motion compensation vectors and error data. Motion compensation vectors are used on prediction frames. For example, an object in one frame may simply be displaced either partially or fully into a new frame. Accordingly, the image blocks used to represent the object in the new frame may be processed with motion vectors, using the image blocks in the original frame as reference. The motion vectors provide the direction and distance in which the referenced image blocks have moved in the new, or predicted, frame. While the motion vectors may track an object, the temporal compression achieved by motion compensation is intended to reduce the bits required to reproduce the error term, and as such the motion vectors need not necessarily track a specific object.
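By way of illustration only, the reconstruction of a predicted block from a reference frame, a motion vector, and error data can be sketched as follows. The function names, the frame layout as a list of rows, and the sign convention (the vector gives the displacement of the block from its position in the reference frame to its position in the predicted frame) are assumptions made for this sketch and are not taken from any particular decoder. Bounds checking and sub-pixel interpolation are omitted.

```python
def predict_block(reference, top, left, motion_vector, size):
    """Fetch the size x size reference block that moved to (top, left).

    motion_vector is (dy, dx): the assumed displacement from the block's
    position in the reference frame to its position in the predicted frame,
    so the source block sits at (top - dy, left - dx).
    """
    dy, dx = motion_vector
    src_top, src_left = top - dy, left - dx
    return [
        [reference[src_top + r][src_left + c] for c in range(size)]
        for r in range(size)
    ]


def reconstruct_block(reference, top, left, motion_vector, error, size):
    """Add the error data to the motion-compensated prediction."""
    predicted = predict_block(reference, top, left, motion_vector, size)
    return [
        [predicted[r][c] + error[r][c] for c in range(size)]
        for r in range(size)
    ]


# Hypothetical 4x4 reference frame of luminance values.
reference = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
# The 2x2 block now at (1, 1) moved down one row and right one column.
error = [[1, 0], [0, -2]]
block = reconstruct_block(reference, 1, 1, (1, 1), error, 2)
```

When the error data is all zeros, the motion vector alone is sufficient to reproduce the block.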
In some cases, motion compensation vectors are all that are needed to reproduce an image block. However, in many situations, some error exists between the referenced image blocks and the blocks in the predicted frame. Error data can be sent to recover the differences and adequately generate the image block. The error data itself is basic image information, including the luminance of the pixels within the image block. A transform, such as a discrete cosine transform (DCT), can be used to reduce the size of the error data to a transformed data set. The transformed data set includes transform coefficients, which can then be inverse transformed to reproduce the error data. In some cases, no motion vectors can be generated for a given image block. For example, when a video switches to a new scene, none of the objects in the new frame can be referenced to objects in the previous frame. In such a case, the image block is represented only with error data. Furthermore, some reference frames for motion compensation are made up of image blocks represented with only error data. They are referred to as intra-frames, or I-frames. Prediction frames, or P-frames, are motion compensated frames that use previous I- or P-frames for reference. Bi-directional frames, or B-frames, can use previous or upcoming I- or P-frames for reference. It should be noted that B-frames are never used as reference themselves, to avoid the accumulation of precision errors.
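As a minimal sketch of the transform step, the following computes a two-dimensional DCT of a small block of error data and inverse transforms the coefficients to recover it. The orthonormal DCT-II scaling, the 4x4 block size, and the function names are assumptions chosen for the example; the quantization and entropy coding that practical codecs apply to the coefficients are not shown.

```python
import math


def _scale(k, n):
    """Orthonormal DCT scaling factor for frequency index k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def _basis(i, k, n):
    """Cosine basis value for sample index i and frequency index k."""
    return math.cos((2 * i + 1) * k * math.pi / (2 * n))


def dct_2d(block):
    """Forward 2D DCT (type II) of a square block given as a list of rows."""
    n = len(block)
    return [
        [
            _scale(u, n) * _scale(v, n) * sum(
                block[x][y] * _basis(x, u, n) * _basis(y, v, n)
                for x in range(n) for y in range(n)
            )
            for v in range(n)
        ]
        for u in range(n)
    ]


def idct_2d(coeffs):
    """Inverse 2D DCT, reproducing the error data from its coefficients."""
    n = len(coeffs)
    return [
        [
            sum(
                _scale(u, n) * _scale(v, n) * coeffs[u][v]
                * _basis(x, u, n) * _basis(y, v, n)
                for u in range(n) for v in range(n)
            )
            for y in range(n)
        ]
        for x in range(n)
    ]


# Hypothetical 4x4 block of error (difference) values.
error = [
    [4, -2, 0, 1],
    [3, 0, -1, 2],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
coeffs = dct_2d(error)
recovered = idct_2d(coeffs)  # matches error up to floating-point rounding
```

For a block of constant value, only the first (DC) coefficient is non-zero, which illustrates why smooth error data can be represented compactly after the transform.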
To process the frame data, conventional video processing hardware is used to capture the motion compensation vector data and the error data. The transformed data sets are inverse transformed, such as through an inverse discrete cosine transform (IDCT) component, to accurately reproduce the error data. In some cases, very little or no motion compensation vector data may be present for a given block and most of the data will be related to error data. The hardware must wait for the error data to be fully processed before it can process or receive more motion compensation vector data. The hardware pipeline becomes stalled as it waits for the error data to be processed. In other cases, when reconstruction of an image frame involves mostly motion compensation vector data and few IDCT operations, the IDCT component may become stalled as it waits for the hardware pipeline to process the motion compensation vector data.
Conventional systems force the hardware to be idle when the workloads between the IDCT operations and the motion compensation operations are not well balanced. Stalling the hardware reduces the efficiency with which frames of video are processed and increases the delay before an image frame can be displayed.
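The effect of unbalanced workloads can be illustrated with a deliberately simplified model, in which the motion compensation stage and the IDCT stage advance in lockstep and each block takes as long as the slower of the two. The lockstep assumption, the function name, and the cycle counts below are hypothetical and serve only to make the idle time visible; they do not describe any specific hardware.

```python
def lockstep_idle(blocks):
    """Return (total_cycles, mc_idle_cycles, idct_idle_cycles).

    blocks is a list of (mc_cycles, idct_cycles) pairs, one per image block.
    Assumed model: neither stage can start the next block until both stages
    have finished the current one.
    """
    total = mc_idle = idct_idle = 0
    for mc_cycles, idct_cycles in blocks:
        step = max(mc_cycles, idct_cycles)  # the slower stage sets the pace
        total += step
        mc_idle += step - mc_cycles         # motion compensation stage waits
        idct_idle += step - idct_cycles     # IDCT stage waits
    return total, mc_idle, idct_idle


# Hypothetical workload: an error-heavy block, a vector-heavy block,
# and a balanced block.
workload = [(2, 10), (8, 1), (5, 5)]
total, mc_idle, idct_idle = lockstep_idle(workload)
```

In this model only the balanced block keeps both stages busy; every other block leaves one of the stages waiting.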
In addition, there are many prediction modes in MPEG video coding, and the prediction mode being applied changes from macroblock to macroblock. Graphics processing devices such as graphics chips are known which employ, for example, three separate hardware sections in a 3D pipeline. They include a 3D portion, a 2D portion and a separate motion compensation hardware portion to perform the numerous transforms, such as IDCT operations, as well as motion compensation prediction. Such IDCT and motion compensation prediction hardware is typically dedicated hardware, which can process, for example, MPEG-2 data. However, with different encoding schemes, such implementations may not effectively process encoded video. For example, with MPEG-4, such dedicated MPEG-2 hardware may not sufficiently decode the encoded video, and the additional hardware can increase the cost of the graphics chip. Such previous methods incorporated dedicated hardware within the 3D pipe. It would be desirable to eliminate the need for such hardware.
Graphics chips are also known that use programmable shaders, such as programmable vertex shaders and programmable pixel shaders. The programmable shaders facilitate, for example, shading and other operations for 3D rendering of images based on primitives such as triangles or other objects. Such 3D rendering employs texture maps, as known in the art, to apply, for example, textures to surfaces of 3D objects. However, it would be desirable to provide a more efficient processing structure to reduce costs and improve decoding of encoded video.