The MPEG standards specify a lossy compression scheme that is adapted to handle a variety of audio/video formats. MPEG-1 and MPEG-2 employ frame-based coding that is primarily beneficial for single-media video applications. For example, MPEG-2 (i.e., MPEG Version 2) supports standard television signals, high definition television (HDTV) signals, and five channel surround sound. MPEG-2 also provides a broadcast-quality image at 720×480 pixel resolution for use in digital video disk (DVD) movies.
The latest video coding standard, MPEG-4, supports object-based compression/decompression that is beneficial for multimedia applications, especially combining natural video and synthetic graphics objects. MPEG-4 is capable of relatively high compression ratios and is a powerful tool useful for a wide range of applications, including Internet browsing, set-top boxes, video games, video conferencing, and wireless networks. Also, the MPEG-4 standard is capable of handling arbitrary-shaped objects that cannot be accommodated by the frame-based coding standards of both MPEG-1 and MPEG-2.
Widespread use of MPEG-4 for desktop video is expected, but MPEG-4 acceleration has not yet been incorporated into many graphics coprocessors. Fortunately, the techniques used in MPEG-4 video decoding for rectangular video objects are similar to those used in MPEG-2. Thus, MPEG-4 video decoding can be accelerated in a similar way on existing graphics coprocessors, such as Nvidia Corporation's GEFORCE™ graphics coprocessor, S3 Graphic, Inc.'s SAVAGE™ line of graphics coprocessors, ATI Technology, Inc.'s RAGE™ line of graphics coprocessors, and Rendition Corporation's VERITE™ series of graphics coprocessors.
These graphics coprocessors typically accelerate MPEG-1/2 decoding with separate on-chip fixed function units or programmable/configurable graphics pipelines. These pipelines often perform the last few steps of MPEG-1/2 decoding rather than requiring the host processor to perform these steps. Examples of off-loaded tasks performed by the graphics coprocessors include motion compensation and inverse discrete cosine transformation (IDCT). The last few steps of MPEG-1/2 decoding have a one-way data flow (from the host processor to the coprocessor), thus avoiding the need for synchronization between the host processor and the coprocessor.
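As a rough illustration of the off-loaded reconstruction step, the following sketch forms a motion-compensated prediction from a reference frame and adds the residual produced by the IDCT. It is a simplified model rather than any coprocessor's actual pipeline: the function name `reconstruct_block`, the integer-pel motion vector, and the 8-bit sample range are assumptions made for the example, and half-pel interpolation is omitted.

```python
def reconstruct_block(reference, residual, top, left, mv):
    """Rebuild one square block from a reference frame and an IDCT residual.

    reference: 2-D list of 8-bit samples (the previously decoded frame)
    residual:  2-D list of signed values (assumed to be the IDCT output)
    top, left: position of the block in the current frame
    mv:        (dy, dx) integer-pel motion vector (hypothetical convention)
    """
    dy, dx = mv
    size = len(residual)
    block = []
    for r in range(size):
        row = []
        for c in range(size):
            # Motion compensation: fetch the displaced reference sample.
            predicted = reference[top + r + dy][left + c + dx]
            # Add the residual and clamp to the assumed 8-bit range.
            row.append(max(0, min(255, predicted + residual[r][c])))
        block.append(row)
    return block
```

Every input in this sketch originates on the host side or already resides in video memory, and the result is only written back to video memory, which is consistent with the one-way data flow described above.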
However, if the host processor needs to post-process the resulting macroblocks after an IDCT and motion compensation, the coprocessor must notify the host processor when the IDCT or motion compensation is completed and the data are ready to be transferred back to the host processor. This process has to be repeated in transferring the post-processed data back to the coprocessor's video memory.
For example, the padding of boundary macroblocks has to be done on texture data after IDCT and motion compensation are completed. For padding to be performed by the host processor, macroblocks have to be transferred from video memory to host memory for padding and then back to video memory for use as a reference in decoding subsequent frames. Unfortunately, boundary macroblock padding, which is one of the key processing steps in decoding arbitrary-shaped video objects in MPEG-4, cannot be efficiently accelerated on typical graphics coprocessors. Unless special hardware and/or a specific set of new instructions are added to the graphics coprocessor, it is typically better for boundary macroblock padding to be performed on the host processor.
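To make the padding step concrete, the sketch below fills the transparent samples of a boundary macroblock by repeating the nearest opaque samples, first along each row and then down each column. It is a simplified illustration in the spirit of MPEG-4 repetitive padding, not the normative procedure: the names `pad_line` and `pad_macroblock`, the list-based data layout, and the round-half-up averaging are assumptions made for the example.

```python
def pad_line(values, mask):
    """Fill samples where mask is false from the nearest samples where it is true.

    Returns (padded_values, had_opaque). A sample lying between two opaque
    samples receives their average (rounding is an assumption of this sketch).
    """
    opaque = [i for i, m in enumerate(mask) if m]
    out = list(values)
    if not opaque:
        return out, False
    for i in range(len(values)):
        if mask[i]:
            continue
        left = max((j for j in opaque if j < i), default=None)
        right = min((j for j in opaque if j > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            out[i] = (values[left] + values[right] + 1) // 2
    return out, True


def pad_macroblock(texture, mask):
    """Pad a boundary macroblock: horizontal pass, then vertical pass.

    texture: 2-D list of samples; mask: 2-D list, truthy where the sample
    belongs to the video object (hypothetical layout for this example).
    """
    rows, filled = [], []
    for values, line_mask in zip(texture, mask):
        padded, had_opaque = pad_line(values, line_mask)
        rows.append(padded)
        filled.append(had_opaque)
    if not any(filled):
        return rows  # fully transparent block: nothing to replicate
    # Vertical pass: rows with no opaque samples are filled column by
    # column from the nearest rows completed in the horizontal pass.
    for c in range(len(rows[0])):
        column = [row[c] for row in rows]
        padded, _ = pad_line(column, filled)
        for r, value in enumerate(padded):
            rows[r][c] = value
    return rows
```

Each padded sample depends on a data-dependent search for the nearest opaque neighbors, which suggests why this step maps poorly onto fixed-function units designed for the regular arithmetic of IDCT and motion compensation.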
Rather than incurring the processing load on the host processor, and taking the time to pass data back and forth between host memory and video memory, it would clearly be desirable to accelerate MPEG-4 video decoding on the same processing hardware used for MPEG-2 video decoding by implementing boundary macroblock padding more efficiently. Accordingly, it would be preferable to develop a solution that can be readily implemented with existing hardware and without significant synchronization overhead.