1. Field of the Invention
This invention relates generally to digital video processing, and particularly to the process of using motion compensation to reconstruct a compressed video bit stream.
2. Background of the Invention
Video and audio data, if not compressed, are usually too large for storage and network communications. Modern video compression combines several techniques to achieve compression ratios of hundreds to one. MPEG (which stands for the Moving Picture Experts Group) is a committee working under the joint direction of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) and has developed multiple standards for encoding video and accompanying audio data. Over the years, MPEG standards have progressed through several levels of increasing sophistication and quality.
Video compression relies upon the human eye's inability to resolve high frequency color changes, and upon the fact that there is a lot of redundancy within each picture and between pictures in the video. MPEG achieves a high compression rate by compressing the pictures in the time dimension, encoding only the changes from one picture to another instead of each entire picture of a series of pictures in a video. The technique of using past and/or future pictures to compensate for part of a current picture in a compressed video is called motion compensation.
For purposes of motion compensation, MPEG typically defines three main types of pictures, which are "intra coded," "predictive coded" and "bi-directional predictive coded" pictures. Intra coded pictures (or "I-pictures") are coded without reference to other pictures and with only moderate compression. A predictive coded picture (or "P-picture") is coded more efficiently using motion compensated prediction from a past intra or predictive coded picture, and is generally used as a reference for further prediction. Bi-directionally-predictive coded pictures ("B-pictures") provide the highest degree of compression but require use of both past and future reference pictures for motion compensation.
A compressed MPEG video typically includes groups of I-pictures, B-pictures and P-pictures. Each group of I-pictures, B-pictures and P-pictures is known as a Group of Pictures (GOP). FIG. 1 is a diagram of an example of such a GOP comprising an I-picture 110, two P-pictures 120 and 121, and five B-pictures 130, 131, 132, 133 and 134, and the relationship among the three different picture types as conventionally known. The I-picture 110 includes full picture information and has relatively the least amount of compression. The P-picture 120 is predicted from the I-picture 110, and the P-picture 121 is predicted from the P-picture 120. The B-picture 130 uses the past I-picture 110 and the future P-picture 120 as references, and the B-picture 132 uses the past P-picture 120 and the future P-picture 121 as references.
When a picture, such as an I-picture, is to be coded, the picture is first divided into a plurality of non-overlapping macroblocks. Typically, each of the macroblocks corresponds to a 16×16 pixel area in the picture. If the picture is represented by three color planes, a red plane, a green plane and a blue plane, the RGB data in each macroblock is converted into a set of Y, Cr and Cb data. The Y or luminance data quantifies the overall brightness of the pixels in the macroblock, and is derived as a weighted sum of all three of the RGB data. The Cr and Cb data are color difference data.
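The color conversion described above can be illustrated with a minimal sketch. It assumes the ITU-R BT.601 luma weights and a full-range (0 to 255) output for simplicity; the function name `rgb_to_ycbcr` is hypothetical, and an actual MPEG encoder would typically use a limited studio range and operate on whole planes rather than single samples.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB sample to (Y, Cb, Cr).

    Assumption: BT.601 luma weights, full-range output centered on 128
    for the two color difference components.
    """
    # Luminance is a weighted sum of the three color components.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Color differences: scaled (B - Y) and (R - Y), offset to mid-range.
    cb = 128 + (b - y) * 0.5 / (1 - 0.114)
    cr = 128 + (r - y) * 0.5 / (1 - 0.299)

    def clamp(v):
        return max(0, min(255, int(round(v))))

    return clamp(y), clamp(cb), clamp(cr)


print(rgb_to_ycbcr(255, 255, 255))
print(rgb_to_ycbcr(0, 0, 0))
```

A neutral gray or white sample carries no color difference, so both Cb and Cr sit at the mid-range value of 128.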
There are typically three chrominance formats for a macroblock, namely 4:2:0, 4:2:2 and 4:4:4. When the 4:2:0 format is used, a macroblock includes four 8×8 Y blocks, one 8×8 Cr block and one 8×8 Cb block. For each 8×8 block, the Discrete Cosine Transform (DCT) is used, along with other encoding procedures including quantization and variable length coding (VLC). A macroblock thus coded is called an intra coded macroblock.
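As a small illustration of the macroblock composition just described, the following sketch tabulates the number of 8×8 blocks per macroblock for each chrominance format. The 4:2:0 counts come from the text above; the 4:2:2 and 4:4:4 counts are the ones commonly given for MPEG-2 and are included here as an assumption. The names `BLOCKS_PER_MACROBLOCK` and `macroblock_samples` are hypothetical.

```python
# Number of 8x8 blocks per 16x16 macroblock, as (Y, Cb, Cr).
# 4:2:0 follows the text; 4:2:2 and 4:4:4 are assumed from MPEG-2 practice.
BLOCKS_PER_MACROBLOCK = {
    "4:2:0": (4, 1, 1),
    "4:2:2": (4, 2, 2),
    "4:4:4": (4, 4, 4),
}


def macroblock_samples(chroma_format):
    """Total number of data samples in one macroblock (64 per 8x8 block)."""
    return sum(BLOCKS_PER_MACROBLOCK[chroma_format]) * 64


for fmt in BLOCKS_PER_MACROBLOCK:
    print(fmt, macroblock_samples(fmt))
```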
A P-picture, such as the P-picture 120 in FIG. 1, is encoded by reusing part of the data contained in the previous I-picture 110. Each macroblock in the uncompressed P-picture 120, called a "target block," is compared to areas of a similar size from the uncompressed I-picture 110 in order to find an area or a "matching block" that is similar. Sometimes, the matching block happens to be in the same location in the past frame as the target block is in the current frame, and there is no difference (or the difference is negligible) between the target block and the matching block. In this situation, the target block may not be coded at all and is called a skipped macroblock. More often, the matching block is in a different location and/or there is some difference between the target block and the matching block. In this situation, only the difference between the target block and the matching block is encoded, and a motion vector, which indicates the relative difference in location between the target block and the matching block, is constructed and encoded in place of the data shared by the target block and the matching block. Because far fewer bits are required to code the motion vector than to code the video data shared by the target block and the matching block, compression is achieved.
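The search for a matching block can be sketched as follows. This is only an illustration: the text does not say how the encoder searches, so an exhaustive search with a sum-of-absolute-differences (SAD) cost is assumed here, and the function names, the search range and the frame representation (lists of rows of luminance samples) are all hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur`
    at (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def find_motion_vector(cur, ref, cx, cy, n=16, search=4):
    """Exhaustive search within +/- `search` pixels.

    Returns ((dx, dy), cost) for the candidate with the lowest SAD,
    where (dx, dy) is the displacement from the target block position
    to the matching block position in the reference picture.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidates that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best
```

A cost of zero at displacement (0, 0) corresponds to the skipped macroblock case described above; otherwise the motion vector and the remaining difference are what get encoded.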
A B-picture is coded by reusing data from both a past picture and a future picture. A macroblock of a B-picture may use matching macroblocks from both a past reference picture and a future reference picture. Because information that is not to be found in the past picture might be found in the future picture, bi-directional motion compensation is much more effective than compression that uses only a single past picture, and allows more macroblocks to be replaced by motion vectors. A macroblock coded by referencing data in past and/or future pictures is called a non-intra coded or inter coded macroblock.
If no matching block for a macroblock in an uncompressed P-picture or B-picture can be found in the reference pictures, the macroblock cannot be motion compensated and will be coded as an intra coded macroblock.
An MPEG compressed video bit stream (VBS) needs to be decoded before it is ready for display. The I-pictures in the VBS can be decoded without reference to any of the other pictures in the VBS. However, a B-picture or P-picture in the VBS can only be reconstructed by using data from the relevant parts of past and/or future pictures. Because a coded B-picture may contain motion vectors pointing to matching blocks in both a past I-picture or P-picture and a future I-picture or P-picture, these past and future I-pictures or P-pictures have to be decoded and stored before the coded B-picture is decoded. Therefore, bi-directional motion compensation requires that pictures in a video be transmitted in an order different from the order in which they will be displayed.
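The reordering requirement can be illustrated with a short sketch that converts a display-order sequence into an order in which every reference picture precedes the B-pictures that depend on it. The picture labels and the function name `display_to_coded_order` are hypothetical, and the example sequence is an assumed one, not the GOP of FIG. 1.

```python
def display_to_coded_order(display):
    """Reorder picture labels (e.g. "I0", "B1", "P3") so that each
    I- or P-picture is emitted before the B-pictures that precede it
    in display order and use it as their future reference."""
    coded, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            # A B-picture must wait for its future reference picture.
            pending_b.append(pic)
        else:
            coded.append(pic)        # emit the reference picture first
            coded.extend(pending_b)  # then the B-pictures that were waiting
            pending_b = []
    # Trailing B-pictures have no future reference in this sequence;
    # in this sketch they are simply appended at the end.
    return coded + pending_b


print(display_to_coded_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# → ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```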
Frame buffers are usually used by an MPEG compliant decoding process to store a decoded I-picture and/or P-picture until all of the pictures depending on that I-picture and/or P-picture for motion compensation are reconstructed. For example, when an inter coded macroblock in a P-picture is being decoded, prediction data associated with a matching block in a decoded previous I-picture or P-picture, as pointed to by the motion vector associated with the inter coded macroblock, will be fetched from a frame buffer and used to reconstruct the inter coded macroblock.
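A minimal sketch of this reconstruction step is given below. It assumes full-pixel motion vectors and a single reference picture held as a list of rows; half-pixel interpolation and bi-directional averaging are omitted, and the function name `reconstruct_block` is hypothetical.

```python
def reconstruct_block(ref, x, y, mv, residual):
    """Rebuild one block of an inter coded macroblock.

    ref      -- decoded reference picture (list of rows of samples)
    (x, y)   -- top-left position of the block in the current picture
    mv       -- (dx, dy) full-pixel motion vector (an assumption)
    residual -- n x n decoded difference data for the block
    """
    n = len(residual)
    dx, dy = mv
    out = []
    for j in range(n):
        row = []
        for i in range(n):
            # Fetch the prediction sample the motion vector points to...
            pred = ref[y + dy + j][x + dx + i]
            # ...add the decoded difference and clip to the 8-bit range.
            row.append(max(0, min(255, pred + residual[j][i])))
        out.append(row)
    return out
```

Every block reconstructed this way requires a block-shaped read from the frame buffer, which is why the buffer's memory layout matters for bandwidth.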
Traditionally, an MPEG decoded picture represented by three planes, a luminance (Y) plane and two chrominance (Cb and Cr) planes, is stored in planar mode, in which pixel data within each plane are stored in raster scan order. However, motion compensation operates on a macroblock basis, so that data is read from or written to a frame buffer in blocks. As a result of this inconsistency, many page breaks will be encountered when data corresponding to a matching block are read from the frame buffer and when a decoded macroblock is written into the frame buffer. When the 4:2:0 format is used, the luminance plane of a picture is typically stored in a separate memory space from the chrominance planes. To store in planar mode a decoded CCIR 601 frame in 4:2:0 format having a standard resolution of 720×480, assuming that each luminance data sample occupies 1 byte of memory in the frame buffer, and that the frame buffer is made of DRAMs with 2-kilobyte pages, the luminance data samples corresponding to roughly every three scan lines of pixels have to be stored in a separate page in the frame buffer. In this situation, as shown in FIG. 1B, luminance data samples corresponding to a 16×16 block 150 are typically split across 6 different memory pages, which are pages 160a-f of the frame buffer. Therefore, to reconstruct the luminance component of a 16×16 macroblock in a frame picture using motion compensation having one directional prediction, at least 10 page breaks will be encountered in performing the operations of reading a matching block from the frame buffer and writing the reconstructed data samples to a frame buffer. The delay involved in waiting for the memory to fetch a new page causes inefficiency in using the memory bandwidth and latency in transferring data to and from the frame buffers.
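The page count in this example can be checked with a short calculation. The sketch below assumes exactly the conditions stated above (a 720-sample-wide luminance plane in raster order, 1 byte per sample, 2048-byte pages, plane starting at address 0); the function name `pages_touched_planar` is hypothetical.

```python
def pages_touched_planar(x, y, width=720, block=16, page_bytes=2048):
    """Count the distinct memory pages holding the block x block
    luminance region whose top-left sample is (x, y), when the plane
    is stored in raster scan order starting at address 0."""
    pages = set()
    for row in range(y, y + block):
        first = row * width + x      # address of the first sample in this row
        last = first + block - 1     # address of the last sample in this row
        pages.update(range(first // page_bytes, last // page_bytes + 1))
    return len(pages)


pages = pages_touched_planar(0, 0)
print(pages)                  # → 6
print(2 * (pages - 1))        # page breaks for one read plus one write → 10
```

A page holds 2048 / 720, or about 2.8, scan lines, so 16 consecutive lines necessarily spread over at least 6 pages; a block whose rows straddle a page boundary touches more.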
Thus, there is a need for an apparatus and method for improving memory bandwidth efficiency when MPEG motion compensation is performed, and that can overcome the above-mentioned deficiencies of conventional approaches.
The above needs are met by a method and system that map a decoded picture into memory addresses in a buffer memory using a set of address mapping methods called macroblock tiling format.
In one aspect of the present invention, the data samples representing a picture are grouped into a number of tiles. Each tile is stored in a single memory page in the buffer memory. Data samples in each tile may be luminance data samples corresponding to a given number of macroblocks, or chrominance data samples corresponding to a given number of macroblocks, or a combination of luminance and chrominance data samples corresponding to a given number of macroblocks.
In one embodiment of the present invention, an address generator generates memory addresses for fetching prediction data from the buffer memory and for writing a decoded macroblock into the buffer memory, based on the macroblock tile format address mapping methods.
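To make the idea of tile-based address mapping concrete, the following sketch maps a luminance sample position to a byte address such that a whole group of macroblocks shares one memory page. The specific layout is an assumption made purely for illustration (tiles of 4×2 macroblocks, 2048-byte pages, macroblocks stored contiguously inside a tile); this section does not define the actual macroblock tiling format, and the names `tiled_luma_address` and `pages_touched_tiled` are hypothetical.

```python
MB = 16  # macroblock width and height in luminance samples


def tiled_luma_address(x, y, width=720, tile_w_mb=4, tile_h_mb=2,
                       page_bytes=2048):
    """Hypothetical tiled mapping of luminance sample (x, y) to an address.

    Each tile of tile_w_mb x tile_h_mb macroblocks occupies one page;
    tile_w_mb * tile_h_mb * 256 must not exceed page_bytes.
    """
    mb_x, mb_y = x // MB, y // MB
    # Tiles per row of tiles, rounded up so partial tiles get a full page.
    tiles_per_row = -(-(width // MB) // tile_w_mb)
    tile_index = (mb_y // tile_h_mb) * tiles_per_row + (mb_x // tile_w_mb)
    mb_in_tile = (mb_y % tile_h_mb) * tile_w_mb + (mb_x % tile_w_mb)
    offset = mb_in_tile * MB * MB + (y % MB) * MB + (x % MB)
    return tile_index * page_bytes + offset


def pages_touched_tiled(x, y, page_bytes=2048):
    """Count the pages holding the 16x16 region with top-left sample (x, y)."""
    return len({tiled_luma_address(x + i, y + j, page_bytes=page_bytes) // page_bytes
                for j in range(MB) for i in range(MB)})


print(pages_touched_tiled(0, 0))    # aligned macroblock → 1
print(pages_touched_tiled(56, 24))  # region straddling a tile corner → 4
```

Under these assumed parameters, writing a decoded macroblock always stays within one page, and fetching an arbitrarily positioned 16×16 prediction block touches at most four pages.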