1. Field of the Invention
The invention relates generally to digital video processing, and particularly to fetching reference pixel data during reconstruction of a compressed video bit stream.
2. Description of Related Art
Typically, non-compressed video and audio data are too large for practical storage and network transmission. Modern video compression methods utilize several techniques to achieve compression ratios of hundreds to one. MPEG (Moving Picture Experts Group), a committee working under the joint direction of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), has developed multiple standards for encoding video and accompanying audio data. Over the years, MPEG standards have progressed through several levels with increasing sophistication and quality.
Video compression relies upon the human eye's inability to resolve high-frequency color changes and upon the large amount of redundancy within and between pictures in a video. MPEG achieves high compression rates by compressing the pictures in the time dimension, encoding only the changes from one picture to another instead of each entire picture in a series of pictures. The technique of using past and/or future pictures to compensate for part of a current picture in a compressed video is called motion compensation.
For purposes of motion compensation, MPEG typically defines three main types of pictures: intra-coded, predictive-coded and bi-directionally predictive-coded. An intra-coded picture (I-picture) is coded without reference to other pictures and with only moderate compression. A predictive-coded picture (P-picture) is coded more efficiently using motion-compensated prediction from a past intra- or predictive-coded picture, and is generally used as a reference for further prediction. Finally, a bi-directionally predictive-coded picture (B-picture) provides the highest degree of compression, but requires use of both past and future reference pictures for motion compensation.
Typically, a compressed MPEG video includes groups of I-pictures, B-pictures and P-pictures. Each group of I-pictures, B-pictures and P-pictures is known as a group of pictures (GOP). FIG. 1 is an exemplary illustration of a GOP having an I-picture 102, two P-pictures 104 and 106, and five B-pictures 108, 110, 112, 114 and 116, and is illustrative of a conventional relationship among the three different picture types. The I-picture 102 includes full picture information and is the least compressed of the pictures. The P-picture 104 is predicted from the I-picture 102, while the P-picture 106 is predicted from the P-picture 104. The B-picture 108 uses the past I-picture 102 and the future P-picture 104 as references, and the B-picture 112 uses the past P-picture 104 and the future P-picture 106 as references.
When a picture, such as the I-picture 102, is coded, the picture is first divided into a plurality of non-overlapping macroblocks. Typically, each of the macroblocks corresponds to a 16×16 pixel area in the picture. If the picture is represented by three color planes (i.e., a red plane, a green plane and a blue plane), the RGB data in each macroblock is converted into a set of Y, Cr and Cb data. The Y or luminance data quantifies the overall brightness of the pixels in the macroblock, and is derived as a weighted sum of the R, G and B data. The Cr and Cb data are color difference data.
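The color conversion described above can be sketched as follows. This is only an illustration: the text does not state which conversion coefficients are used, so the commonly used ITU-R BT.601 weights are assumed here, and the function name rgb_to_ycbcr is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr).

    Assumption: ITU-R BT.601 luma weights with chroma centered on 128.
    Results are returned as unrounded, unclamped floats.
    """
    # Luminance is a weighted sum of the three color components.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Color difference data: scaled differences between a color
    # component and the luminance.
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return y, cb, cr
```

For a neutral gray pixel the three weights sum to one, so Y equals the gray level and both color difference values sit at the midpoint of 128. A practical converter would also round and clamp the results to the 8-bit range.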
Conventionally, there are three chrominance formats for a macroblock, namely 4:2:0, 4:2:2 and 4:4:4. When the 4:2:0 format is used, a macroblock includes four 8×8 Y blocks, one 8×8 Cr block and one 8×8 Cb block. For each 8×8 block, the Discrete Cosine Transform (DCT) is used, along with other encoding procedures including quantization and variable length coding (VLC). A macroblock thus coded is called an intra-coded macroblock.
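As an illustration of the macroblock layout, the sketch below tabulates the number of 8×8 blocks per macroblock for each chrominance format and splits a 16×16 luminance area into its four 8×8 Y blocks. The 4:2:0 counts come from the description above; the 4:2:2 and 4:4:4 counts follow the usual MPEG-2 convention and are not stated in the text. All names are hypothetical.

```python
# (Y, Cb, Cr) counts of 8x8 blocks in one 16x16 macroblock.
BLOCKS_PER_MACROBLOCK = {
    "4:2:0": (4, 1, 1),
    "4:2:2": (4, 2, 2),  # assumption: usual MPEG-2 convention
    "4:4:4": (4, 4, 4),  # assumption: usual MPEG-2 convention
}


def total_blocks(chroma_format):
    """Total number of 8x8 blocks coded for one macroblock."""
    return sum(BLOCKS_PER_MACROBLOCK[chroma_format])


def split_luma(macroblock):
    """Split a 16x16 luminance area (a list of 16 rows of 16 values)
    into four 8x8 blocks: top-left, top-right, bottom-left, bottom-right."""
    blocks = []
    for top in (0, 8):
        for left in (0, 8):
            blocks.append([row[left:left + 8]
                           for row in macroblock[top:top + 8]])
    return blocks
```

Each of the resulting 8×8 blocks would then be passed individually through the DCT, quantization and variable length coding steps.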
A P-picture, such as P-picture 104, is encoded by reusing part of the data contained in the previous I-picture 102. Each macroblock in the uncompressed P-picture 104, called a “target block”, is compared to areas of similar size from the uncompressed I-picture 102 in order to find an area or a “matching block” that is similar. Sometimes, the matching block happens to be in the same location in the past frame as the target block is in the current frame, and there is no difference (or the difference is negligible) between the target block and the matching block. In this situation, the target block may not be coded at all and is labeled a “skipped macroblock”. More often, the matching block is in a different location and/or there is some difference between the target block and the matching block. In this situation, only the difference between the target block and the matching block is encoded. Further, a motion vector, which indicates the relative difference in location between the target block and the matching block, is constructed and encoded in place of the data shared by the target block and the matching block. Because many fewer bits are required to code the motion vector than to code the video data shared by the target block and the matching block, compression is achieved.
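The search for a matching block can be sketched as an exhaustive comparison over a small window. The text does not say how similarity is measured or how the search is bounded, so this sketch assumes the sum of absolute differences (SAD) as the measure, whole-pixel displacements, and a square search window; the function names and the search parameter are hypothetical.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between the target block at (cy, cx)
    in the current picture and a candidate block at (ry, rx) in the
    reference picture."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def find_motion_vector(cur, ref, cy, cx, size=16, search=4):
    """Return ((dy, dx), cost) for the best matching block within
    +/- search pixels of the target block position."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            # Skip candidates that fall outside the reference picture.
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sad(cur, ref, cy, cx, ry, rx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    cost, dy, dx = best
    return (dy, dx), cost
```

A cost of zero at displacement (0, 0) corresponds to the skipped macroblock case; otherwise the returned displacement plays the role of the motion vector, and the remaining per-pixel difference is what would be encoded.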
A B-picture is coded by reusing data from both a past picture and a future picture. Thus, a macroblock of a B-picture may use matching macroblocks from both a past and future reference picture. Because information not found in the past picture may be found in the future picture, bi-directional motion compensation is much more effective than compression that uses only a single past picture. Further, bi-directional motion compensation allows more macroblocks to be replaced by motion vectors. A macroblock coded by referencing data in past and/or future pictures is called a “non-intra-coded” or “inter-coded” macroblock.
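One common way to combine a past and a future matching block is to average them pixel by pixel. The sketch below assumes that approach, with halves rounded up, and whole-pixel blocks of equal size; the text itself does not specify how the two references are combined, and the function name is hypothetical.

```python
def bidirectional_prediction(past_block, future_block):
    """Average two equally sized blocks pixel by pixel.

    Assumption: integer average with halves rounded up, i.e.
    (p + f + 1) // 2 for non-negative pixel values.
    """
    return [
        [(p + f + 1) // 2 for p, f in zip(past_row, future_row)]
        for past_row, future_row in zip(past_block, future_block)
    ]
```

For example, averaging the rows [10, 20] and [20, 21] yields [15, 21]. Only the difference between this prediction and the actual macroblock would then need to be coded.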
However, if no matching block for a macroblock in an uncompressed P-picture or B-picture can be found in the reference pictures, the macroblock cannot be motion compensated and will be coded as an intra-coded macroblock.
An MPEG compressed video bit stream (VBS) must be decoded before display. The I-pictures in the VBS can be decoded without reference to any other pictures in the VBS. However, a B-picture or P-picture in the VBS can only be reconstructed by using data from relevant parts of past and/or future pictures. Because a B-coded macroblock may contain motion vectors pointing to matching blocks in both a past I-picture or P-picture and a future I-picture or P-picture, these past and future I-pictures or P-pictures have to be decoded and stored before the B-coded macroblock is decoded. As a result, the pictures in a video bit stream are typically transmitted in an order different from the order in which they will be displayed.
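The reordering can be illustrated with a small sketch that converts a display-order sequence into one possible transmission order, in which every B-picture follows both of its reference pictures. The picture labels and the function name are hypothetical, and the handling of trailing B-pictures (appended at the end) is an assumption.

```python
def display_to_coded_order(display_order):
    """Reorder picture labels so that each B-picture follows the
    I- or P-picture that comes after it in display order.

    Labels start with the picture type, e.g. "I0", "B1", "P3".
    """
    coded = []
    pending_b = []
    for picture in display_order:
        if picture[0] == "B":
            # A B-picture must wait for its future reference picture.
            pending_b.append(picture)
        else:
            # Emit the reference picture first, then the B-pictures
            # that were waiting for it.
            coded.append(picture)
            coded.extend(pending_b)
            pending_b = []
    # Assumption: B-pictures with no later reference are sent last.
    coded.extend(pending_b)
    return coded
```

With the display order I0, B1, B2, P3, B4, B5, P6, this sketch produces the order I0, P3, B1, B2, P6, B4, B5, so that P3 is already decoded and stored when B1 and B2 arrive.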
A conventional MPEG compliant decoder will write decoded pictures into a buffer memory, so that pixel data in reference pictures may be available to the MPEG decoder for motion compensation. For example, when a P-coded 16×16 macroblock is being decoded, one matching block in a previous I-picture or P-picture, as referenced by the motion vector associated with the P-coded macroblock, may be fetched from the buffer memory and be used to reconstruct the P-coded macroblock.
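The fetch and reconstruction step can be sketched as follows, assuming the reference picture is held as a two-dimensional array, the motion vector is a whole-pixel (dy, dx) displacement, and the decoded difference data is added to the fetched block and clamped to the 8-bit range. The function names are hypothetical, and details such as sub-pixel motion vectors are left out.

```python
def fetch_reference_block(ref, top, left, mv, size=16):
    """Read the matching block from a stored reference picture.

    (top, left) is the position of the macroblock being decoded and
    mv = (dy, dx) is its motion vector in whole pixels.
    """
    dy, dx = mv
    return [row[left + dx:left + dx + size]
            for row in ref[top + dy:top + dy + size]]


def reconstruct(prediction, residual):
    """Add the decoded difference data to the fetched block and clamp
    each result to the range 0..255."""
    return [
        [max(0, min(255, p + r)) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```

In a real decoder the fetch would be a read from the shared buffer memory rather than a list slice, which is where the bandwidth cost of each reference block arises.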
In a typical real-time video decoding system, the buffer memory and associated memory bus are shared by several peripherals (e.g., an MPEG video or audio decoder, an audio input and a video input). All of these peripherals have real-time constraints (i.e., each of the peripherals requires a certain minimum amount of memory bandwidth to work correctly). If the required bandwidth is not available, a failure may occur, such as a missed video frame or an audio “pop”.
In order to reduce overhead associated with the transfer of data to and/or from the buffer memory and to make more efficient use of the memory bus, video and audio data are, conventionally, transferred between the buffer memory and the peripherals in data packets. One way to guarantee bandwidth to a collection of peripherals is to use time-domain multiplexing in order to time-slice the memory bus. When time-domain multiplexing is used, each peripheral is allowed to transfer a fixed number of data packets to and/or from the buffer memory during a certain time period. The amount of data in a data packet is usually fixed, and each data packet from the buffer memory may include data from only a single memory page in the buffer memory.
Reference pixel data corresponding to a matching block may come from random places in a picture and may fall across multiple memory pages. Within each memory page, the required reference pixel data usually do not fill an integral number of data packets. For example, if each data packet holds 16 bytes of data and there are 18 bytes of required pixel data within one memory page, two packets that are capable of holding 32 bytes of data are used to carry the 18 bytes of required pixel data from this memory page, so that 14 of the 32 bytes transferred carry no required data. This inefficient use of data packets increases the memory bandwidth required.
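The packet arithmetic in this example can be checked with a short sketch. It assumes a contiguous run of required bytes starting at a given byte address, a fixed page size and a fixed packet size, and it applies only the single-page rule described above; the function names and the 1024-byte page size used in the usage note are hypothetical.

```python
def bytes_per_page(start, length, page_size):
    """Split a contiguous byte range into the number of required bytes
    that fall in each memory page it touches."""
    counts = []
    address, remaining = start, length
    while remaining > 0:
        room_in_page = page_size - (address % page_size)
        chunk = min(room_in_page, remaining)
        counts.append(chunk)
        address += chunk
        remaining -= chunk
    return counts


def packets_needed(start, length, page_size, packet_size):
    """Count fixed-size packets when no packet may span two pages."""
    return sum(-(-count // packet_size)  # ceiling division
               for count in bytes_per_page(start, length, page_size))


def packet_utilization(start, length, page_size, packet_size):
    """Fraction of the transferred bytes that are actually required."""
    packets = packets_needed(start, length, page_size, packet_size)
    return length / (packets * packet_size)
```

With 16-byte packets, 18 required bytes inside one page need two packets, for a utilization of 18/32 or about 56%. With the assumed 1024-byte page, a 16-byte run starting at address 1016 straddles a page boundary and also needs two packets, even though it would fit in one.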
Therefore, there is a need for an apparatus and method for improving memory bandwidth efficiency during a real-time video decoding process.