This invention relates to encoding and decoding of video signals. More particularly, this invention relates to encoding and decoding of video signals from very low to high bitrates.
Bidirectionally predicted pictures (B-pictures) were adopted for the International Standards Organization (ISO) Moving Picture Experts Group-Phase 1 (MPEG-1) video standard, which was optimized for coding of video signals of Source Intermediate Format (SIF: 352×240@30 frames/s or 352×288@25 frames/s) at bitrates of up to about 1.5 Mbits/s. For the next phase of ISO MPEG, the MPEG-2 video standard, optimized for coding of CCIR-601 4:2:0 (active portion: 704×480@30 interlaced frames/s or 704×576@25 interlaced frames/s) at bitrates of 4 to 9 Mbits/s, B-pictures were again proven to provide high coding efficiency. Furthermore, in MPEG-2, the B-pictures were also adapted to achieve temporally scalable (layered) video coding, which is used for temporal scalability from interlace to high temporal resolution progressive video and compatible coding of stereoscopic video.
In addition to the ISO MPEG standards, the International Telecommunication Union-Transmission Sector (ITU-T) provides the H.263 standard. The H.263 standard is optimized for coding of Quarter Common Intermediate Format (QCIF: 176×144@30 frames/s or lower) video at very low bitrates of 20 to 30 kbits/s and includes a very low overhead (and lower quality) version of B-pictures, called the PB-frame mode. Since the ITU-T H.263 standard deals with coding of simple scenes (e.g., video phone and video conferencing) at lower bitrates, the PB-frame mode was basically employed to double the frame rate when higher temporal resolution was needed. The quality limitation of PB-frames was not considered to be a major impediment since the PB-frame mode was the only efficient method to provide higher frame rates. Furthermore, soon after completion of H.263, the ITU-T Low Bitrate Coding group started an effort to incorporate optional enhancements to H.263, which when combined with H.263 were expected to result in the H.263+ standard. The work on these optional enhancements is being performed in parallel to the ongoing work in ISO on its next phase standard, called MPEG-4.
The MPEG-4 standard is being optimized for coding of a number of formats, including QCIF, CIF, and SIF, at bitrates ranging from that employed for H.263 to that employed for MPEG-1, i.e., from about 20 kbits/s to about 1.5 Mbits/s. However, in MPEG-4, besides coding efficiency, the focus is on functionalities. Although MPEG-2 also provides some functionalities, such as interactivity with a stored bitstream (also provided in MPEG-1), scalability, and error resilience, the bitrates used in MPEG-2 are much higher and its functionalities are rather limited. The goal of MPEG-4 is to allow a much higher degree of interactivity (in particular, interactivity with individual video objects in a stored bitstream), scalability (in particular, spatial and temporal scalability of individual objects), higher error resilience, and efficient coding of multiviewpoint video, all at bitrates ranging from very low to high. Further, it is anticipated that MPEG-4's current scope will be extended to include coding of interlaced video of Half Horizontal Resolution (HHR) and CCIR-601 optimized at higher bitrates (e.g., 2 to 6 Mbits/s) than those currently used. The video coding optimization work in MPEG-4 is being accomplished by iterative refinement of Verification Models (VMs) that describe the encoding schemes.
Efficient coding of digital video is achieved in accordance with this invention by integrating the bidirectional prediction modes of the MPEG-1 and the H.263 standards into a single adaptive scheme, while eliminating the restrictions and limitations imposed in these standards. This results in an efficient yet flexible method for performing the bidirectionally predictive coding of pictures (improved B-pictures) that is capable of operating with good performance over a wider range of bitrates than is possible with the equivalent techniques in the individual MPEG-1 and H.263 standards. The present invention is thus suitable for B-picture coding in the H.263+ standard. Furthermore, the inventive method can be applied to the bidirectionally predictive coding of either rectangular regions or arbitrarily shaped objects/regions in video pictures (so-called B-VOPs) for MPEG-4. Coding of the remaining portions of the regions is performed in accordance with the MPEG-1 or H.263 standard. That is, the motion compensated discrete cosine transform ("DCT") coding framework employed in existing standards such as the MPEG-1, MPEG-2, and H.263 video standards is used, with appropriate extensions, to provide an efficient, flexible coding scheme.
Known encoding techniques are either effective at rates of 1 Mbit/s or higher (as in the case of B-pictures in MPEG-1/MPEG-2), or compromise quality if low bitrates are employed (as in the case of PB-frames of the H.263 standard), or, alternatively, are intended only for pictures (rectangular VOPs). In contrast, the inventive method allows effective operation over a wider range of bitrates, does not compromise quality anywhere within its operating range, and is easily extensible to the encoding of arbitrarily shaped objects in frames (VOPs or Video Object Planes). Moreover, to ensure high coding efficiency and quality, the prediction modes of the invention are combined with various types of overhead typically employed when coding blocks of pixels arranged as macroblocks. As a result, an optimized low-overhead coding syntax is provided that allows meaningful mode combinations. Thus, when coding pictures or rectangular VOPs, the improved B-pictures of the invention provide compatibility with the remainder of the coding scheme by simply replacing the existing B-pictures with the improved B-pictures.
In one particular embodiment of the invention, a method is provided for decoding a bit stream representing an image that has been encoded. The method includes the steps of: performing an entropy decoding of the bit stream to form a plurality of transform coefficients and a plurality of motion vectors; performing an inverse transformation on the plurality of transform coefficients to form a plurality of error blocks; determining a plurality of predicted blocks based on bidirectional motion estimation that employs the motion vectors, wherein the bidirectional motion estimation includes a direct prediction mode and a second prediction mode; and adding the plurality of error blocks to the plurality of predicted blocks to form the image. The second prediction mode may include forward, backward, and interpolated prediction modes.
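By way of illustration only, the last two decoding steps recited above (determining a predicted block and adding the error block) might be sketched as follows. The sketch is not the claimed implementation: entropy decoding and the inverse transformation are omitted, and the error block is taken as given. All function names are hypothetical. The scaling used to derive the direct-mode vectors from a co-located vector, the temporal distances `trb` and `trd`, and the delta vector `mvd` follow the H.263 PB-frame convention and are an assumption here, since this section does not define the direct mode in detail.

```python
# Illustrative sketch only; names and the direct-mode scaling are assumptions.

def fetch_block(ref, x, y, mv, size):
    """Copy a size-by-size block from ref displaced by mv, clamping at frame edges."""
    h, w = len(ref), len(ref[0])
    return [[ref[min(max(y + mv[1] + j, 0), h - 1)][min(max(x + mv[0] + i, 0), w - 1)]
             for i in range(size)]
            for j in range(size)]

def average(a, b):
    """Pixel-wise average of two blocks, rounding halves up."""
    return [[(p + q + 1) // 2 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def direct_vectors(mv_col, mvd, trb, trd):
    """Derive forward/backward vectors from a co-located vector (assumed H.263-style scaling)."""
    def tdiv(n, d):
        return int(n / d)  # division truncating toward zero
    mv_f = tuple(tdiv(trb * c, trd) + d for c, d in zip(mv_col, mvd))
    if mvd == (0, 0):
        mv_b = tuple(tdiv((trb - trd) * c, trd) for c in mv_col)
    else:
        mv_b = tuple(f - c for f, c in zip(mv_f, mv_col))
    return mv_f, mv_b

def predict_block(mode, fwd, bwd, x, y, size, mv_f=(0, 0), mv_b=(0, 0)):
    """Form the predicted block for one of the four prediction modes."""
    if mode == "forward":
        return fetch_block(fwd, x, y, mv_f, size)
    if mode == "backward":
        return fetch_block(bwd, x, y, mv_b, size)
    if mode in ("interpolated", "direct"):
        # For "direct", the caller supplies vectors obtained from direct_vectors().
        return average(fetch_block(fwd, x, y, mv_f, size),
                       fetch_block(bwd, x, y, mv_b, size))
    raise ValueError("unknown prediction mode: " + mode)

def reconstruct(pred, err):
    """Add the error block to the predicted block and clip to the 8-bit range."""
    return [[min(max(p + e, 0), 255) for p, e in zip(rp, re)]
            for rp, re in zip(pred, err)]

# Two 8x8 reference pictures with easily traceable sample values.
fwd = [[r * 8 + c for c in range(8)] for r in range(8)]
bwd = [[100 + r * 8 + c for c in range(8)] for r in range(8)]

mv_f, mv_b = direct_vectors(mv_col=(4, -2), mvd=(0, 0), trb=1, trd=2)
pred = predict_block("direct", fwd, bwd, x=2, y=2, size=2, mv_f=mv_f, mv_b=mv_b)
block = reconstruct(pred, [[1, -1], [200, -100]])
```

In this sketch the direct mode and the interpolated mode share the same averaging step and differ only in where the motion vectors come from: transmitted explicitly in the interpolated case, derived from a co-located vector in the direct case.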