The present invention provides a method and apparatus for coding of digital video images such as bi-directionally predicted video object planes (B-VOPs), in particular, where the B-VOP and/or a reference image used to code the B-VOP is interlaced coded.
The invention is particularly suitable for use with various multimedia applications, and is compatible with the MPEG-4 Verification Model (VM) 8.0 standard (MPEG-4 VM 8.0) described in document ISO/IEC/JTC1/SC29/WG11 N1796, entitled "MPEG-4 Video Verification Model Version 8.0," Stockholm, July 1997, incorporated herein by reference. The MPEG-2 standard is a precursor to the MPEG-4 standard, and is described in document ISO/IEC 13818-2, entitled "Information Technology - Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262," Mar. 25, 1994, incorporated herein by reference.
MPEG-4 is a coding standard which provides a flexible framework and an open set of coding tools for communication, access, and manipulation of digital audio-visual data. These tools support a wide range of features. The flexible framework of MPEG-4 supports various combinations of coding tools and their corresponding functionalities for applications required by the computer, telecommunication, and entertainment (i.e., TV and film) industries, such as database browsing, information retrieval, and interactive communications.
MPEG-4 provides standardized core technologies allowing efficient storage, transmission and manipulation of video data in multimedia environments. MPEG-4 achieves efficient compression, object scalability, spatial and temporal scalability, and error resilience.
The MPEG-4 video VM coder/decoder (codec) is a block- and object-based hybrid coder with motion compensation. Texture is encoded with an 8×8 Discrete Cosine Transform (DCT) utilizing overlapped block-motion compensation. Object shapes are represented as alpha maps and encoded using a Content-based Arithmetic Encoding (CAE) algorithm or a modified DCT coder, both using temporal prediction. The coder can handle sprites as they are known from computer graphics. Other coding methods, such as wavelet and sprite coding, may also be used for special applications.
Motion compensated texture coding is a well known approach for video coding, and can be modeled as a three-stage process. The first stage is signal processing which includes motion estimation and compensation (ME/MC) and a two-dimensional (2-D) spatial transformation. The objective of ME/MC and the spatial transformation is to take advantage of temporal and spatial correlations in a video sequence to optimize the rate-distortion performance of quantization and entropy coding under a complexity constraint. The most common technique for ME/MC has been block matching, and the most common spatial transformation has been the DCT.
However, special concerns arise for ME/MC of macroblocks (MBs) in B-VOPs when the MB is itself interlaced coded and/or uses reference images which are interlaced coded.
In particular, it would be desirable to have an efficient technique for providing motion vector (MV) predictors for a MB in a B-VOP. It would also be desirable to have an efficient technique for direct mode coding of a field coded MB in a B-VOP. It would further be desirable to have a coding mode decision process for a MB in a field coded B-VOP for selecting the reference image which results in the most efficient coding.
The present invention provides a system having the above and other advantages.
In accordance with the present invention, a method and apparatus are presented for coding of digital video images such as a current image (e.g., macroblock) in a bi-directionally predicted video object plane (B-VOP), in particular, where the current image and/or a reference image used to code the current image is interlaced (e.g., field) coded.
In a first aspect of the invention, a method provides direct mode motion vectors (MVs) for a current bi-directionally predicted, field coded image such as a macroblock (MB) having top and bottom fields, in a sequence of digital video images. A past field coded reference image having top and bottom fields, and a future field coded reference image having top and bottom fields are determined. The future image is predicted using the past image such that MVtop, a forward MV of the top field of the future image, references either the top or bottom field of said past image. The field which is referenced contains a best-match MB for a MB in the top field of the future image.
This MV is termed a "forward" MV since, although it references a past image (e.g., backward in time), the prediction is from the past image to the future image, e.g., forward in time. As a mnemonic, the prediction direction may be thought of as being opposite the direction of the corresponding MV.
Similarly, MVbot, a forward motion vector of the bottom field of the future image, references either the top or bottom field of the past image. Forward and backward MVs are determined for predicting the top and/or bottom fields of the current image by scaling the forward MV of the corresponding field of the future image.
In particular, MVf,top, the forward motion vector for predicting the top field of the current image, is determined according to the expression MVf,top=(MVtop*TRB,top)/TRD,top+MVD, where MVD is a delta motion vector for a search area, TRB,top corresponds to a temporal spacing between the top field of the current image and the field of the past image which is referenced by MVtop, and TRD,top corresponds to a temporal spacing between the top field of the future image and the field of the past image which is referenced by MVtop. The temporal spacing may be related to a frame rate at which the images are displayed.
Similarly, MVf,bot, the forward motion vector for predicting the bottom field of the current image, is determined according to the expression MVf,bot=(MVbot*TRB,bot)/TRD,bot+MVD, where MVD is a delta motion vector, TRB,bot corresponds to a temporal spacing between the bottom field of the current image and the field of the past image which is referenced by MVbot, and TRD,bot corresponds to a temporal spacing between the bottom field of the future MB and the field of the past MB which is referenced by MVbot.
MVb,top, the backward motion vector for predicting the top field of the current MB, is determined according to the equation MVb,top=((TRB,top−TRD,top)*MVtop)/TRD,top when the delta motion vector MVD=0, or MVb,top=MVf,top−MVtop when MVD≠0.
MVb,bot, the backward motion vector for predicting the bottom field of the current MB, is determined according to the equation MVb,bot=((TRB,bot−TRD,bot)*MVbot)/TRD,bot when the delta motion vector MVD=0, or MVb,bot=MVf,bot−MVbot when MVD≠0.
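The field-wise direct mode computation above can be sketched as follows. This is an illustrative example only, not a normative implementation: all function and variable names are hypothetical, motion vectors are modeled as integer (x, y) pairs, the temporal spacings TRB and TRD are taken as given, and plain integer division is assumed where the standard may prescribe a particular rounding rule.

```python
def direct_mode_field_mvs(mv_future, tr_b, tr_d, mv_delta=(0, 0)):
    """Derive the forward and backward MVs for one field (top or bottom)
    of the current B-VOP macroblock.

    mv_future : forward MV of the corresponding field of the future
                reference image (MVtop or MVbot), referencing a field
                of the past reference image
    tr_b      : temporal spacing, current field to the referenced past field
    tr_d      : temporal spacing, future field to the referenced past field
    mv_delta  : delta motion vector MVD for the search area
    """
    # Forward MV: MVf = (MVfuture * TRB) / TRD + MVD
    mv_f = tuple((m * tr_b) // tr_d + d for m, d in zip(mv_future, mv_delta))

    if mv_delta == (0, 0):
        # Backward MV when MVD = 0: MVb = ((TRB - TRD) * MVfuture) / TRD
        mv_b = tuple(((tr_b - tr_d) * m) // tr_d for m in mv_future)
    else:
        # Backward MV when MVD != 0: MVb = MVf - MVfuture
        mv_b = tuple(f - m for f, m in zip(mv_f, mv_future))
    return mv_f, mv_b
```

For example, with a future-field forward MV of (4, 2), TRB=1, and TRD=2, the scaled forward MV is (2, 1) and the backward MV is (-2, -1); the same routine applies independently to the top and bottom fields.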
A corresponding decoder is also presented.
In another aspect of the invention, a method is presented for selecting a coding mode for a current predicted, field coded MB having top and bottom fields, in a sequence of digital video MBs. The coding mode may be a backward mode, where the reference MB is temporally after the current MB in display order, a forward mode, where the reference MB is before the current MB, or an average (e.g., bi-directional) mode, where an average of prior and subsequent reference MBs is used.
The method includes the step of determining a forward sum of absolute differences error, SADforward,field, for the current MB relative to a past reference MB, which corresponds to a forward coding mode. SADforward,field indicates the error in pixel luminance values between the current MB and a best match MB in the past reference MB. A backward sum of absolute differences error, SADbackward,field, for the current MB relative to a future reference MB, which corresponds to a backward coding mode, is also determined. SADbackward,field indicates the error in pixel luminance values between the current MB and a best match MB in the future reference MB.
An average sum of absolute differences error, SADaverage,field for the current MB relative to an average of the past and future reference MBs, which corresponds to an average coding mode, is also determined. SADaverage,field indicates the error in pixel luminance values between the current MB and a MB which is the average of the best match MBs of the past and future reference MBs.
The coding mode is selected according to the minimum of the SADs. Bias terms which account for the number of required MVs of the respective coding modes may also be factored into the coding mode selection process.
SADforward,field, SADbackward,field, and SADaverage,field are determined by summing the component terms over the top and bottom fields.
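The mode decision described above can be sketched as follows. This is a simplified illustration under stated assumptions: luminance blocks are modeled as flat lists of pixel values, the average prediction is a per-pixel integer mean, and the bias values that account for the MV overhead of each mode are hypothetical placeholders rather than values taken from the text.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-size luminance blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def select_field_coding_mode(cur_top, cur_bot,
                             fwd_top, fwd_bot,   # best matches in past ref
                             bwd_top, bwd_bot,   # best matches in future ref
                             bias=None):
    """Pick forward, backward, or average mode for a field coded B-VOP MB.

    Each candidate SAD is the sum of the top- and bottom-field SADs, as
    described in the text. Optional per-mode bias terms may penalize modes
    that require more motion vectors.
    """
    # Average (bi-directional) prediction: per-pixel mean of the two matches.
    avg_top = [(f + b) // 2 for f, b in zip(fwd_top, bwd_top)]
    avg_bot = [(f + b) // 2 for f, b in zip(fwd_bot, bwd_bot)]

    sads = {
        "forward":  sad(cur_top, fwd_top) + sad(cur_bot, fwd_bot),
        "backward": sad(cur_top, bwd_top) + sad(cur_bot, bwd_bot),
        "average":  sad(cur_top, avg_top) + sad(cur_bot, avg_bot),
    }
    bias = bias or {}
    return min(sads, key=lambda mode: sads[mode] + bias.get(mode, 0))
```

For instance, when the current fields lie midway between the past and future matches, the averaged prediction yields the smallest SAD and average mode is selected; adding a bias against average mode can tip the decision toward a single-MV mode.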