The present invention provides a method and apparatus for coding digital video images such as bi-directionally predicted video object planes (B-VOPs), in particular where the B-VOP and/or a reference image used to code the B-VOP is interlaced coded.
The invention is particularly suitable for use with various multimedia applications, and is compatible with the MPEG-4 Verification Model (VM) 8.0 standard (MPEG-4 VM 8.0) described in document ISO/IEC/JTC1/SC29/WG11 N1796, entitled "MPEG-4 Video Verification Model Version 8.01", Stockholm, July 1997, incorporated herein by reference. The MPEG-2 standard is a precursor to the MPEG-4 standard, and is described in document ISO/IEC 13818-2, entitled "Information Technology--Generic Coding of Moving Pictures and Associated Audio, Recommendation H.262," Mar. 25, 1994, incorporated herein by reference.
MPEG-4 is a coding standard which provides a flexible framework and an open set of coding tools for communication, access, and manipulation of digital audio-visual data. These tools support a wide range of features. The flexible framework of MPEG-4 supports various combinations of coding tools and their corresponding functionalities for applications required by the computer, telecommunication, and entertainment (i.e., TV and film) industries, such as database browsing, information retrieval, and interactive communications.
MPEG-4 provides standardized core technologies allowing efficient storage, transmission and manipulation of video data in multimedia environments. MPEG-4 achieves efficient compression, object scalability, spatial and temporal scalability, and error resilience.
The MPEG-4 video VM coder/decoder (codec) is a block- and object-based hybrid coder with motion compensation. Texture is encoded with an 8×8 Discrete Cosine Transformation (DCT) utilizing overlapped block-motion compensation. Object shapes are represented as alpha maps and encoded using a Content-based Arithmetic Encoding (CAE) algorithm or a modified DCT coder, both using temporal prediction. The coder can handle sprites as they are known from computer graphics. Other coding methods, such as wavelet and sprite coding, may also be used for special applications.
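By way of illustration only, the 8×8 DCT applied to a texture block may be sketched as follows. The function name `dct_2d` is hypothetical, and the sketch assumes a floating-point, orthonormal DCT-II evaluated directly from its definition; it is not the normative transform of the Verification Model, and a practical codec would use a fast factorization.

```python
import math


def dct_2d(block):
    """Return the 2-D DCT-II of a square block (list of rows of numbers).

    Assumption: orthonormal scaling, so an n x n block whose samples all
    equal v has a DC coefficient of n * v and zero AC coefficients.
    """
    n = len(block)

    def scale(k):
        # Orthonormal scale factor for frequency index k.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical frequency
        for v in range(n):      # horizontal frequency
            acc = 0.0
            for y in range(n):
                for x in range(n):
                    acc += (block[y][x]
                            * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                            * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * acc
    return coeffs
```

For a flat block, all of the energy falls in the single DC coefficient, which is the property that makes the transform useful for compacting spatially correlated texture.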
Motion compensated texture coding is a well-known approach for video coding, and can be modeled as a three-stage process. The first stage is signal processing, which includes motion estimation and compensation (ME/MC) and a two-dimensional (2-D) spatial transformation, and it is followed by quantization and entropy coding. The objective of ME/MC and the spatial transformation is to take advantage of temporal and spatial correlations in a video sequence to optimize the rate-distortion performance of quantization and entropy coding under a complexity constraint. The most common technique for ME/MC has been block matching, and the most common spatial transformation has been the DCT.
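Block matching, as referred to above, may be illustrated by the following minimal sketch. It assumes a full search over integer-pixel displacements with a sum of absolute differences (SAD) cost; the names `sad` and `block_match`, the default block size, and the search range are hypothetical choices made for illustration and are not taken from the Verification Model.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced from that position by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def block_match(cur, ref, bx, by, n=16, search=4):
    """Full-search block matching.

    Returns ((dx, dy), cost) for the displacement within +/- `search`
    pixels that minimizes the SAD. Images are lists of rows of integers.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block leaves the image.
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The returned displacement plays the role of the motion vector for the block, and the residual after compensation is what would be passed to the spatial transformation.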
However, special concerns arise for ME/MC of macroblocks (MBs) in B-VOPs when the MB is itself interlaced coded and/or uses reference images which are interlaced coded.
In particular, it would be desirable to have an efficient technique for providing motion vector (MV) predictors for a MB in a B-VOP. It would also be desirable to have an efficient technique for direct mode coding of a field coded MB in a B-VOP. It would further be desirable to have a coding mode decision process for a MB in a field coded B-VOP for selecting the reference image which results in the most efficient coding.
The present invention provides a system having the above and other advantages.