1. Field of the Invention
The present invention relates to a video coding system in which image data is organized into video objects and coded according to a scalable coding scheme. The coding scheme provides spatial scalability, temporal scalability or both.
2. Related Art
Video coding is a rapidly changing field. Video coding generally relates to any method that represents natural and/or synthetic visual information efficiently. A variety of video coding standards are currently established, and a number of others are being drafted. The present invention was originally proposed for use in the Moving Picture Experts Group standard MPEG-4.
One earlier video standard, known as “MPEG-2,” codes video information as video pictures or “frames.” Consider a sequence of video information to be coded, represented as a series of frames at successive times. The MPEG-2 standard coded each frame according to one of three coding methods. A given frame could be coded using:
Intra-coding, where the frame was coded without reference to any other frame (known as “I-pictures”),
Predictive coding, where the frame was coded with reference to one previously coded frame (known as “P-pictures”), or
Bi-directionally predictive coding, where the frame was coded with reference to as many as two previously coded frames (known as “B-pictures”).
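The three coding methods can be illustrated with a minimal sketch. It assumes frames are flat lists of pixel values and uses zero-motion prediction: a P-picture codes the difference from one reference, and a B-picture codes the difference from the average of two references. Real MPEG-2 coders add motion compensation, transforms and quantization, and a B-picture may also predict from only one of its references; the function names here are hypothetical.

```python
# Frames are modelled as flat lists of integer pixel values (a simplifying assumption).

def intra_code(frame):
    # I-picture: the samples themselves are coded; no reference frame is needed.
    return list(frame)

def predictive_code(frame, ref):
    # P-picture: code only the difference from one previously coded frame.
    return [x - r for x, r in zip(frame, ref)]

def bidirectional_code(frame, past, future):
    # B-picture: code the difference from the average of two reference frames.
    return [x - (p + f) // 2 for x, p, f in zip(frame, past, future)]

def decode_p(residual, ref):
    # Add the transmitted difference back onto the single reference.
    return [e + r for e, r in zip(residual, ref)]

def decode_b(residual, past, future):
    # Rebuild the same two-reference prediction, then add the difference.
    return [e + (p + f) // 2 for e, p, f in zip(residual, past, future)]

# Example: frame 1 is intra coded, frame 4 predicted from 1, frame 2 from both.
f1, f2, f4 = [10, 20, 30], [11, 21, 31], [14, 24, 34]
res4 = predictive_code(f4, f1)          # [4, 4, 4]
res2 = bidirectional_code(f2, f1, f4)   # [-1, -1, -1]
```

Because the decoder forms exactly the same prediction as the encoder, adding the residual back recovers each frame, and the residuals are small whenever neighbouring frames are similar.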
Under MPEG-2, frames are not necessarily coded in the order in which they appear. For example, a first frame may be coded as an I-picture, and a fourth frame may then be coded as a P-picture predicted from the I-picture. The second and third frames may be coded as B-pictures, each predicted with reference to the previously coded I- and P-pictures. A time index is provided so that a decoder can reassemble the correct frame sequence when it decodes the coded data.
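The reordering described above can be sketched as follows, assuming each coded picture carries its time index and the time indices of its references. The `CodedPicture` record and its fields are hypothetical; MPEG-2 signals this information through its own bitstream syntax. The sketch checks that every reference is decoded before it is needed, then sorts by time index to recover display order.

```python
from dataclasses import dataclass

@dataclass
class CodedPicture:
    time_index: int    # display position transmitted with the picture
    picture_type: str  # "I", "P" or "B"
    refs: tuple = ()   # time indices of the pictures it is predicted from (hypothetical field)

def check_references(coded_order):
    """Raise if any picture is predicted from a picture not yet decoded."""
    decoded = set()
    for pic in coded_order:
        for r in pic.refs:
            if r not in decoded:
                raise ValueError(f"picture {pic.time_index} needs {r}, which is not yet decoded")
        decoded.add(pic.time_index)

def display_order(coded_order):
    """Reassemble the display sequence from the transmitted time indices."""
    check_references(coded_order)
    return sorted(coded_order, key=lambda pic: pic.time_index)

# Coding order from the example: frame 1 (I), frame 4 (P), then frames 2 and 3 (B).
stream = [
    CodedPicture(1, "I"),
    CodedPicture(4, "P", refs=(1,)),
    CodedPicture(2, "B", refs=(1, 4)),
    CodedPicture(3, "B", refs=(1, 4)),
]
print("".join(p.picture_type for p in display_order(stream)))  # IBBP
```

Coding frame 4 before frames 2 and 3 is what makes both B-picture references available when those frames are decoded.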
MPEG-4, currently being drafted, integrates the concept of “video objects” into I-, P- and B-coding. Video object-based coders decompose a video sequence into video objects. An example is provided in FIGS. 1(a)-(d). There, a frame includes image data of the head and shoulders of a narrator, a suspended logo and a background. An encoder may determine that the narrator, logo and background are three distinct video objects, each shown separately in FIGS. 1(b)-(d). The video coder may code each object separately.
Video object-based coding schemes recognize that a video object may remain in a video sequence across many frames. The appearance of a video object in any given frame is a “video object plane” or “VOP.” VOPs may be coded as I-VOPs using intra-coding techniques, as P-VOPs using predictive coding techniques, or as B-VOPs using bi-directionally predictive coding techniques. For each VOP, additional administrative data is transmitted with the coded VOP data, providing information such as the video object's location in the displayed image.
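To show how the administrative location data lets a decoder rebuild a frame from separately coded objects such as those of FIGS. 1(b)-(d), here is a minimal compositing sketch. The `VOP` record, its field names, and the use of `None` to mark transparent samples are all assumptions for illustration; MPEG-4 carries object shape and position in its own syntax.

```python
from dataclasses import dataclass

@dataclass
class VOP:
    object_id: int
    coding_type: str  # "I", "P" or "B"
    x: int            # horizontal position in the displayed image (administrative data)
    y: int            # vertical position in the displayed image (administrative data)
    pixels: list      # 2-D list of samples; None marks a transparent sample (assumption)

def composite(vops, width, height, background=0):
    """Paint decoded VOPs onto a frame in list order; later VOPs cover earlier ones."""
    frame = [[background] * width for _ in range(height)]
    for vop in vops:
        for dy, row in enumerate(vop.pixels):
            for dx, value in enumerate(row):
                fx, fy = vop.x + dx, vop.y + dy
                # Skip transparent samples and anything falling outside the frame.
                if value is not None and 0 <= fx < width and 0 <= fy < height:
                    frame[fy][fx] = value
    return frame

# A 4x3 background object with a 2x1 logo object placed at (2, 0).
backdrop = VOP(0, "I", 0, 0, [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]])
logo = VOP(1, "I", 2, 0, [[9, 9]])
frame = composite([backdrop, logo], width=4, height=3)
# frame == [[1, 1, 9, 9], [1, 1, 1, 1], [1, 1, 1, 1]]
```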
Coding video information on a video object basis may improve coding efficiency in certain applications. For example, if the logo were a static image, an encoder may code it as an initial I-VOP. For subsequent frames, however, coding the logo as a P- or B-VOP would yield almost no image data: the P- or B-coding essentially amounts to an “instruction” that the original image information should be redisplayed in successive frames. Such coding improves coding efficiency.
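The static-logo case can be made concrete with a small sketch, assuming zero-motion predictive coding of a single object and a hypothetical convention in which a P-VOP whose difference is entirely zero is sent as `None`, meaning "redisplay the reference." This marker stands in for whatever signalling a real MPEG-4 bitstream uses.

```python
def encode_object(frames):
    """Code the first VOP intra and each later VOP as a difference from the previous one."""
    coded = []
    previous = None
    for frame in frames:
        if previous is None:
            coded.append(("I", list(frame)))
        else:
            residual = [a - b for a, b in zip(frame, previous)]
            # An all-zero difference is sent as None: "redisplay the previous image".
            coded.append(("P", residual if any(residual) else None))
        previous = frame
    return coded

def decode_object(coded):
    """Invert encode_object, repeating the last image whenever a P-VOP carries no data."""
    frames = []
    for kind, payload in coded:
        if kind == "I":
            frames.append(list(payload))
        elif payload is None:
            frames.append(list(frames[-1]))
        else:
            frames.append([r + e for r, e in zip(frames[-1], payload)])
    return frames

logo = [7, 7, 3, 3]                     # a tiny static "logo"
coded = encode_object([logo] * 5)       # five frames in which the logo never changes
sample_count = sum(len(p) for _, p in coded if p is not None)  # 4: only the I-VOP carries samples
```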
One goal of the MPEG-4 standard is to provide a coding scheme that may be used with decoders of varying processing power. Simple decoders should be able to decode coded video data for display. More powerful decoders should be able to decode the same coded video data and obtain superior output, such as improved image quality or additional functionality. As of the priority date of this application, no known video object-based coding scheme provides such flexibility.
MPEG-2 provides scalability for its picture-based coder. However, the scalability protocol defined by MPEG-2 is highly complicated. Spatial scalability, where additional data for VOPs is coded into an optional enhancement layer, is coded using a first protocol. Temporal scalability, where data of additional VOPs is coded in the enhancement layer, is coded using a second protocol. Each protocol is defined separately from the other and requires highly context-specific analysis and complicated lookup tables in a decoder. The MPEG-2 scalability protocol is disadvantageous because its complexity makes it difficult to implement. Accordingly, there is a further need in the art for a generalized scalability protocol.
The present invention provides a video coding system that codes video objects as video object layers. Data of each video object may be segregated into one or more layers. A base layer contains sufficient information to decode a basic representation of the video object. Enhancement layers contain supplementary data regarding the video object that, if decoded, enhances the basic representation obtained from the base layer. The present invention thus provides a coding scheme suitable for decoders of varying processing power: a simple decoder may decode only the base layer to obtain the basic representation, while more powerful decoders may decode the base layer together with enhancement layer data to obtain improved decoded output.
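The base/enhancement split can be sketched for the simplest case, temporal scalability, where an enhancement layer supplies additional VOPs between those of the base layer. This is an illustrative sketch only, not the invention's protocol: the representation of a layer as a list of `(time_index, vop)` pairs and the `max_layers` capability parameter are hypothetical.

```python
def decode(base_layer, enhancement_layers, max_layers):
    """Merge the base layer with up to max_layers - 1 enhancement layers by time index.

    Each layer is a list of (time_index, vop) pairs (hypothetical representation).
    The base layer is always decoded; max_layers models the decoder's capability.
    """
    selected = [base_layer] + enhancement_layers[: max(0, max_layers - 1)]
    merged = {}
    for layer in selected:
        for time_index, vop in layer:
            merged[time_index] = vop
    # Present the decoded VOPs in display order.
    return [merged[t] for t in sorted(merged)]

base = [(0, "VOP0"), (2, "VOP2"), (4, "VOP4")]   # basic representation
enhancement = [[(1, "VOP1"), (3, "VOP3")]]       # supplementary VOPs between base VOPs

simple_output = decode(base, enhancement, max_layers=1)    # base layer only
powerful_output = decode(base, enhancement, max_layers=2)  # base plus one enhancement layer
```

A simple decoder obtains a lower frame rate from the base layer alone, while a more capable decoder fills in the intermediate VOPs from the same coded data.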