1. Field of the Invention
The present invention relates to a video coding system in which image data is organized into video objects and coded according to a scalable coding scheme. The coding scheme provides spatial scalability, temporal scalability or both.
2. Related Art
Video coding is a field that is currently changing rapidly. Video coding generally relates to any method that represents natural and/or synthetic visual information efficiently. A variety of video coding standards are currently established, and a number of others are being drafted. The present invention was originally proposed for use in the Moving Picture Experts Group standard MPEG-4.
One earlier video standard, known as “MPEG-2,” codes video information as video pictures or “frames.” Consider a sequence of video information to be coded, represented by a series of frames. The MPEG-2 standard codes each frame according to one of three coding methods. A given frame may be coded by:
- intra-coding, where the frame is coded without reference to any other frame (known as “I-pictures”);
- predictive coding, where the frame is coded with reference to one previously coded frame (known as “P-pictures”); or
- bi-directionally predictive coding, where the frame is coded with reference to as many as two previously coded frames (known as “B-pictures”).
Under MPEG-2, frames are not necessarily coded in the order in which they are displayed. For example, a first frame may be coded as an I-picture, and a fourth frame may then be coded as a P-picture predicted from the I-picture. The second and third frames may then be coded as B-pictures, each predicted with reference to the previously coded I- and P-pictures. A time index is provided so that a decoder can reassemble the frames in the correct order when it decodes the coded data.
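The reordering described above can be sketched in Python. The sketch assumes the four-frame example from the text (frames 1 to 4 in display order, typed I, B, B, P); the tuple layout and the function names `coding_order` and `display_order` are hypothetical and do not reflect MPEG-2 bitstream syntax.

```python
from typing import List, Tuple

# Hypothetical representation of a frame: (display_index, picture_type)
Frame = Tuple[int, str]

def coding_order(frames: List[Frame]) -> List[Frame]:
    """Move each anchor (I or P) ahead of the B-pictures that precede it in
    display order, because those B-pictures are predicted from the anchor."""
    out: List[Frame] = []
    pending_b: List[Frame] = []
    for frame in frames:
        if frame[1] == "B":
            pending_b.append(frame)  # hold until the next anchor has been coded
        else:
            out.append(frame)        # code the anchor first
            out.extend(pending_b)    # then the B-pictures that reference it
            pending_b = []
    out.extend(pending_b)  # simplification: trailing B-pictures are appended as-is
    return out

def display_order(decoded: List[Frame]) -> List[Frame]:
    """Decoder side: restore presentation order using the time index."""
    return sorted(decoded, key=lambda f: f[0])

sequence = [(1, "I"), (2, "B"), (3, "B"), (4, "P")]
coded = coding_order(sequence)
print(coded)                 # [(1, 'I'), (4, 'P'), (2, 'B'), (3, 'B')]
print(display_order(coded))  # [(1, 'I'), (2, 'B'), (3, 'B'), (4, 'P')]
```

The time index is what lets `display_order` undo the encoder's reordering without any knowledge of which pictures were anchors.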
MPEG-4, currently being drafted, integrates the concept of “video objects” into I-, P- and B-coding. Video object-based coders decompose a video sequence into video objects. An example is provided in FIGS. 1(a)-(d). There, a frame contains the head and shoulders of a narrator, a suspended logo and a background. An encoder may determine that the narrator, logo and background are three distinct video objects, each shown separately in FIGS. 1(b)-(d). The video coder may code each object separately.
Video object-based coding schemes recognize that video objects may remain in a video sequence across many frames. The appearance of a video object in any given frame is a “video object plane” or “VOP”. VOPs may be coded as I-VOPs using intra-coding techniques, as P-VOPs using predictive coding techniques or as B-VOPs using bi-directionally predictive coding techniques. For each VOP, additional administrative data is transmitted with the coded VOP data; it provides information regarding, for example, the video object's location in the displayed image.
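To illustrate how per-VOP administrative data such as location might be used at the decoder, the following sketch pastes decoded VOPs into a frame at their stated positions, painting from back to front. The `Vop` class, its fields and the use of `None` to mark pixels outside an object are hypothetical simplifications; an actual object-based coder also conveys shape information, which this sketch reduces to `None` pixels.

```python
from dataclasses import dataclass
from typing import List, Optional

Pixel = Optional[int]  # None marks a transparent pixel outside the object's shape

@dataclass
class Vop:
    """Hypothetical decoded VOP: a pixel block plus its location in the frame."""
    name: str
    x: int  # horizontal offset of the VOP within the displayed frame
    y: int  # vertical offset of the VOP within the displayed frame
    pixels: List[List[Pixel]]

def composite(width: int, height: int, vops: List[Vop]) -> List[List[int]]:
    """Paint VOPs in list order; later VOPs overwrite earlier ones where opaque."""
    frame = [[0] * width for _ in range(height)]
    for vop in vops:
        for row, line in enumerate(vop.pixels):
            for col, value in enumerate(line):
                fy, fx = vop.y + row, vop.x + col
                # Skip transparent pixels and anything falling outside the frame.
                if value is not None and 0 <= fy < height and 0 <= fx < width:
                    frame[fy][fx] = value
    return frame

background = Vop("background", 0, 0, [[1] * 4 for _ in range(3)])
logo = Vop("logo", 2, 0, [[9, None]])
for line in composite(4, 3, [background, logo]):
    print(line)
# [1, 1, 9, 1]
# [1, 1, 1, 1]
# [1, 1, 1, 1]
```

Because each object carries its own location, the logo can be repositioned or reused across frames without re-sending the background.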
Coding video information on a video object basis may improve coding efficiency in certain applications. For example, if the logo is a static image, an encoder may code it once as an initial I-VOP. For subsequent frames, coding the logo as a P- or B-VOP would yield almost no image data; the P- or B-coding essentially amounts to an “instruction” that the original image information should be redisplayed in successive frames. Such coding improves coding efficiency.
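A minimal sketch of why a static logo costs almost nothing after its initial I-VOP: with zero motion, the prediction residual of an unchanged VOP is all zeros, so there is essentially nothing left to code. The function names and the use of a nonzero-sample count as a proxy for coded data are illustrative assumptions, not part of any MPEG-4 procedure.

```python
from typing import List

Block = List[List[int]]

def residual(current: Block, reference: Block) -> Block:
    """Zero-motion prediction: subtract the reference VOP sample by sample."""
    return [[c - r for c, r in zip(cur_row, ref_row)]
            for cur_row, ref_row in zip(current, reference)]

def nonzero_count(block: Block) -> int:
    """Rough proxy for how much residual data would need to be coded."""
    return sum(1 for row in block for v in row if v != 0)

logo_frame1 = [[10, 20], [30, 40]]
logo_frame2 = [[10, 20], [30, 40]]  # static logo: identical in the next frame
changed = [[10, 20], [30, 41]]      # a frame in which one sample differs

print(nonzero_count(residual(logo_frame2, logo_frame1)))  # 0
print(nonzero_count(residual(changed, logo_frame1)))      # 1
```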
One goal of the MPEG-4 standard is to provide a coding scheme that may be used with decoders of varying processing power. Simple decoders should be able to decode coded video data for display. More powerful decoders should be able to decode the same coded video data and obtain superior output, such as improved image quality or additional functionality. As of the priority date of this application, no known video object-based coding scheme provides such flexibility.
MPEG-2 provides scalability for its picture-based coder. However, the scalability protocol defined by MPEG-2 is highly complicated. Spatial scalability, where additional data for pictures is coded into an optional enhancement layer, is coded using a first protocol. Temporal scalability, where data of additional pictures is coded in the enhancement layer, is coded using a second protocol. Each protocol is defined separately from the other and requires highly context-specific analysis and complicated lookup tables in a decoder. This complexity makes the MPEG-2 scalability protocol difficult to implement. Accordingly, there is a further need in the art for a generalized scalability protocol.
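The relationship between a base layer and an optional enhancement layer under temporal scalability can be illustrated with a short sketch. The even/odd split between layers and the names `split_layers` and `decode` are assumptions chosen for illustration; this is neither the MPEG-2 temporal-scalability syntax nor the generalized protocol of the invention.

```python
from typing import Dict, List, Tuple

# Hypothetical coded unit: (time_index, payload)
Unit = Tuple[int, str]

def split_layers(units: List[Unit]) -> Dict[str, List[Unit]]:
    """Temporal scalability: even time indices go to the base layer and odd
    ones to the optional enhancement layer (an illustrative choice)."""
    return {
        "base": [u for u in units if u[0] % 2 == 0],
        "enhancement": [u for u in units if u[0] % 2 == 1],
    }

def decode(layers: Dict[str, List[Unit]], use_enhancement: bool) -> List[Unit]:
    """A simple decoder uses the base layer only; a more powerful decoder
    merges both layers by time index to obtain a higher frame rate."""
    units = list(layers["base"])
    if use_enhancement:
        units += layers["enhancement"]
    return sorted(units, key=lambda u: u[0])

vops = [(t, f"VOP{t}") for t in range(6)]
layers = split_layers(vops)
print([t for t, _ in decode(layers, use_enhancement=False)])  # [0, 2, 4]
print([t for t, _ in decode(layers, use_enhancement=True)])   # [0, 1, 2, 3, 4, 5]
```

Spatial scalability would instead leave the number of pictures unchanged and place refinement data for each base-layer picture in the enhancement layer; the point of the sketch is only that the base layer remains decodable on its own.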