Audiovisual information, such as a video of a person speaking, can be converted into a digital signal and transmitted over a communications network. The digital signal can then be converted back into audiovisual information for display. At the time of this writing, the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) is developing a new standard, known as MPEG-4, for the encoding of audiovisual information that will be sent over a communications network at a low transmission rate, or “bitrate.” When complete, MPEG-4 is expected to enable interactive mobile multimedia communications, video phone conferences and a host of other applications.
These applications will be achieved by coding visual objects, which include natural or synthetic video objects, into a generalized coded bitstream representing video information, referred to as a “visual” bitstream. A bitstream that contains both visual and audio information is also referred to as a “systems” bitstream.
A video object is a specific type of natural visual object, and is further composed of layers called Video Object Layers (VOLs). Each VOL is composed of Video Object Planes (VOPs), which can be thought of as snapshots in time of a VOL. The advent of video objects and VOPs in video coding permits significant coding savings by selectively apportioning bits among parts of the frame that require a relatively large number of bits and other parts that require a relatively small number of bits. VOPs also permit additional functionality, such as object manipulation.
As an example, FIG. 1 illustrates a frame 100 for coding that includes the head and shoulders of a narrator 110, a logo 120 suspended within the frame 100 and a background 130. The logo 120 may be static, having no motion and no animation. In such a case, bit savings may be realized by coding the logo 120 only once. For display, the coded logo 120 could be decoded and displayed continuously from the single coded representation. Similarly, it may be desirable to allocate fewer bits for coding a semi-static or slowly moving background 130. Bit savings realized by coding the logo 120 and background 130 at lower rates may permit coding of the narrator 110 at a higher rate, where the perceptual significance of the image may reside. VOPs are suited to such applications. FIG. 1 also illustrates the frame 100 broken into three VOPs. By convention, a background 130 is generally assigned VOP0. The narrator 110 and logo 120 may be assigned VOP1 and VOP2, respectively. Of course, other numbering schemes can also be used to label these regions.
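The apportioning of bits among the three VOPs of FIG. 1 can be illustrated with a minimal sketch. The function name allocate_bits, the weight values and the per-frame budget are hypothetical and are chosen only for illustration; they are not taken from MPEG-4 or from any encoder. The sketch assumes a simple proportional split, and models the static logo 120 by setting its weight to zero once it has been coded.

```python
def allocate_bits(frame_budget, weights):
    """Split frame_budget bits among VOPs in proportion to their weights.

    The weights are hypothetical perceptual-importance values, not MPEG-4 syntax.
    """
    total = sum(weights.values())
    shares = {name: frame_budget * w // total for name, w in weights.items()}
    # Integer division may leave a few bits unassigned; give them to the
    # highest-weight VOP so the shares always sum to the budget.
    remainder = frame_budget - sum(shares.values())
    top = max(weights, key=weights.get)
    shares[top] += remainder
    return shares


# Hypothetical weights for the three VOPs of FIG. 1.
weights = {"VOP0_background": 2, "VOP1_narrator": 7, "VOP2_logo": 1}

# First frame: every VOP, including the static logo, is coded.
first_frame = allocate_bits(10_000, weights)

# Later frames: the logo was already coded once, so its weight drops to zero
# and its share is redistributed to the other VOPs.
later_frames = allocate_bits(10_000, {**weights, "VOP2_logo": 0})

print(first_frame)
print(later_frames)
```

Under these assumed weights, the bits freed by not recoding the logo flow mostly to the narrator, which is the effect described above.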
Note that not all elements within a VOP will merit identical treatment. For example, certain areas within a VOP may require animation, whereas others may be relatively static.
Consider the example of VOP1 in FIG. 1. The perceptually significant areas of VOP1 center around the facial features of the figure. The clothes and hair of the narrator 110 may not require animation to the same extent that the facial features do. Accordingly, as disclosed in U.S. patent application Ser. No. 08/986,118 entitled “Video Objects Coded by Keyregions,” keyregions may be used to emphasize certain areas of a VOP over others.
The object-based organization of MPEG-4 video, in principle, will provide a number of benefits in error robustness, quality tradeoffs and scene composition. The current MPEG-4 standards, however, lack a number of tools, and their associated syntax and semantics, to fully and flexibly exploit this object-based organization. In particular, there is no way to identify an element, such as a visual object, VOL or keyregion, as more important than other elements of the same type.
For example, a higher degree of error robustness would be achieved if a higher priority could be assigned to the foreground speaker object as compared to a less relevant background object. If an encoder or decoder can only process a limited number of objects, it would be helpful to have the encoder or decoder know which objects should be processed first.
Moreover, because the MPEG-4 system will offer scene description and composition flexibility, reconstructed scenes would remain meaningful even when low priority objects are only partially available, or even totally unavailable. Low priority objects could become unavailable, for example, due to data loss or corruption.
Finally, in the event of channel congestion, identifying important video data would be very useful because such data could be scheduled for delivery ahead of less important video data. The remaining video data could be scheduled later, or even discarded. Prioritization would also be useful for graceful degradation when bandwidth, memory or computational resources become limited.
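The scheduling behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the CodedUnit type, its priority field (where a larger value is assumed to mean more important) and the schedule function are hypothetical names, and the greedy policy shown is one simple choice among many rather than a method defined by MPEG-4.

```python
from dataclasses import dataclass


@dataclass
class CodedUnit:
    name: str
    priority: int   # hypothetical field: larger value = more important
    size_bits: int


def schedule(units, capacity_bits):
    """Greedily pick units in descending priority that fit the channel capacity.

    Returns (send_now, deferred). Deferred units could be sent later or discarded.
    """
    send_now, deferred = [], []
    remaining = capacity_bits
    # sorted() is stable, so units of equal priority keep their original order.
    for unit in sorted(units, key=lambda u: -u.priority):
        if unit.size_bits <= remaining:
            send_now.append(unit)
            remaining -= unit.size_bits
        else:
            deferred.append(unit)
    return send_now, deferred


# Hypothetical coded data for the three objects of FIG. 1.
units = [
    CodedUnit("background", priority=1, size_bits=4000),
    CodedUnit("narrator", priority=3, size_bits=6000),
    CodedUnit("logo", priority=2, size_bits=1000),
]

# A congested channel that can carry only 8000 bits in this interval.
send_now, deferred = schedule(units, capacity_bits=8000)
print([u.name for u in send_now])
print([u.name for u in deferred])
```

With the assumed sizes and priorities, the narrator and logo are delivered first and the background is deferred, so the scene degrades gracefully instead of losing its most important object.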
In view of the foregoing, it can be appreciated that a substantial need exists for a method and apparatus that prioritize video objects when they are coded and that solve the other problems discussed above.