1. Field of the Invention
The present invention generally relates to video compression and, more particularly, to a video coding method applied to a video sequence and provided for use in a video encoder comprising base layer coding means, receiving said video sequence and generating therefrom base layer signals that correspond to video objects (VOs) contained in the video frames of said sequence and constitute a first bitstream suitable for transmission at a base layer bit rate to a video decoder, and enhancement layer coding means, receiving said video sequence and a decoded version of said base layer signals and generating therefrom enhancement layer signals associated with corresponding base layer video signals and suitable for transmission at an enhancement layer bit rate to said video decoder.
2. Description of the Related Art
More precisely, the present invention relates to a method for coding the VOs of said sequence, comprising the steps of:
(A) segmenting the video sequence into said VOs;
(B) coding the successive video object planes (VOPs) of each of said VOs, said coding step itself comprising sub-steps of coding the texture and the shape of said VOPs, said texture coding sub-step itself comprising a first coding operation without prediction for the VOPs called intracoded or I-VOPs, coded without any temporal reference to another VOP, a second coding operation with a unidirectional prediction for the VOPs called predictive or P-VOPs, coded using only a past I- or P-VOP as a temporal reference, and a third coding operation with a bidirectional prediction for the VOPs called bidirectional predictive or B-VOPs, coded using both past and future I- or P-VOPs as temporal references.
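By way of illustration only, the three texture coding operations of step (B) can be sketched by listing, for each VOP of a single-layer sequence, which VOPs serve as its temporal references. The function name `temporal_references`, the one-letter type codes, and the choice of the nearest past and future I- or P-VOP are assumptions made for this sketch; they are not taken from the method described here.

```python
# Illustrative sketch only: VOP types are given in display order as
# one-letter codes "I", "P" and "B" (an assumed representation).
ANCHOR_TYPES = {"I", "P"}  # only I- and P-VOPs may serve as references


def temporal_references(vop_types, index):
    """Return the display-order indices used as temporal references.

    I-VOP: no reference. P-VOP: one past I- or P-VOP.
    B-VOP: one past and one future I- or P-VOP.
    Assumption: the nearest anchor on each side is the one used.
    """
    kind = vop_types[index]
    if kind == "I":
        return []
    # Search backwards for the nearest past anchor.
    past = next(
        (i for i in range(index - 1, -1, -1) if vop_types[i] in ANCHOR_TYPES),
        None,
    )
    if kind == "P":
        return [past]
    # B-VOP: also search forwards for the nearest future anchor.
    future = next(
        (i for i in range(index + 1, len(vop_types)) if vop_types[i] in ANCHOR_TYPES),
        None,
    )
    return [past, future]
```

For a sequence of types I, B, P, B, P, the B-VOP at index 1 would reference the VOPs at indices 0 and 2, and the P-VOP at index 2 would reference only the I-VOP at index 0.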
The invention also relates to computer executable process steps stored on a computer readable medium and provided for carrying out such a coding method, to a corresponding computer program product, and to a video encoder carrying out said method.
Temporal scalability is a feature now offered by several video coding schemes. It is, for example, one of the numerous options of the MPEG-4 video standard. A base layer is encoded at a given frame rate, and an additional layer, called the enhancement layer, is also encoded in order to provide the missing frames, so as to form a video signal with a higher frame rate and thus provide a higher temporal resolution at the display side. At the decoding side, only the base layer is usually decoded, but the decoder may additionally decode the enhancement layer, which allows more frames per second to be output.
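The principle of temporal scalability described above can be illustrated by a minimal sketch in which frame indices are distributed between the two layers. The regular alternation controlled by `base_period` and both function names are assumptions chosen for illustration; the standard does not impose this particular distribution.

```python
def split_temporal_layers(num_frames, base_period=2):
    """Assign frame indices to the base and enhancement layers.

    Assumption: every `base_period`-th frame belongs to the base layer
    and all remaining frames belong to the enhancement layer.
    """
    base = [i for i in range(num_frames) if i % base_period == 0]
    enhancement = [i for i in range(num_frames) if i % base_period != 0]
    return base, enhancement


def displayed_frames(base, enhancement, decode_enhancement):
    """Return, in display order, the frames output by the decoder."""
    frames = base + enhancement if decode_enhancement else base
    return sorted(frames)
```

With eight frames and `base_period=2`, a decoder handling only the base layer would output frames 0, 2, 4 and 6, whereas a decoder that also decodes the enhancement layer would output all eight frames, that is, twice the frame rate.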
Several structures are used in MPEG-4, for example the video objects (VOs), which are the entities that a user is allowed to access and manipulate, and the video object planes (VOPs), which are instances of a video object at a given time. In an encoded bitstream, different types of VOPs can be found: intra coded VOPs, using only spatial redundancy; predictive coded VOPs, using motion estimation and compensation from a past reference VOP; and bidirectionally predictive coded VOPs, using motion estimation and compensation from past and future reference VOPs. As the MPEG-4 video standard is a predictive coding scheme, temporal references have to be defined for each coded non-intra VOP. In the single layer case or in the base layer of a scalable stream, temporal references are defined by the standard in a unique way, as illustrated in FIG. 1 where, the base and enhancement layers being designated by (BL) and (EL) respectively, the references for a P-VOP and a B-VOP are shown (each arrow corresponds to a possible temporal reference). By contrast, for the temporal enhancement layer of an MPEG-4 stream, three VOPs can be taken as a possible temporal reference for the motion prediction: the most recently decoded VOP of the enhancement layer, the previous VOP in display order of the base layer, or the next VOP in display order of the base layer. These three possible choices are also illustrated in FIG. 1 for a P-VOP and a B-VOP of the temporal enhancement layer: one reference has to be selected for each P-VOP of the enhancement layer, and two for each B-VOP of the same layer.
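The three possible temporal references available to a VOP of the temporal enhancement layer can be enumerated as in the following sketch. The `Vop` record, the function names, and the use of an integer display time are assumptions made for illustration only; the sketch lists the candidates and does not decide which of them should be selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vop:
    layer: str         # "BL" (base layer) or "EL" (enhancement layer)
    display_time: int  # assumed integer position in display order


def candidate_references(current_time, base_layer, last_decoded_el):
    """List the candidate references for an enhancement layer VOP.

    Candidates: the most recently decoded EL VOP, the previous BL VOP
    in display order, and the next BL VOP in display order.
    Candidates that do not exist are omitted from the result.
    """
    previous_bl = max(
        (v for v in base_layer if v.display_time < current_time),
        key=lambda v: v.display_time,
        default=None,
    )
    next_bl = min(
        (v for v in base_layer if v.display_time > current_time),
        key=lambda v: v.display_time,
        default=None,
    )
    candidates = {
        "most_recent_el": last_decoded_el,
        "previous_bl": previous_bl,
        "next_bl": next_bl,
    }
    return {name: vop for name, vop in candidates.items() if vop is not None}


def references_needed(vop_type):
    """One reference is selected for a P-VOP, two for a B-VOP."""
    return {"P": 1, "B": 2}[vop_type]
```

For an enhancement layer VOP displayed at time 3, with base layer VOPs at times 0, 2 and 4 and a previously decoded enhancement layer VOP at time 1, the candidates would be the VOPs at times 1, 2 and 4; an encoder would then retain one of them for a P-VOP or two of them for a B-VOP.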