A video-object-based encoding framework, such as the MPEG-4 encoding standard, referred to as MPEG-4 Visual Version 1, ISO/IEC 14496-2, allows video objects having various shapes to be encoded instead of the whole rectangular picture. Rectangular pictures are represented by pixels having luminance and chrominance values. In addition to these values, a pixel of a video object has a binary shape value. This value is obtained from a rectangular picture by a segmentation process and is represented by one bit indicating whether or not the pixel is in the object. The separate encoding of the video objects may enrich user interaction in several multimedia services by providing flexible access to the digital video data signal and easy manipulation of the video information. In this framework, the encoder may perform locally defined pre-processing aimed at the automatic identification of the objects appearing in a sequence of pictures.
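By way of illustration only, the binary shape value described above can be sketched in Python as a one-bit-per-pixel mask derived from a label map that assigns each pixel to one video object. The label map, the object identifiers and the function name `binary_shape_mask` are hypothetical and are not taken from ISO/IEC 14496-2; the sketch says nothing about how the standard actually codes the shape information.

```python
def binary_shape_mask(label_map, object_id):
    """Return a mask holding 1 where the pixel belongs to object_id, else 0."""
    return [[1 if label == object_id else 0 for label in row]
            for row in label_map]


# Hypothetical 3x4 label map, assumed to come from some segmentation process.
# Each entry is the identifier of the video object that owns the pixel.
label_map = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [4, 4, 3, 2],
]

# One binary shape mask per video object (VO1 to VO4 in this toy example).
masks = {vo: binary_shape_mask(label_map, vo) for vo in (1, 2, 3, 4)}
mask_vo3 = masks[3]
```

In this sketch every pixel is set in exactly one mask, which reflects the assumption that the segmentation yields a partition of the rectangular picture.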
The operation of segmentation aims at partitioning a rectangular picture or a video sequence of pictures into regions extracted according to a given criterion. FIG. 1 shows an example of a segmentation process in which a rectangular picture (RP) has been partitioned into several video objects (VO1 to VO4). In the case of a video sequence, this partition should achieve temporal coherence of the resulting sequence of object masks representing the video object. Different methods have been proposed for the segmentation of video sequences, based on spatial homogeneity, on a motion coherence criterion, or on spatiotemporal processing. These methods are expected to identify classes of moving objects according to luminance homogeneity and motion coherence criteria.
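As a minimal sketch of a spatial homogeneity criterion, and not of any particular method referred to above, the following Python code partitions a small luminance picture into 4-connected regions by region growing. The function name `segment_by_luminance`, the `tolerance` parameter and the choice of comparing each pixel with the luminance of the region seed are assumptions made for illustration. Motion coherence and temporal coherence across pictures are not addressed.

```python
from collections import deque


def segment_by_luminance(luma, tolerance):
    """Partition a picture into 4-connected regions of similar luminance.

    A pixel joins a region when its luminance differs from that of the
    region's seed pixel by at most `tolerance` (an illustrative criterion).
    Returns a label map with region labels 1..N in raster-scan order.
    """
    height, width = len(luma), len(luma[0])
    labels = [[0] * width for _ in range(height)]  # 0 means "unlabelled"
    next_label = 0
    for y0 in range(height):
        for x0 in range(width):
            if labels[y0][x0]:
                continue  # already assigned to an earlier region
            next_label += 1
            seed = luma[y0][x0]
            labels[y0][x0] = next_label
            queue = deque([(y0, x0)])
            while queue:  # breadth-first growth from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] == 0
                            and abs(luma[ny][nx] - seed) <= tolerance):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels


# Toy 3x4 luminance picture with four visually distinct areas.
luma = [
    [10, 12, 200, 201],
    [11, 90, 92, 199],
    [50, 52, 91, 198],
]
label_map = segment_by_luminance(luma, tolerance=5)
```

With these toy values the picture is split into four regions, loosely analogous to the four video objects of FIG. 1. A seed-based comparison is used here only because it makes the result independent of the order in which neighbours are visited; practical segmentation methods typically combine such a criterion with motion information.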