Digital images and digital image sequences are known to occupy a great deal of memory space, which makes it necessary to compress them when they are transmitted in order to avoid congestion in the communications network used for the transmission, the usable bit rate of the network generally being limited.
Current compressive coding techniques, notably that of the H.264/MPEG-4 AVC (Advanced Video Coding) standard described in the document ISO/IEC 14496-10, use spatial or temporal prediction of groups of blocks of pixels, referred to as macroblocks, of a current image relative to other macroblocks of the same image or of a preceding or subsequent image. That standard was developed by the Joint Video Team (JVT) working group and stems from the collaboration of the Video Coding Experts Group (VCEG) of the International Telecommunication Union and the Moving Picture Experts Group (MPEG) of the ISO/IEC. After such predictive coding, the pixel blocks are processed by applying a discrete cosine transform and then quantized. The coefficients of the quantized pixel blocks are then scanned in a reading order that makes it possible to take advantage of the large number of zero coefficients at high frequencies, and are then entropy coded.
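The transform, quantization, and scanning steps described above can be illustrated by the following minimal sketch. It assumes a floating-point orthonormal DCT-II, a single uniform quantization step, and a zigzag reading order on a square block; the function names (`dct2`, `quantize`, `zigzag_order`, `scan`, `trailing_zeros`) are hypothetical and chosen for illustration only, and the sketch does not reproduce the exact integer transform or quantization tables of the H.264/MPEG-4 AVC standard.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization (assumption: one step size for all frequencies)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


def zigzag_order(n):
    """(row, col) positions ordered from low to high frequency."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk the anti-diagonals, alternating direction on each one.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def scan(qblock):
    """Read the quantized coefficients in zigzag order."""
    return [qblock[r][c] for r, c in zigzag_order(len(qblock))]


def trailing_zeros(seq):
    """Count the run of zeros at the end of the scanned sequence."""
    count = 0
    for value in reversed(seq):
        if value != 0:
            break
        count += 1
    return count


# A perfectly flat 4x4 block: all of its energy falls into the DC coefficient.
flat_block = [[100] * 4 for _ in range(4)]
scanned = scan(quantize(dct2(flat_block), step=10))
```

For a smooth block such as `flat_block`, the scanned sequence ends in a long run of zeros, which is the property that the entropy coding stage exploits.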
Those compressive coding techniques are effective, but they are not optimal for compressing images featuring regions of homogeneous texture. In the H.264/MPEG-4 AVC standard, spatial prediction of a macroblock in an image relative to another macroblock in the same image is possible only if that other macroblock adjoins the macroblock to be predicted in certain predetermined directions relative thereto, generally above and to the left, in a so-called causal neighborhood. Similarly, the prediction of the motion vectors of a block or macroblock of an image is a causal prediction relative to the motion vectors of adjoining blocks.
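The causal character of these predictions can be illustrated by the following simplified sketch. It assumes that the row of pixels above a block and the column of pixels to its left have already been decoded, and that motion vectors are available for the left, above, and above-right neighboring blocks. The function names are hypothetical, and the availability rules and partition-specific exceptions of the standard are deliberately ignored.

```python
def predict_dc(above, left):
    """Predict every pixel as the rounded mean of the causal neighbors."""
    neighbors = list(above) + list(left)
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    n = len(above)
    return [[dc] * n for _ in range(n)]


def predict_vertical(above):
    """Copy the row above the block down into every row of the block."""
    return [list(above) for _ in above]


def predict_horizontal(left):
    """Copy the column to the left of the block across every column."""
    return [[p] * len(left) for p in left]


def residual(block, prediction):
    """Difference between the actual block and its prediction."""
    return [[b - p for b, p in zip(block_row, pred_row)]
            for block_row, pred_row in zip(block, prediction)]


def predict_motion_vector(mv_left, mv_above, mv_above_right):
    """Component-wise median of three neighboring motion vectors."""
    candidates = (mv_left, mv_above, mv_above_right)
    xs = sorted(v[0] for v in candidates)
    ys = sorted(v[1] for v in candidates)
    return (xs[1], ys[1])


# A block made of vertical stripes that continue the row above it
# is predicted exactly by the vertical mode, leaving a zero residual.
above_row = [10, 50, 10, 50]
block = [list(above_row) for _ in range(4)]
res = residual(block, predict_vertical(above_row))
```

Only immediately adjoining, already decoded data enters each prediction. A macroblock with the same texture located elsewhere in the image cannot serve as a reference in this scheme.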
That type of prediction therefore does not make it possible to take advantage of the textural similarity of macroblocks of separate areas with the same texture or of macroblocks that are far apart in an area with the same texture. In other words, that type of technique does not make it possible to address simultaneously as a single entity a group of macroblocks having common characteristics. Moreover, the motion of areas of homogeneous texture from one image to another is not exploited optimally either: the temporal prediction of the H.264/MPEG-4 AVC standard makes it possible to take advantage of the motion of a macroblock from one image to another, but not of the fact that the macroblock is part of an area having homogeneous motion.
To solve that problem, so-called regional coding techniques have been proposed that segment the images of a video sequence in such a manner as to isolate areas of homogeneous texture or motion in the images before coding them. Those areas define objects in the images for which a choice is made to use more refined or less refined coding, for example. An example of such a technique is described in the IEEE (Institute of Electrical and Electronics Engineers) paper published in 2004 entitled “An encoder-decoder texture replacement method with application to content-based movie coding” by A. Dumitras et al.
However, those regional coding techniques require the coder that sends the video sequence to calculate a segmentation map for each image and to send that map to the decoder that is the destination of the video sequence. The segmentation map is very costly in terms of memory space because its boundaries generally do not correspond to the boundaries of the blocks of pixels of the segmented images. Moreover, the segmentation of a video sequence into regions of arbitrary shape is not deterministic: the boundaries of the segmentation map generally do not correspond to the boundaries of the real objects that the map attempts to subdivide in the images of the video sequence. Because of this, only the representation and transmission of such segmentation maps have been standardized (in the MPEG-4 Part 2 standard), not their production.
In conclusion, there are many segmentation techniques, and none of them is sufficiently generic for effective segmentation of all kinds of image sequences. Those complex and non-deterministic techniques have therefore never been deployed in industrial coders.