The present invention concerns in general terms the analysis of a digital signal and proposes for this purpose a device and method for analysing a digital signal by decomposition into a plurality of resolution levels, and segmentation.
The purpose of the analysis is to provide a hierarchical segmentation of the signal, thus making it possible to access the objects or regions present in an image at several resolution levels, with several possible levels of detail. Access to the objects of an image can be used for different purposes:
selective coding of the objects of the image, granting a higher coding quality to the "important" objects in the image,
progressive transmission of the data of the image, with transmission of the more important objects before the others,
extraction of a particular object from the image, with a view to its manipulation, transmission, coding, and storage.
The present invention is more particularly applicable to the analysis of a digital signal. Hereinafter, the concern will more particularly be with the analysis of digital images or video sequences. A video sequence is defined as a succession of digital images.
There exist several known ways of effecting the decomposition of a signal on several resolution levels; it is for example possible to use Gaussian/Laplacian pyramids, or to decompose the signal into frequency sub-bands at several resolution levels.
The remainder of this description will be concerned with the second case, but it is important to note that the present invention applies to all known multi-resolution decompositions.
In the particular case of a decomposition into frequency sub-bands, the decomposition consists of creating, from the digital signal, a set of sub-bands each containing a limited frequency spectrum. The sub-bands can be of different resolutions, the resolution of a sub-band being the number of samples per unit length used for representing this sub-band. In the case of an image digital signal, a frequency sub-band of this signal can be considered to be an image, that is to say a bi-dimensional array of digital values.
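To make the notion of sub-bands concrete, the following is a minimal sketch of one level of a separable two-dimensional Haar decomposition, one of the simplest sub-band decompositions; the invention is not tied to this filter. It assumes an image with even dimensions stored as a list of rows, and uses an averaging normalisation (division by 4) so that the LL sub-band reads directly as a half-resolution version of the image. The name `haar_decompose` is hypothetical.

```python
def haar_decompose(image):
    """One level of a separable 2-D Haar decomposition (averaging form).

    `image` is a list of rows with even dimensions. Returns the four
    sub-bands LL, LH, HL and HH, each with half the rows and half the
    columns of the input (i.e. half the resolution).
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, rows, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, cols, 2):
            a, b = image[i][j], image[i][j + 1]          # top-left, top-right
            c, d = image[i + 1][j], image[i + 1][j + 1]  # bottom-left, bottom-right
            r_ll.append((a + b + c + d) / 4)  # low-pass in both directions
            r_lh.append((a - b + c - d) / 4)  # left-minus-right column difference
            r_hl.append((a + b - c - d) / 4)  # top-minus-bottom row difference
            r_hh.append((a - b - c + d) / 4)  # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh
```

Each 2×2 block is exactly recoverable from its four coefficients (for example, the top-left sample equals LL + LH + HL + HH), so no information is lost; the decomposition only redistributes it across the sub-bands.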
The decomposition of a signal into frequency sub-bands makes it possible to decorrelate the signals so as to eliminate the redundancy existing in the digital image prior to the compression proper. The sub-bands can then be compressed more effectively than the original signal. Moreover, the low sub-band of such a decomposition is a faithful reproduction, at a lower resolution, of the original image. It is therefore particularly well suited to segmentation.
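The statement that the low sub-band is a reduced copy of the image can be illustrated by iterating only the low-pass part of such a decomposition. The sketch below assumes 2×2 block averaging (the LL band of an averaging Haar filter) and image dimensions divisible by 2 at every level; `low_subbands` is a hypothetical name. It returns the low sub-bands ordered from the lowest resolution to the original image, which is the order in which a hierarchical segmentation would visit them.

```python
def low_subbands(image, levels):
    """Return [LL at the coarsest level, ..., LL at level 1, image].

    Each LL band is obtained by averaging non-overlapping 2x2 blocks of
    the next finer band, so it is a half-resolution copy of that band.
    """
    bands = [image]
    for _ in range(levels):
        current = bands[0]
        if len(current) % 2 or len(current[0]) % 2:
            raise ValueError("dimensions must be even at every level")
        ll = [[(current[i][j] + current[i][j + 1]
                + current[i + 1][j] + current[i + 1][j + 1]) / 4
               for j in range(0, len(current[0]), 2)]
              for i in range(0, len(current), 2)]
        bands.insert(0, ll)  # coarser bands go to the front
    return bands
```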
The segmentation of a digital image makes it possible to partition the image into homogeneous regions which do not overlap. In this context, the image is considered to consist of two-dimensional objects. Segmentation is a low-level process whose purpose is to partition the image into a certain number of sub-elements called regions. The partitioning is such that the regions are disjoint and their union constitutes the image. The regions may or may not correspond to objects in the image, the term object referring to information of a semantic nature. Very often, however, an object corresponds to a region or a set of regions. Each region can be represented by information representing its shape, colour or texture. The homogeneity of a region of course depends on a particular homogeneity criterion: proximity of the average values or preservation of the contrast or colour, for example.
Object means an entity of the image corresponding to a semantic unit, for example the face of a person. An object can consist of one or more regions contained in the image. Hereinafter the terms object and region will be used interchangeably.
Conventionally, the segmentation of the digital image is effected on a single resolution level, which is the resolution of the image itself. Segmentation methods conventionally include a first step known as marking, in which the interiors of the regions exhibiting local homogeneity are extracted from the image. Next, a decision step precisely defines the contours of the areas containing homogeneous data. At the end of this step, each pixel of the image is associated with a label identifying the region to which it belongs. The set of the labels of all the pixels is called a segmentation map.
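The two conventional steps can be sketched as follows. This is only an illustration under stated assumptions, not the method of any particular standard: the image is a list of rows of grey levels, 4-connectivity is used, marking keeps as region interiors the flat zones (connected pixels whose neighbours differ by at most a hypothetical tolerance `tol`) containing at least `min_size` pixels, and the decision step is a simple seeded region growing that settles first the undecided pixel closest in value to the mean of an adjacent marker. All function and parameter names are hypothetical.

```python
import heapq
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumption)

def flat_zones(image, tol):
    """Connected zones in which adjacent pixels differ by at most `tol`."""
    rows, cols = len(image), len(image[0])
    zone = [[-1] * cols for _ in range(rows)]
    zones = []
    for i in range(rows):
        for j in range(cols):
            if zone[i][j] != -1:
                continue
            zone[i][j] = len(zones)
            pixels, queue = [], deque([(i, j)])
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols and zone[ny][nx] == -1
                            and abs(image[ny][nx] - image[y][x]) <= tol):
                        zone[ny][nx] = len(zones)
                        queue.append((ny, nx))
            zones.append(pixels)
    return zones

def segment(image, tol=0, min_size=2):
    """Return a segmentation map with one positive label per pixel."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 = not yet decided
    means = {}
    # Marking: sufficiently large flat zones become region interiors (markers).
    for pixels in flat_zones(image, tol):
        if len(pixels) >= min_size:
            label = len(means) + 1
            means[label] = sum(image[y][x] for y, x in pixels) / len(pixels)
            for y, x in pixels:
                labels[y][x] = label
    # Decision: grow the markers into the undecided pixels.
    heap = []

    def push_neighbours(y, x):
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and not labels[ny][nx]:
                label = labels[y][x]
                heapq.heappush(heap, (abs(image[ny][nx] - means[label]), ny, nx, label))

    for y in range(rows):
        for x in range(cols):
            if labels[y][x]:
                push_neighbours(y, x)
    while heap:
        _, y, x, label = heapq.heappop(heap)
        if not labels[y][x]:
            labels[y][x] = label
            push_neighbours(y, x)
    return labels
```

The returned list of labels is the segmentation map. Marker means are not updated during growth, which keeps the sketch short; if no flat zone reaches `min_size`, every label stays 0.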
This type of segmentation yields a relatively effective segmentation of the image, but it has the drawbacks of being slow and not very robust, and of presenting all the objects at the same resolution.
This is the case, for example, with the so-called MPEG4 standard (from the English "Moving Picture Experts Group"), for which an ISO/IEC standard is currently being produced. In the MPEG4 coder, and more particularly in the case of the coding of still images, the decomposition of the image into frequency sub-bands is used conjointly with a segmentation of the image. A step prior to coding (not standardised) is responsible for isolating the objects of the image (video objects) and representing each of these objects by a mask. In the case of a binary mask, the spatial support of the mask has the same size as the original image, and a point of the mask at the value 1 (or respectively 0) indicates that the pixel at the same position in the image belongs to the object (or respectively is outside the object).
For each object, the mask is then transmitted to a shape decoder whilst the texture for each object is decomposed into sub-bands, and the sub-bands are then transmitted to a texture decoder.
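The binary mask described above follows directly from a segmentation map. The sketch below is hypothetical (names and the constant `fill` value are assumptions): it builds the mask of one object and extracts that object's texture by keeping only the pixels inside the mask. Filling the outside with a constant is a simplification; it is not the padding procedure defined by MPEG4.

```python
def object_mask(segmentation_map, label):
    """Binary mask: 1 where the pixel belongs to object `label`, 0 elsewhere."""
    return [[1 if value == label else 0 for value in row] for row in segmentation_map]

def object_texture(image, mask, fill=0):
    """Texture of the object: image values inside the mask, `fill` outside."""
    return [[pixel if inside else fill for pixel, inside in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]
```

The mask has the same size as the image, as in the binary-mask case described above, and would be handed to the shape coding path while the texture goes to the sub-band decomposition.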
This method has a certain number of drawbacks. First, the object is accessible only at its highest resolution level: there is no progressivity in segmentation. Moreover, the number of objects manipulated is a priori the same at all levels, whereas it may be more advantageous to have a number of objects that increases with the (spatial) resolution, that is to say a true conjoint scalability between the resolution and the number of objects.
The article "Multiresolution adaptative image segmentation based on global and local statistics" by Boukerroui, Basset and Baekurt, which appeared in IEEE International Conference on Image Processing, 24-28 Oct. 1999, vol. 1, pages 358 to 361, describes a hierarchical segmentation based on a multiresolution pyramid of an image, effected by discrete wavelet transform (DWT).
In addition, the article "Multiresolution image segmentation for region-based motion estimation and compensation" by Salgado, Garcia, Menendez and Rendon, which appeared in IEEE International Conference on Image Processing, 24-28 Oct. 1999, vol. 2, pages 135 to 139, describes a hierarchical segmentation also based on a multiresolution pyramid of an image. A partitioning of the image effected at the lowest resolution level is projected onto the higher resolution levels.
However, none of these known methods provides access to the regions or objects at different resolution levels in a consistent and coherent manner. Coherent here means that an object at a given resolution level always descends from a single object at the lower resolution level (parent), and gives rise to at least one object at the higher resolution level (child or children).
The present invention aims to remedy the drawbacks of the prior art by providing a method and device for the hierarchical segmentation of a digital signal which offers access to the regions or objects at different resolution levels, in a consistent and coherent manner.
In this context the invention concerns a method of analysing a set of data representing physical quantities, including the steps of:
decomposition of the set of data on a plurality of resolution levels,
segmentation of at least one sub-part of a given resolution level into at least two homogeneous regions, said given resolution level not being the highest resolution level in the decomposition,
characterised in that it includes the steps of:
storage of information representing at least part of the result of the segmentation of the previous step,
segmentation of at least one sub-part of the higher resolution level into at least one homogeneous region, according to the information stored.
More particularly, the invention proposes a method of analysing a set of data representing physical quantities, including the steps of:
decomposition of the set of data on a plurality of resolution levels,
first segmentation of at least one sub-part of a given resolution level, into at least two homogeneous regions, said given resolution level not being the highest resolution level in the decomposition,
characterised in that it includes the steps of:
extraction of contour data from the result of the segmentation of the previous step,
second segmentation of at least one sub-part of the resolution level higher than the given level into at least one homogeneous region, as a function of the contour data extracted.
Correlatively, the invention proposes a device for analysing a set of data representing physical quantities, having:
means of decomposing the set of data on a plurality of resolution levels,
means for the first segmentation of at least one sub-part of a given resolution level into at least two homogeneous regions, said given resolution level not being the highest resolution level in the decomposition,
characterised in that it has:
means of extracting contour data from the result of the segmentation of the previous step,
means for the second segmentation of at least one sub-part of the resolution level higher than the given resolution level into at least one homogeneous region, as a function of the extracted contour data.
By virtue of the invention, the segmentation is coherent, or continuous: an object at a given resolution level always descends from a single object at the lower resolution level (parent), and gives rise to at least one object at the higher resolution level (a child or children).
In addition, there is a hierarchical segmentation (at several resolution levels). The advantages of hierarchical segmentation are many:
progressive object-based coding and transmission can be carried out at several resolution levels,
it is possible to segment the image according to finer and finer details or objects on progressing through the resolution levels; the user can thus access the objects of the image with a greater or lesser level of detail. For example, at a lower resolution, the user often needs only a coarse segmentation (the chest and face of a person), whilst he would wish to access a higher level of detail when the resolution increases (eyes, nose, mouth on the face, etc.),
the robustness of the global segmentation and the speed of segmentation are generally higher in the case of a hierarchical segmentation than in the case of a segmentation at a single level.
According to a preferred characteristic, the decomposition is at each resolution level a decomposition into a plurality of frequency sub-bands. This type of decomposition is normally used in image processing, and is simple and rapid to implement.
According to another preferred characteristic, the first segmentation is effected on a low-frequency sub-band of the given resolution level. This is because the low-frequency sub-band forms a "simplified" version of the signal, and it is consequently advantageous to effect the segmentation on this sub-band.
According to preferred characteristics, the given resolution level is the lowest resolution level, and the extraction and second segmentation steps are performed iteratively up to the highest resolution level. A hierarchical segmentation is thus obtained on all the resolution levels.
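The iteration from the lowest to the highest resolution level can be sketched as a small driver. It is deliberately generic: the three steps are passed in as callables, so any concrete first segmentation, contour extraction and second segmentation can be plugged in. The function name, its parameters and the convention that `low_bands` is ordered from the lowest resolution upwards are assumptions made for illustration.

```python
from typing import Callable, List

Array = List[List[float]]

def hierarchical_segmentation(
    low_bands: List[Array],
    first_segmentation: Callable[[Array], Array],
    extract_contours: Callable[[Array], Array],
    second_segmentation: Callable[[Array, Array, Array], Array],
) -> List[Array]:
    """Segment every resolution level, from the lowest to the highest.

    low_bands[0] is the low-frequency sub-band of the lowest resolution
    level; each following entry belongs to the next higher level.
    Returns one segmentation map per level, in the same order.
    """
    maps = [first_segmentation(low_bands[0])]  # first segmentation
    for band in low_bands[1:]:
        contours = extract_contours(maps[-1])  # contour data of the level below
        maps.append(second_segmentation(maps[-1], contours, band))  # guided segmentation
    return maps
```

Because each level is segmented only from the map and contours of the level immediately below, every map depends on a single parent map, which is where the coherence of the hierarchy comes from.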
According to a preferred characteristic, the second segmentation includes:
a projection of a contour image resulting from the first segmentation onto said at least one sub-part of the higher resolution level which is to be segmented,
a marking of the coefficients of said at least one sub-part of the higher resolution level, as a function of the result of the projection, and
a decision.
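The three sub-steps of the second segmentation can be sketched as follows, under assumptions not stated in the text: the resolution doubles between levels, projection is nearest-neighbour upsampling, coefficients away from the projected contours are marked with their parent's label while those on the contours are left undecided, and the decision is a seeded region growing on the coefficient values. This simplified sketch only relocates region boundaries within the contour band; it does not create new child regions inside a parent, which a fuller implementation would also allow. All names are hypothetical.

```python
import heapq

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumption)

def project(array, factor=2):
    """Nearest-neighbour projection of a 2-D array onto a grid `factor` times larger."""
    return [[row[j // factor] for j in range(len(row) * factor)]
            for row in array for _ in range(factor)]

def second_segmentation(low_map, low_contours, band):
    """Segment `band` (twice the size of low_map) guided by the lower level.

    low_map: segmentation map of the lower level (labels > 0)
    low_contours: contour image of low_map (1 = contour, 0 = interior)
    band: coefficients of the sub-band to segment at the higher level
    """
    labels_up = project(low_map)
    contours_up = project(low_contours)
    rows, cols = len(band), len(band[0])
    # Marking: away from the projected contours, inherit the parent label;
    # on the projected contours, leave the coefficient undecided (0).
    labels = [[0 if contours_up[i][j] else labels_up[i][j] for j in range(cols)]
              for i in range(rows)]
    # Decision: settle first the undecided coefficient closest in value
    # to an already labelled neighbour.
    heap = []

    def push_neighbours(i, j):
        for di, dj in NEIGHBOURS:
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols and not labels[ni][nj]:
                heapq.heappush(heap, (abs(band[ni][nj] - band[i][j]), ni, nj, labels[i][j]))

    for i in range(rows):
        for j in range(cols):
            if labels[i][j]:
                push_neighbours(i, j)
    while heap:
        _, i, j, label = heapq.heappop(heap)
        if not labels[i][j]:
            labels[i][j] = label
            push_neighbours(i, j)
    return labels
```

For example, with a coarse boundary between columns 3 and 4 of the projected map and a true edge between columns 4 and 5 of the higher-resolution band, the decision moves the boundary to the true edge.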
According to another preferred characteristic, the second segmentation is effected on a lower-frequency sub-band of the higher resolution level. As with the first segmentation, it is advantageous to effect the second segmentation on a "simplified" version of the signal.
According to a preferred characteristic, the contour data extraction includes, for each coefficient of the segmented sub-part:
a comparison of said coefficient with its neighbours,
setting of a contour coefficient corresponding to said coefficient to a first predetermined value if the coefficient is different from at least one of its neighbours, or to a second predetermined value if the coefficient is similar to all its neighbours.
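The contour extraction rule above can be sketched directly. The sketch assumes that "neighbours" means the 4-connected neighbours, that "similar" means carrying the same label in the segmentation map, and that the two predetermined values are 1 (contour) and 0 (interior); these choices and the function name are hypothetical.

```python
NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity (assumption)

def extract_contours(seg_map, on=1, off=0):
    """Contour image: `on` where a label differs from at least one neighbour, else `off`."""
    rows, cols = len(seg_map), len(seg_map[0])
    contours = [[off] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for di, dj in NEIGHBOURS:
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols and seg_map[ni][nj] != seg_map[i][j]:
                    contours[i][j] = on  # differs from at least one neighbour
                    break
    return contours
```

Both sides of a boundary are marked, so the contour image forms a band two coefficients wide around each region border.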
The device has means adapted to implement the above characteristics.
The invention also concerns a digital apparatus including the analysis device, or means of implementing the analysis method. The advantages of the device and digital apparatus are identical to those disclosed above.
The invention also concerns an information storage means, readable by a computer or a microprocessor, integrated or not into the device, and possibly removable, storing a program implementing the analysis method.