1. Field of the Invention
The present invention relates to a method for the linear transformation of image signals on arbitrarily-shaped segments.
The term image signal is understood herein to mean a 2D or 3D digital signal. The term segment is understood to mean the geometry of the region of the image defining the object of interest. The invention relates more particularly to a method for the linear transformation of an image signal on arbitrarily-shaped and arbitrarily-sized segments with a view to encoding.
The invention can be applied to image encoding by linear transformation.
The method presented falls within the context of the development of a new class of image encoders known as object-oriented encoders. This is a novel approach to encoding in which the audiovisual scene is represented as a set of objects in motion. This opens the way towards the implementation of new functions related to digital images.
Standardized systems for the encoding of images with digital bit rate reduction (for example according to the H.261 recommendation of the CCITT for video encoding at p × 64 kbit/s) are based on a sub-division of the digital image into a set of square blocks (generally of size 8×8) which undergo the encoding operations. This formulation is a rigid one and does not take account of the contents of each block, for example the existence of contours or sharp variations in luminance within a block.
The encoding of the image signal generally comprises a first phase of orthogonal linear transformation aimed at concentrating the energy of the signal and decorrelating its components.
The linear transformation used is generally the discrete cosine transform or DCT, which can be implemented by simple and efficient algorithms and therefore enables real-time applications. The DCT has been chosen because it can be used to obtain a decorrelation close to the maximum when the signal can be represented by a separable first-order Markov process that is highly correlated, i.e. with a correlation coefficient close to 1.
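The near-optimal decorrelation claimed above can be checked numerically in one dimension. The following is a minimal sketch, assuming a block length of 8 and a correlation coefficient of 0.95 (both values, and the names `dct_matrix` and `coding_gain`, are chosen for the illustration and do not come from the text). It compares the coding gain of the orthonormal DCT with that of the Karhunen-Loeve transform, which gives the maximum for this model.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows are basis vectors)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def coding_gain(variances):
    """Ratio of arithmetic to geometric mean of the coefficient variances."""
    return variances.mean() / np.exp(np.log(variances).mean())

n, rho = 8, 0.95                      # assumed block length and correlation
idx = np.arange(n)
R = rho ** np.abs(idx[:, None] - idx[None, :])   # first-order Markov covariance

C = dct_matrix(n)
R_dct = C @ R @ C.T                   # covariance of the DCT coefficients

gain_dct = coding_gain(np.diag(R_dct))
gain_klt = coding_gain(np.linalg.eigvalsh(R))    # KLT variances are the eigenvalues
print(f"DCT gain: {gain_dct:.3f}, KLT gain: {gain_klt:.3f}")
```

The DCT gain cannot exceed the KLT gain; for a correlation coefficient close to 1 the two should be nearly equal.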
It is however highly advantageous for many applications to represent the image in terms of objects to be identified, described and transmitted.
In this context, an object can be defined as an arbitrarily-shaped and arbitrarily-sized region of the image, which may represent either a physical object or a predefined zone of interest or simply a region that has properties of homogeneity with respect to one or more criteria.
An object may be described by its shape and texture.
Several authors have recently taken an interest in the search for appropriate methods to encode, firstly, the shapes of objects and, secondly, the texture of objects.
Reference may be made to the drawing of FIG. 1 which illustrates the different steps implemented by these methods. The processing of shapes comprises an encoding operation, transmission, decoding at reception and display.
The processing of the texture comprises an orthogonal transformation, a quantization and an entropy encoding, transmission, entropy decoding with inverse quantization and inverse transformation to reconstitute the texture.
The methods of linear transformation on square blocks of a size fixed in advance cannot be directly applied to objects with arbitrarily-shaped segments for the encoding of the texture.
Thus, the present invention relates to a new method of linear transformation for the encoding of the texture on objects that have arbitrarily-shaped segments.
2. Description of the Prior Art
Recent studies on the subject have been published by several authors. The methods proposed can be divided into two classes: adaptive methods and methods of extrapolation.
Adaptive methods consist of the adaptation of the orthogonal linear transformations to the geometry of the segment.
Reference may be made to the adaptation of the Karhunen-Loeve transformation to segments by S. F. Chang and D. G. Messerschmidt, Transform Coding of Arbitrarily-Shaped Image Segments, Proceedings of ACM Multimedia, Anaheim, Calif., USA, pp. 83-90, Aug. 1993, and to the method for the generation of orthogonal bases on segments proposed by Gilge, T. Engelhardt and R. Mehlan, Coding of Arbitrarily-Shaped Image Segments Based on a Generalized Orthogonal Transform, Signal Processing: Image Communication 1, pp. 153-180, 1989.
This method recommends the orthonormalization of any family of vectors that are linearly independent on the segment, by an algebraic method known as the Gram-Schmidt method. This method is however very cumbersome from the computational point of view and is therefore unsuited to "real-time" applications. Gilge's work has given rise to many studies on the fast generation of orthogonal bases on the segment ([M. Cermelli, F. Lavagetto and M. Pampolini, A Fast Algorithm for Region-Oriented Texture Coding, ICASSP, 1994, pp. 285-288], [W. Philips, A Fast Algorithm for the Generation of Orthogonal Base Functions on an Arbitrarily-Shaped Region, Proceedings of ICASSP 1992, Vol. 3, pp. 421-424, Mar. 1992, San Francisco], [W. Philips and C. Christopoulos, Fast Segmented Image Coding Using Weakly Separable Bases, Proceedings of ICASSP 1994, Vol. 5, pp. 345-348]).
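The Gram-Schmidt orthonormalization on a segment can be sketched as follows. This is a minimal illustration, not a reproduction of the cited work: the starting family (2D cosine functions of the bounding box, sampled on the pixels of the segment), the tolerance `tol` and the function name are all assumptions made for the example. The cost of re-orthogonalizing every candidate against all vectors already kept is what makes the approach cumbersome.

```python
import numpy as np

def gram_schmidt_on_segment(mask, tol=1e-8):
    """Build an orthonormal basis on the pixels where mask is True.

    Candidate functions are bounding-box cosines restricted to the segment
    (an assumed starting family); candidates that are numerically dependent
    on the vectors already kept are skipped.
    """
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    n_pixels = len(ys)
    basis = []
    for u in range(h):
        for v in range(w):
            f = (np.cos(np.pi * (2 * ys + 1) * u / (2 * h))
                 * np.cos(np.pi * (2 * xs + 1) * v / (2 * w)))
            for b in basis:                  # remove components along kept vectors
                f = f - np.dot(b, f) * b
            norm = np.linalg.norm(f)
            if norm > tol:                   # keep only independent candidates
                basis.append(f / norm)
            if len(basis) == n_pixels:
                return np.array(basis)
    return np.array(basis)

# Usage on a small L-shaped segment.
mask = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [1, 1, 1, 1],
                 [1, 1, 1, 1]], dtype=bool)
B = gram_schmidt_on_segment(mask)
x = np.random.default_rng(0).normal(size=mask.sum())   # texture on the segment
coeffs = B @ x                                          # analysis
x_rec = B.T @ coeffs                                    # synthesis
```

With as many basis vectors as there are pixels, the signal is recovered exactly when no quantization is applied.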
The methods of extrapolation consist in extending the signal to a regular segment, which is generally the rectangle circumscribing the segment to be encoded.
These methods enable the application of existing linear transformations to regular (rectangular or square-shaped) segments which are therefore fast and easy to implement. In this category of methods, the best known is the iterative method based on projections on convex sets proposed in H. H. Chen, M. R. Civanlar and B. G. Haskell, A Block Transform Coder For Arbitrarily-Shaped Image Segments, International Conference on Image Processing (ICIP), 1994, pp. 85-89.
Other simpler methods have been tested, such as "zero-padding" (filling of the zone with zeros), "mirroring" (reflection of the signal on edges of the object) and morphological expansion ([S. F. Chang and D. G. Messerschmidt, Transform Coding Of Arbitrarily-Shaped Image Segments, Proceedings of ACM Multimedia, Anaheim, Calif., USA, pp. 83-90, Aug. 1993], [H. H. Chen, M. R. Civanlar and B. G. Haskell, A Block Transform Coder For Arbitrarily-Shaped Image Segments, International Conference on Image Processing (ICIP), 1994, pp. 85-89]).
The two classes of methods recalled here above have their own advantages and drawbacks.
The adaptive methods have the advantage of perfect reconstruction with as many coefficients as there are points of the segment when no quantization is done. They enable the theory of encoding by linear transformation to be extended to arbitrarily-shaped segments. By contrast, they are generally cumbersome in terms of complexity and computation time.
The methods of extrapolation on the contrary offer an easy implementation suited to existing systems, but entail the risk of contributing artifacts related to the introduction of new frequencies in the signal.
For practical applications, it would therefore be worthwhile to combine the advantages of both categories of methods referred to here above, i.e. to use linear transformations that are fast and adapted to segments. The work done in D1 (M. Bi, W. K. Cham and Z. H. Zheng, Discrete Cosine Transform on Irregular Shape for Image Coding, IEEE Tencon 93 Proceedings, Beijing, pp. 402-405) and D2 (T. Sikora and B. Makai, Shape Adaptive DCT for Generic Coding of Video, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 5, No. 1, pp. 59-72, Feb. 1995) proposes the application of a standard DCT orthogonal transformation separately on the rows and columns of the segment, by analogy with the row/column separability of the standard orthogonal transformations. This separability enables the successive application of two one-dimensional transformations.
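The row/column principle described above can be sketched as follows. This is only an illustration in the spirit of D2, not a faithful reproduction of either cited method: the shifting of samples towards the top and left borders, the orthonormal scaling of each variable-length DCT and the name `sa_dct_like` are assumptions made for the example.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def sa_dct_like(block, mask):
    """Column-wise then row-wise variable-length DCT on the masked pixels."""
    h, w = mask.shape
    tmp = np.zeros((h, w))
    tmp_mask = np.zeros((h, w), dtype=bool)
    for j in range(w):                        # first pass: columns
        col = block[mask[:, j], j]            # segment pixels of column j
        n = col.size
        if n:
            tmp[:n, j] = dct_matrix(n) @ col  # coefficients packed at the top
            tmp_mask[:n, j] = True
    out = np.zeros((h, w))
    out_mask = np.zeros((h, w), dtype=bool)
    for i in range(h):                        # second pass: rows
        row = tmp[i, tmp_mask[i, :]]          # coefficients of row i
        n = row.size
        if n:
            out[i, :n] = dct_matrix(n) @ row  # coefficients packed at the left
            out_mask[i, :n] = True
    return out, out_mask

# Usage on an irregular segment inside a 4 x 4 block.
mask = np.array([[0, 1, 1, 0],
                 [1, 1, 1, 0],
                 [1, 1, 1, 1],
                 [0, 1, 0, 0]], dtype=bool)
block = np.random.default_rng(1).normal(size=(4, 4))
coeffs, coeff_mask = sa_dct_like(block, mask)
```

The sketch produces exactly as many coefficients as there are pixels in the segment, and with the orthonormal scaling assumed here it preserves the energy of the signal.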
In D1, the authors propose a stage of analysis of the correlations between the coefficients derived from the first transformation, which makes the method fairly complex. In D2, the grouping and therefore the iteration of the transformation is done automatically: the method proposed by D2 (Shape Adapted DCT, or SADCT) tends towards the combination of the advantages of speed of implementation. However, SADCT lacks flexibility and, in particular, does not enable precise adaptation to the segment or to the properties of the signal on the segment.
An object of the invention is a method that combines the advantages of both classes of methods presented here above, hence a method with adaptivity to the segment, speed of computation and simplicity of implementation.
Thus, the implementation of the method proposed is of low complexity and its action is efficient. The method may be set up with the existing methods for it uses transformations that are known and already optimized.
The method can be adapted to the segment and makes it possible to take account of the 2D correlation of the signal on the segment. In terms of theoretical gain, the method proposed shows better results than all the other independent transformations of the signal, tested under certain usual hypotheses for the autocorrelation function of the signal, which correspond to the intra mode. From the practical point of view, this method provides a gain as compared with the equally simple methods that have been tested. Its results are close to the results of far more complex methods.
The invention proposes a method for the linear transformation of the image signal on an arbitrarily-shaped segment by sub-division into regular sub-segments followed by the application of an orthogonal linear transformation to each sub-segment and finally the iteration of the transformation in the transformed space.
It is assumed that the phase for the extraction of the objects has been completed, and the method is applied after this phase.
The invention therefore relates more specifically to a method for the linear transformation of the image signal on an arbitrarily-shaped segment, wherein said method chiefly comprises the following steps:
the sub-division of the segment into sub-segments of regular shapes (rectangular, square or linear shapes),
the application of an orthogonal linear transformation to each sub-segment,
the combining of the coefficients, coming from the first transformation, into classes of coefficients according to a predetermined criterion,
the iteration of the transformation on the classes of coefficients.
Indeed, should there remain a high correlation between the coefficients after the initial transformation step, the linear orthogonal transformation is iterated on sets of carefully chosen coefficients.
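The four steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the sub-division is taken as given (a hypothetical list `rects` of rectangles tiling the segment), the transformation is an orthonormal 2D DCT, and the only class of coefficients that is iterated is the one formed by the first coefficient of each sub-segment. The names `dct2` and `transform_segment` are invented for the example.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct2(block):
    """Separable 2D DCT of a rectangular block of any size."""
    h, w = block.shape
    return dct_matrix(h) @ block @ dct_matrix(w).T

def transform_segment(image, rects):
    """rects: list of (top, left, height, width) tiling the segment."""
    # Steps 1 and 2: one orthogonal transformation per regular sub-segment.
    coeffs = [dct2(image[t:t + h, l:l + w]) for (t, l, h, w) in rects]
    # Step 3: combine one coefficient per sub-segment into a class.
    dc = np.array([c[0, 0] for c in coeffs])
    # Step 4: iterate the transformation on that class.
    dc_iterated = dct_matrix(len(dc)) @ dc
    for c, value in zip(coeffs, dc_iterated):
        c[0, 0] = value
    return coeffs

# Usage: an L-shaped segment of 12 pixels split into two rectangles.
image = np.random.default_rng(2).normal(size=(4, 4))
rects = [(0, 0, 4, 2), (2, 2, 2, 2)]
coeffs = transform_segment(image, rects)
```

Because every stage is orthogonal, the overall mapping stays invertible and yields as many coefficients as there are pixels in the segment.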
The use of a standard orthogonal linear transformation on each of the sub-segments (it is possible for example to use the DCT) is advantageous. This sub-division makes it possible to take advantage of the decorrelation and concentration capacity of the energy of a transformation such as the DCT in the context of the standard hypotheses of encoding on rectangles. In these hypotheses, the signal is modelled by a first-order separable Markov process highly correlated in the vertical and horizontal direction. This modelling is all the more valid when the zones to be encoded come from a segmentation on the criterion of homogeneity in terms of gray levels.
According to another characteristic, the combining step includes an intermediate step that consists in passing from a 2D space to one-dimensional vectors of coefficients.
According to one mode of implementation, the intermediate step is performed by carrying out a zigzag reading of the coefficients.
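A zigzag reading of a rectangular block of coefficients can be sketched as follows. The traversal direction on each anti-diagonal follows the convention commonly used with 8×8 DCT blocks; the text does not specify one, so this choice, like the function names, is an assumption.

```python
import numpy as np

def zigzag_indices(h, w):
    """Positions (row, col) of an h x w block in zigzag order."""
    positions = [(i, j) for i in range(h) for j in range(w)]
    # Sort by anti-diagonal; alternate the direction from one diagonal to the next.
    return sorted(
        positions,
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )

def zigzag_read(block):
    """Flatten a 2D block of coefficients into a 1D vector in zigzag order."""
    h, w = block.shape
    return np.array([block[i, j] for i, j in zigzag_indices(h, w)])

# Usage on a 3 x 3 block whose entries are their raster-scan positions.
block = np.arange(9).reshape(3, 3)
vector = zigzag_read(block)   # array([0, 1, 3, 6, 4, 2, 5, 7, 8])
```

The same function applies to sub-segments of different sizes, which then give vectors of different lengths.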
According to another characteristic, the combining step consists in combining the coefficients representing the continuous (DC) components corresponding to each sub-segment in a vector having a size equal to the number of sub-segments.
According to another mode of implementation, the combining step consists in combining the same-ranking coefficients defined by the zigzag reading.
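Combining same-ranking coefficients can be sketched as follows, assuming each sub-segment has already been reduced to a one-dimensional vector of coefficients. The class of rank r gathers the r-th coefficient of every vector that is long enough, and an orthonormal DCT of matching length is then applied to the class. The function name and the use of the DCT for the iteration are assumptions of the example.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def iterate_on_rank_classes(vectors):
    """Apply a DCT across sub-segments to each class of same-rank coefficients."""
    out = [np.asarray(v, dtype=float).copy() for v in vectors]
    max_len = max(len(v) for v in vectors)
    for r in range(max_len):
        # Sub-segments that possess a coefficient of rank r.
        members = [k for k, v in enumerate(vectors) if len(v) > r]
        cls = np.array([vectors[k][r] for k in members], dtype=float)
        transformed = dct_matrix(len(cls)) @ cls
        for k, value in zip(members, transformed):
            out[k][r] = value
    return out

# Usage: three sub-segments of sizes 3, 2 and 1 with identical leading values.
vectors = [np.array([4.0, 1.0, 0.5]), np.array([4.0, 1.0]), np.array([4.0])]
result = iterate_on_rank_classes(vectors)
```

In this example the rank-0 class is constant, so after the iteration its energy is carried by a single coefficient.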
According to another mode of implementation, the combining step consists in combining the coefficients that are close in distance, a distance in the frequency space having been predefined.
According to another characteristic, the method furthermore consists in carrying out a final reorganization of coefficients according to a chosen order.
According to another characteristic, the chosen order is that of their rank after transformation, which is advantageous in the case of variable length encoding by analogy with the zigzag reading of the coefficients in the H.261 recommendation of the CCITT.
According to another characteristic, the orthogonal linear transformation applied to the sub-segments is a discrete cosine transform.
Preferably, the same linear transformation is made during the iteration as during the processing of the sub-segments.
According to one mode of implementation, the iteration of the transformation is done with a standardized transformation matrix.
Thus, according to the invention, the problem of the transformation encoding of the arbitrarily-shaped segments is posed in a novel fashion. Although the formulation is based on known tools, it is distinguished from the other hitherto known methods, which have been presented in the present application, by a novel approach that is expressed by a sequencing of the different steps contributing to the resolving of the problem with the advantages indicated.
Furthermore, the combining step is original as compared with any method of block processing of the variables that has been proposed hitherto. This step enables the use of the correlation remaining on the segment and therefore makes it possible to achieve a more efficient decorrelation and a better concentration of energy for each segment. This type of combination of coefficients coming from a first 2D DCT step on rectangular segments has not been hitherto used. This operation indeed is not obvious inasmuch as the initial rectangles have variable sizes and is not a natural operation in principle. Besides, a standardization may prove to be necessary. Moreover, it is shown that the results are improved as compared with independent transformations on variable-sized blocks.