The present invention relates to a method of preprocessing input data corresponding to picture elements (pixels) that represent arbitrarily shaped objects, said input data comprising for each object a texture part, corresponding to the values of the pixels of said object, and an object mask, subdividing said input data into a first and a second subset of data respectively corresponding to fully or partially opaque pixels and to transparent pixels in said texture part, said preprocessing method being provided for determining DCT (discrete cosine transform) coefficients corresponding to said opaque pixels and comprising for each considered object the steps of:
(1) partitioning the object plane into bidimensional blocks;
(2) introducing in the picture area defined by said block a set of basis vectors chosen in order to express an estimate of the original pixel values as a linear combination of said basis vectors;
(3) defining a cost function ψ to measure the distortion between the original expression of the pixel values and said estimate of this original expression;
(4) finding the coefficients that minimize the cost function ψ.
This invention, which aims at efficiently encoding arbitrarily shaped textures, is especially useful in relation to the MPEG-4 standard, without being restricted to such an application.
The MPEG-4 standard, issued in 1999, was intended to propose a unified way of efficiently encoding visual objects in natural and synthetic pictures. For an encoder having to deal with these objects (generally made of several layers which in turn may contain arbitrarily shaped objects), they come in the form of two components: the object mask, which can be either binary or made of gray-level pixels and represents the alpha channel values used by the decoder for the scene composition, and the texture part, i.e. the values of the pixels of the object (a white pixel in the mask means that the corresponding pixel in the texture part is opaque, thus replacing the pixels of any other object behind it in the layer hierarchy, while a black pixel means that the corresponding pixel in the texture part is fully transparent, i.e. not visible). The invention specifically addresses the encoding operation of the texture part.
For encoding moving textures in an MPEG-4 encoder, the conventional method is to use the DCT transform (discrete cosine transform) on image blocks. More precisely, the plane to be encoded is partitioned into macroblocks of size 16×16 pixels, and the 16×16 luminance information is further partitioned into four 8×8 blocks encoded by the bidimensional 8×8 DCT transform (the same 2D transform is used again for the two 8×8 blocks containing the U and V chrominance information). For arbitrarily shaped objects, any 8×8 block can fall into three categories: either it contains transparent pixels only (there is then no need to encode the texture information) or opaque pixels only (the standard rectangular 8×8 DCT is used to encode the texture information) or it contains at least an opaque pixel and a transparent one. The problem to be solved, in this third situation, is the efficient encoding of this partial texture information in terms of bit consumption.
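The three block categories described above can be illustrated with a minimal sketch. The function names and the mask convention (0 for transparent, any non-zero value for fully or partially opaque) are assumptions made for this illustration and are not taken from the standard.

```python
def classify_block(mask_block):
    """Classify one block of the object mask.

    mask_block: list of rows; 0 means transparent, any non-zero value
    means fully or partially opaque (assumed convention).
    """
    flags = [value != 0 for row in mask_block for value in row]
    if not any(flags):
        return "transparent"  # no texture information to encode
    if all(flags):
        return "opaque"       # the standard rectangular 8x8 DCT applies
    return "boundary"         # mixed block: the case addressed here


def split_into_blocks(mask, size=8):
    """Yield (row, col, block) for each size x size block of a 2-D mask."""
    for r in range(0, len(mask), size):
        for c in range(0, len(mask[0]), size):
            yield r, c, [line[c:c + size] for line in mask[r:r + size]]
```

Only blocks reported as "boundary" require a special treatment of the texture; the two other categories are handled as stated in the text.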
First, the textures can be classically DCT-encoded as rectangular macroblocks after the empty spaces have been filled in by extending the texture boundary pixels (each sample at the boundary of an opaque region is replicated horizontally to the left or right direction in order to replace the transparent areas, and the same process is repeated in the vertical direction, the obtained padding pixels being later removed by the decoder since it knows the object mask). This padding method however introduces patterns that may not be optimal from the point of view of the frequency spectrum (they may be flat in the horizontal direction and randomly varying in the vertical one, and result in unwanted frequency components that consume more bits when the macroblocks are DCT-encoded).
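The horizontal-then-vertical replication described above can be sketched as follows. This is a simplified illustration, not the normative MPEG-4 padding procedure: the helper names are hypothetical, and the choice of the nearest opaque sample (left one on ties) is an assumption.

```python
def pad_line(values, opaque):
    """Replace non-opaque samples by the nearest opaque sample of the line.

    Returns (padded_values, filled); filled is False when the line
    contains no opaque sample and is therefore left unchanged.
    """
    idx = [i for i, flag in enumerate(opaque) if flag]
    if not idx:
        return list(values), False
    padded = [values[min(idx, key=lambda j: abs(j - i))]
              for i in range(len(values))]
    return padded, True


def pad_block(texture, mask):
    """Pad a block horizontally, then vertically (simplified sketch)."""
    # Horizontal pass: each row is filled from its own opaque samples.
    rows = [pad_line(t, [m != 0 for m in mk]) for t, mk in zip(texture, mask)]
    padded = [row for row, _ in rows]
    filled = [ok for _, ok in rows]
    # Vertical pass: rows left empty are filled from the nearest filled row.
    for c in range(len(texture[0])):
        column, _ = pad_line([row[c] for row in padded], filled)
        for r, value in enumerate(column):
            padded[r][c] = value
    return padded
```

In this sketch every row ends up with the same horizontal profile as its nearest filled row, which illustrates how padding can create flat runs in one direction that are unrelated to the actual texture.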
Another solution, normalized within the MPEG-4 standard, is the so-called shape-adaptive DCT, which proceeds in two steps to encode the patterns of FIG. 1 (given as an illustration). As illustrated in FIG. 2, all opaque pixels are first shifted to the uppermost position in the block to be encoded, and an adaptive one-dimensional n-DCT is then applied to each column, n being the number of opaque pixels in said column (in the example of FIG. 2, from left to right, the 1, 4, 7, 5, 7 and 1-DCT are respectively applied in the vertical direction). The resulting vertical DCT coefficients are then similarly shifted to the leftmost position in the block, which yields the pattern of FIG. 3, and the one-dimensional n-DCT is similarly applied to each row (n being the number of opaque pixels in the concerned row). Unfortunately, with this method, which needs special functionalities in the associated MPEG-4 decoder (as opposed to the classical 8×8 DCT algorithm used for fully opaque blocks), the shift operations generally introduce high frequencies, as they concatenate pixels or coefficients that are spatially separated and therefore have little correlation.
It is therefore an object of the invention to propose a preprocessing method that avoids introducing such undesirable frequencies and leads to a better coding efficiency.
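The vertical step of the shape-adaptive DCT described above can be sketched as follows. The function names are hypothetical and an orthonormal DCT-II is assumed for the n-point transform; the scaling actually used by the standard is not stated in this text.

```python
import math


def dct_1d(samples):
    """Orthonormal n-point DCT-II of a list of samples (assumed scaling)."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
            for j, x in enumerate(samples)))
    return out


def sa_dct_columns(texture, mask):
    """Shift the opaque pixels of each column upward and apply an n-DCT.

    Returns one coefficient list per column; its length n is the number
    of opaque pixels in that column (an empty list for an empty column).
    """
    result = []
    for c in range(len(texture[0])):
        # Collecting the opaque samples top to bottom is the upward shift.
        column = [texture[r][c] for r in range(len(texture)) if mask[r][c] != 0]
        result.append(dct_1d(column) if column else [])
    return result
```

The horizontal step would apply the same idea to the rows of the shifted coefficients. The sketch shows that samples from non-adjacent rows end up adjacent in the same n-point transform once the transparent gaps are removed.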
To this end, the invention relates to a method such as defined in the introductory part of the description and which is moreover characterized in that:
(a) said cost function ψ is given by a relation of the type:
$$\psi = \left( f_{\mathrm{opaque}},\ \sum_{i=1}^{64} c_i\, b_{\mathrm{opaque}}(i) \right)$$
where $f$ is the column-vector of the pixels of the concerned block, $((b_i),\ i \in (1 \text{ to } 64))$ are the basis vectors of an 8×8 DCT, $f_{\mathrm{opaque}}$ is the restriction of $f$ to the opaque pixels of said block, $((b_{\mathrm{opaque}}(i)),\ i \in (1 \text{ to } 64))$ are the restrictions of said basis vectors to the locations of the opaque pixels of the block, and
$$\sum_{i=1}^{64} c_i\, b_{\mathrm{opaque}}(i)$$
is called the reconstruction of $f_{\mathrm{opaque}}$;
(b) said finding step itself comprises the following operations:
initialization of the following parameters, including:
iteration parameter k=0;
initial estimation $f_{\mathrm{opaque}}^{E} = 0$;
initial reconstruction coefficients $c_i^{0} = 0$;
extraction of the basis vectors restricted to the opaque pixels and calculation of the projection coefficients:
$$p_i^{0} = \{ (f_{\mathrm{opaque}} - f_{\mathrm{opaque}}^{E}),\ b_{\mathrm{opaque}}(i) \}$$
with { } denoting the cross-correlation function, $i$ varying from 1 to 64, and $(b_{\mathrm{opaque}}(i))$ being said restricted basis vectors;
iteration(s), each of said iteration being provided for performing the following sub-steps:
[a] finding the index i* of the basis vector which best contributes to minimize the cost function;
[b] updating the reconstruction $f_{\mathrm{opaque}}^{E}$ according to the relation:
$$f_{\mathrm{opaque}}^{E}(k+1) = f_{\mathrm{opaque}}^{E}(k) + p_{i^*}^{k} \cdot b_{\mathrm{opaque}}(i^*)$$
[c] and updating the reconstruction coefficients $c_i^{k+1} = c_i^{k}$ for $i \neq i^*$ and $c_{i^*}^{k+1} = c_{i^*}^{k} + p_{i^*}^{k}$, and the projection coefficients $p_{i^*}^{k+1}$;
interruption of said iterations if said cost function ψ is below a given threshold or if a predetermined number of iterations is reached.
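The initialization, iteration and interruption steps listed above can be illustrated with a minimal sketch. Several points are assumptions made for this illustration and are not stated in the text: the cost ψ is taken as the squared Euclidean error between $f_{\mathrm{opaque}}$ and its reconstruction, the cross-correlation is taken as a plain inner product over the opaque pixels, the index i* is chosen as the one giving the largest reduction of that error, and the projection coefficients are recomputed from the residual at each iteration instead of being updated incrementally. All function names are hypothetical.

```python
import math

N = 8  # block size


def dct_basis(n=N):
    """Return the n*n orthonormal 2-D DCT basis vectors as flat lists."""
    def c(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [[c(u) * c(v)
             * math.cos(math.pi * (2 * y + 1) * u / (2 * n))
             * math.cos(math.pi * (2 * x + 1) * v / (2 * n))
             for y in range(n) for x in range(n)]
            for u in range(n) for v in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_opaque(block, mask, threshold=1e-6, max_iter=200):
    """Iteratively fit DCT coefficients to the opaque pixels of a block.

    block, mask: flat lists of N*N values (mask: non-zero means opaque).
    Returns (coeffs, cost), cost being the assumed squared error.
    """
    opaque = [j for j, m in enumerate(mask) if m != 0]
    f_op = [block[j] for j in opaque]
    b_op = [[b[j] for j in opaque] for b in dct_basis()]  # restricted basis
    norms2 = [dot(b, b) for b in b_op]
    coeffs = [0.0] * len(b_op)    # c_i^0 = 0
    estimate = [0.0] * len(f_op)  # f_opaque^E = 0
    residual = list(f_op)
    cost = dot(residual, residual)
    for _ in range(max_iter):
        if cost < threshold:
            break
        proj = [dot(residual, b) for b in b_op]  # projection coefficients p_i^k
        # Adding p_i * b_i lowers the squared error by p_i^2 * (2 - |b_i|^2).
        best = max(range(len(proj)),
                   key=lambda i: proj[i] ** 2 * (2.0 - norms2[i]))
        p = proj[best]
        estimate = [e + p * b for e, b in zip(estimate, b_op[best])]
        coeffs[best] += p  # only c_{i*} changes
        residual = [f - e for f, e in zip(f_op, estimate)]
        cost = dot(residual, residual)
    return coeffs, cost
```

Under these assumptions the error cannot increase from one iteration to the next, because the restricted basis vectors have a norm of at most 1. For a fully opaque block the restricted basis is the full orthonormal basis, so the sketch reduces to picking the ordinary DCT coefficients one at a time.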