The invention concerns a picture compression process in which each picture or picture macroblock is subjected to a coding chosen from among several types of coding.
It relates more particularly to the MPEG2 compression process. Although the invention is not limited to this type of compression, in the description hereafter, we shall refer mainly to this type.
The principle of such compression is recalled below, together with the coding types which must be selected for each macroblock.
In the MPEG2 standard, it is possible to start from a picture containing in progressive mode 576 rows of 720 points each. In interlaced mode, this picture is composed of two frames each of which comprises 288 rows, also of 720 points each.
Each picture is split up into macroblocks, each of which is formed by a square of 16×16 luminance points. Each macroblock is thus formed of 4 square blocks of 8×8 luminance points. With these 4 luminance blocks there are associated (in the 4.2.0 format) two chrominance blocks each of which has 8×8 points, one of the blocks representing the colour difference or red chrominance signal Cr and the other block representing the colour difference or blue chrominance signal Cb. In the 4.2.2 format, with each luminance macroblock there are associated four 8×8 chrominance blocks, 2 blocks for the blue chrominance and 2 blocks for the red chrominance. There is also a 4.4.4 format in which each of the luminance and chrominance components comprises 4 blocks of 8×8.
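By way of illustration only, the block counts described above can be tallied as in the following sketch. The names CHROMA_BLOCKS, blocks_per_macroblock and macroblocks_per_picture are hypothetical and are introduced solely for this example.

```python
# Number of 8x8 chrominance blocks (Cb, Cr) attached to one 16x16 macroblock,
# for each chrominance format named in the text.
CHROMA_BLOCKS = {"4.2.0": (1, 1), "4.2.2": (2, 2), "4.4.4": (4, 4)}


def blocks_per_macroblock(chroma_format):
    """Total number of 8x8 blocks (luminance + chrominance) in one macroblock."""
    luminance_blocks = 4  # a 16x16 square is always four 8x8 luminance blocks
    cb_blocks, cr_blocks = CHROMA_BLOCKS[chroma_format]
    return luminance_blocks + cb_blocks + cr_blocks


def macroblocks_per_picture(width=720, height=576, size=16):
    """Number of macroblocks in a picture of width x height points."""
    return (width // size) * (height // size)
```

For the 720×576 picture mentioned earlier, this gives 45 × 36 = 1620 macroblocks, each carrying 6 blocks in the 4.2.0 format.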
Represented in FIG. 1 are four 8×8 luminance blocks, referenced 10 as a set, and 8×8 chrominance blocks 12 and 14 for the blue and red chrominances respectively, the set illustrating a macroblock in the 4.2.0 format.
Each block is coded by using a discrete cosine transformation, denoted DCT, which makes it possible to transform a luminance or chrominance block into a block of coefficients representing spatial frequencies. As may be seen in FIG. 2, a source block 16 is transformed into a block 18 of 8×8 coefficients. The upper left corner 20 of the block 18 corresponds to the zero spatial frequencies (mean luminance value of the block) and, starting from this origin 20, the horizontal frequencies increase towards the right, as represented by the arrow 22, while the vertical spatial frequencies increase from top to bottom, as represented by the arrow 24.
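The layout of the coefficient block can be checked with a direct, unoptimized two-dimensional DCT, sketched below. The orthonormal scaling used here is an assumption made for the example; the function name dct_2d is hypothetical, and a real coder would use a fast transform.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Direct 8x8 DCT (type II, orthonormal scaling assumed for this sketch).

    block[y][x] is the sample at row y, column x.
    The result out[v][u] holds vertical frequency v and horizontal frequency u,
    so out[0][0] is the upper left corner (zero spatial frequencies).
    """
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):          # vertical frequency, increasing downwards
        for u in range(N):      # horizontal frequency, increasing to the right
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = scale(u) * scale(v) * total
    return out
```

With this scaling, a uniform block yields a single non-zero coefficient in the upper left corner, equal to 8 times the mean value, and a block that varies only horizontally yields non-zero coefficients only in the first row.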
For each macroblock, it is necessary to choose the coding type: either “intra” or “inter”. Intra-coding consists in applying the DCT transformation to a source block of the picture, while inter-coding consists in applying the DCT transformation to a block representing the difference between a source block and a predicted block, or prediction block, of a preceding or following picture.
The choice depends in part on the type of pictures to which the macroblock belongs. These pictures are of three types: the first type is the so-called I or intra type, for which the coding is intra for all the macroblocks.
The second type is of P or prediction type; in pictures of this type, the coding of each macroblock can be either intra or inter. In the case of inter-coding of a macroblock of a picture of P type, the DCT transformation is applied to the difference between the current macroblock of this picture P and a prediction macroblock arising from the preceding I or P picture.
The third type of picture is called B or bidirectional. Each macroblock of such a picture type is either intra-coded or inter-coded. Inter-coding consists also in applying the transformation to the difference between the current macroblock of this B picture and a prediction macroblock. This prediction macroblock may arise either from the preceding picture or from the following picture or from both at once (bidirectional prediction), it being possible for the so-called preceding or following prediction pictures to be of I or P type only.
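The data handed to the DCT transformation in each case can be sketched as follows. This is an illustration under stated assumptions: the macroblocks passed in are taken to be already motion compensated, the mode labels "forward", "backward" and "bidirectional" are hypothetical, and the bidirectional prediction is assumed to be the average of the two predictions, rounded half up.

```python
def predict(mode, previous=None, following=None):
    """Return the prediction macroblock for an inter-coded macroblock.

    previous / following are macroblocks taken from the preceding or
    following I or P picture (assumed already motion compensated).
    """
    if mode == "forward":
        return [row[:] for row in previous]
    if mode == "backward":
        return [row[:] for row in following]
    if mode == "bidirectional":
        # Assumption for this sketch: average of both predictions, rounded half up.
        return [[(p + f + 1) // 2 for p, f in zip(prow, frow)]
                for prow, frow in zip(previous, following)]
    raise ValueError("unknown prediction mode: %s" % mode)


def dct_input(current, prediction=None):
    """Block to be transformed: the source itself (intra) or the difference (inter)."""
    if prediction is None:
        return [row[:] for row in current]
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, prediction)]
```

When the prediction is close to the current macroblock, the difference contains small values, which is what makes inter-coding attractive.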
Represented in FIG. 3 is a set of pictures forming a group called a GOP (Group Of Pictures) which comprises twelve pictures, namely an I picture followed by eleven B and P pictures according to the following succession: B, B, P, B, B, P, B, B, P, B, B.
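The succession of picture types in such a group can be generated as in the sketch below. The parameters n (number of pictures in the group) and m (spacing between successive I or P pictures) are hypothetical names chosen for this example.

```python
def gop_pattern(n=12, m=3):
    """Picture types of one group of pictures, in the order listed in the text.

    n: number of pictures in the group (hypothetical parameter name).
    m: distance between successive I or P pictures (hypothetical parameter name).
    """
    types = []
    for index in range(n):
        if index == 0:
            types.append("I")   # the group starts with an intra picture
        elif index % m == 0:
            types.append("P")   # every m-th picture is a predicted picture
        else:
            types.append("B")   # the pictures in between are bidirectional
    return "".join(types)
```

With n = 12 and m = 3, the result is the succession I, B, B, P, B, B, P, B, B, P, B, B given above.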
In the case of predicted pictures (that is to say those deduced from other pictures), motion estimation followed by motion compensation are applied to the macroblock to be coded. This is because, between two pictures, the macroblock may be situated at different locations by reason of the inter-picture and inter-frame motions. The effect of motion compensation is to compute the prediction macroblock according to a given mode of interpolation (commonly called the prediction mode); this macroblock will actually serve as prediction for the current macroblock in inter-picture mode for a given coding mode. Hereinafter, this prediction macroblock and, by the same token this coding mode, will be retained or rejected depending on the decisions taken within the procedure for computing the choice of the coding mode.
Moreover, in the case of interlaced scanning, for which each picture is formed of two successive frames, an odd frame and an even frame, it is necessary to determine whether the DCT transformation should be performed progressively or individually on each frame. This is because, depending on the motion of the picture or the structure of this picture, the result of the coding may be different depending on whether the transformation is performed on the picture or on each frame.
This choice is represented by FIGS. 4a and 4b. Represented in FIG. 4a is a macroblock 28 of an interlaced picture formed of rows 30₁, 30₃, . . . , 30₁₅ of an odd frame and of rows 30₂, 30₄, . . . , 30₁₆ of an even frame. FIG. 4a corresponds to a DCT transformation performed on the picture; each of the four blocks of the macroblock 28 is transformed without rearranging the rows. Thus, the coding is performed on the four blocks 28₁, 28₂, 28₃, 28₄ forming the macroblock 28 and the transformation is performed on rows 30₁ to 30₈ for blocks 28₁ and 28₂ and on rows 30₉ to 30₁₆ for blocks 28₃ and 28₄.
On the other hand, FIG. 4b represents a transformation performed separately for the odd and even frames. Blocks 32₁ and 32₂ correspond to the odd frame and blocks 32₃ and 32₄ to the even frame. Thus, block 32₁ comprises rows 30₁, 30₃, . . . , 30₁₅, while blocks 32₃ and 32₄ comprise rows 30₂, 30₄, . . . , 30₁₆.
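The two arrangements of rows can be sketched as follows, a macroblock being represented as a list of 16 rows of 16 points. The function names are hypothetical, and the ordering of the four blocks (upper left, upper right, lower left, lower right) is an assumption of this example.

```python
def split_blocks(rows):
    """Cut 16 rows of 16 points into four 8x8 blocks:
    upper left, upper right, lower left, lower right."""
    top, bottom = rows[:8], rows[8:]
    return [[r[:8] for r in top], [r[8:] for r in top],
            [r[:8] for r in bottom], [r[8:] for r in bottom]]


def picture_dct_blocks(macroblock):
    """DCT performed on the picture: rows are kept in their original order."""
    return split_blocks(macroblock)


def frame_dct_blocks(macroblock):
    """DCT performed on each frame: the rows of the odd frame are grouped
    first, followed by the rows of the even frame."""
    odd_frame = macroblock[0::2]    # 1st, 3rd, ..., 15th rows
    even_frame = macroblock[1::2]   # 2nd, 4th, ..., 16th rows
    return split_blocks(odd_frame + even_frame)
```

In the second arrangement, the first two blocks contain only rows of the odd frame and the last two only rows of the even frame, so that each block is spatially coherent even when the two frames differ because of motion.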
Represented in FIG. 5 is a chart in block form representing the various operations to be performed in respect of the picture compression or video compression. Each digitized picture is applied to an input of a facility 40 which performs the separation into 8×8 blocks and these 8×8 blocks are transmitted to a facility 42 for selecting between the intra-coding and the inter-coding. If the coding chosen is intra, the block is transmitted to the DCT transformation facility 44. If the coding is inter, the block is subjected to a subtraction by a subtractor facility 46 which takes the differences between the block itself and a prediction block delivered by a time prediction facility 48.
After the DCT transformation 44, a quantization 50 is performed and the quantized coefficients thus obtained are coded by a variable-length (VLC) or fixed-length coding 52. The coded coefficients obtained at the output of the coder 52 are directed to a buffer memory 54 whose output constitutes the coding output 56. To prevent the buffer memory 54 from overflowing or running empty, a regulation 60 is performed which modifies the quantization 50.
To be able to perform the time prediction, the output of the quantization facility 50 is linked to the input of a facility 62 for inverse quantization Q⁻¹ whose output is applied to the input of a facility 64 performing the inverse cosine transformation DCT⁻¹. The output of the facility 64 is transmitted directly to a picture memory 66 when the coding of the block is intra, as determined by a facility 68. On the other hand, when the facility 68 decides that the coding is inter, the output of the block 64 is added, by virtue of an adder 70, to the prediction macroblock delivered by the facility 48 and it is the output from the adder 70 which is transmitted to the picture memory 66. The memory 66 keeps decoded pictures.
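The local decoding loop can be illustrated by the simplified sketch below. To keep the example short, the DCT and inverse DCT are omitted (treated as the identity) and the quantization is assumed to be a uniform division by a step; both are simplifications made for this illustration, and the function names are hypothetical.

```python
def quantize(values, step):
    """Uniform quantization (assumed for this sketch): divide by the step and round."""
    return [[round(v / step) for v in row] for row in values]


def dequantize(levels, step):
    """Inverse quantization: multiply the levels by the step."""
    return [[level * step for level in row] for row in levels]


def reconstruct(levels, step, prediction=None):
    """Block stored in the picture memory.

    Intra: the inverse-quantized block is stored directly.
    Inter: the prediction block is added back first.
    """
    decoded = dequantize(levels, step)
    if prediction is None:
        return decoded
    return [[d + p for d, p in zip(drow, prow)]
            for drow, prow in zip(decoded, prediction)]
```

The point illustrated is that the picture memory holds what a decoder would reconstruct, quantization error included, rather than the source picture, so that coder and decoder predict from the same data.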
The inter-picture and inter-frame motions are estimated by a facility 72 which receives, on the one hand, information from the picture memory 66 and, on the other hand, from the output from the facility 40 for constructing blocks. Thus, it may be seen that the time prediction 48 is performed, on the one hand, on the basis of the picture memory 66 and, on the other hand, of the motion estimation 72.
The computation of the coding or binary train 52 depends, among other things, on the motion vectors arising from the motion estimation 72, the coded DCT coefficients, the headers of the macroblocks, and MPEG2 coding information delivered by a facility 74. This information relates to the MPEG2 signalling cues, namely the headers of the rows of macroblocks (or “slices”), the headers of the pictures of the GOPs and the headers of the sequence to be coded.
For the pictures of I type, the coding must be chosen between frame-wise intra-coding or picture-wise intra-coding. This choice is made on the basis of an analysis of the activity contained in the macroblock; it gives good results in general.
For the pictures of P type and the pictures of B type, the number of decisions to be made is substantially larger.
Thus, a P macroblock may be coded according to eight basic modes:
intra; frame DCT,
intra; picture DCT,
without motion compensation (noMC); frame DCT,
without motion compensation (noMC); picture DCT,
with motion compensation; prediction by earlier frames; frame DCT,
with motion compensation; prediction by earlier frames; picture DCT,
with motion compensation; prediction by earlier picture; frame DCT,
with motion compensation; prediction by earlier picture; picture DCT.
For the pictures of B type, 14 basic coding modes are possible, namely two intra-coding modes, the four modes with motion compensation indicated with regard to the pictures of P type and, in addition, four similar coding modes based on the later prediction picture and 4 bidirectional modes, namely:
prediction by later frames; frame DCT,
prediction by later frames; picture DCT,
prediction by later picture; frame DCT,
prediction by later picture; picture DCT,
frame-wise bidirectional prediction; frame DCT,
frame-wise bidirectional prediction; picture DCT,
picture-wise bidirectional prediction; frame DCT,
picture-wise bidirectional prediction; picture DCT.
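The mode counts given above can be verified by enumerating the combinations, as in the following sketch. The labels and the function names p_modes and b_modes are hypothetical and serve only this illustration.

```python
from itertools import product

DCT_TYPES = ("frame DCT", "picture DCT")
PREDICTION_STRUCTURES = ("frames", "picture")


def p_modes():
    """Basic coding modes of a macroblock of a P picture."""
    modes = [("intra", dct) for dct in DCT_TYPES]
    modes += [("noMC", dct) for dct in DCT_TYPES]
    modes += [("MC, earlier " + structure, dct)
              for structure, dct in product(PREDICTION_STRUCTURES, DCT_TYPES)]
    return modes


def b_modes():
    """Basic coding modes of a macroblock of a B picture."""
    modes = [("intra", dct) for dct in DCT_TYPES]
    for direction in ("earlier", "later", "bidirectional"):
        modes += [("MC, %s %s" % (direction, structure), dct)
                  for structure, dct in product(PREDICTION_STRUCTURES, DCT_TYPES)]
    return modes
```

The enumeration yields 8 modes for a P macroblock and 14 modes for a B macroblock, in agreement with the lists above.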
It has been noted that the criteria used hitherto to choose between these various modes for the P and B pictures gave results of variable quality.
In the earlier patent application Ser. No. 98 10802 filed on Aug. 28, 1998 in the name of THOMSON Multimedia, a compression process was described, in which a trial coding was performed according to all possible modes, or according to some of these modes, and for each trial coding, the coding cost and/or a quality factor is/are determined, the mode of coding used being selected as a function of the value of the coding cost and/or of the value of the quality factor.
Stated otherwise, according to the process described in the earlier application, the coding mode is not chosen as a function of an internal analysis of the macroblocks, but by trying all the coding possibilities (or some of them) and by adopting the mode which gives the best result, either because it minimizes the coding cost, or because it maximizes the quality of the picture, or because it provides the best compromise between coding cost and picture quality.
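The selection by trial coding can be sketched as follows. The callable trial_encode is hypothetical: it stands for a complete trial coding of the macroblock in a given mode and is assumed to return a coding cost and a distortion measure. The weighted sum used to express the compromise between cost and quality is likewise an assumption of this example.

```python
def choose_by_trial(modes, trial_encode, weight=0.0):
    """Try every candidate mode and keep the one with the best score.

    trial_encode(mode) -> (cost, distortion)   (hypothetical callable)
    weight = 0 minimizes the coding cost alone; a larger weight gives more
    importance to picture quality (lower distortion).
    """
    best_mode, best_score = None, None
    for mode in modes:
        cost, distortion = trial_encode(mode)
        score = cost + weight * distortion
        if best_score is None or score < best_score:
            best_mode, best_score = mode, score
    return best_mode
```

Each call to trial_encode represents a full coding of the macroblock, which is why the number of candidate modes weighs directly on the calculation time.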
This process provides optimal results for each macroblock, especially when a given cost/quality criterion is chosen. However, it is difficult to apply in respect of a coding which has to be performed in real time, since it involves a considerable calculation time which is difficult to reduce, especially for portable apparatuses.
Nevertheless, this coding in accordance with the earlier application has been used to determine parameters which can be calculated in real time and which make it possible to select the modes of coding which provide the best results in the case of P and B pictures.
According to a first aspect of the invention, to determine the mode of coding the macroblocks of the P and/or B pictures, an energy parameter is calculated for each coding mode. For the inter-coding modes this parameter is the inter-picture energy, and for the intra-coding modes it is the relative energy with respect to the average value of luminance. The coding mode adopted is the one which provides the minimum energy parameter, or a coding mode providing an energy parameter which does not exceed this minimum energy parameter by more than a predetermined factor k. In one example, the factor k is equal to 2.5.
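A minimal sketch of this selection is given below. The energies are assumed here to be sums of squared values (of the deviation from the mean luminance for intra, of the difference with the prediction for inter); this definition, like the function names, is an assumption made for the example.

```python
def intra_energy(block):
    """Energy of the block relative to its average luminance value."""
    values = [v for row in block for v in row]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def inter_energy(block, prediction):
    """Inter-picture energy: energy of the difference with the prediction block."""
    return sum((s - p) ** 2
               for srow, prow in zip(block, prediction)
               for s, p in zip(srow, prow))


def admissible_modes(energies, k=2.5):
    """Return the mode of minimum energy and all modes whose energy does not
    exceed that minimum by more than the factor k.

    energies: dict mapping a mode label to its energy parameter.
    """
    best = min(energies, key=energies.get)
    limit = k * energies[best]
    return best, [mode for mode, energy in energies.items() if energy <= limit]
```

For example, with energies of 20 for an intra mode and 1 and 2 for two inter modes, the minimum is 1 and, with k = 2.5, both inter modes remain admissible while the intra mode is rejected.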