The present invention relates to a picture encoding device and method thereof, a picture decoding device and method thereof, and a recording medium, and more particularly relates to a picture encoding device and method thereof, a picture decoding device and method thereof, and a recording medium suitable for use in transmission systems such as television conferencing systems, television telephone systems, broadcast equipment and multimedia database searching systems etc. where a moving picture signal is recorded on a recording medium such as a magneto-optical disc or magnetic tape, played back, and displayed on a displaying device such as a display, or where the moving picture signal is transmitted via a transmission path from a transmission side to a receiving side, received on the receiving side and displayed, or where, for example, the moving picture is edited and recorded.
In systems such as television conferencing systems or television telephone systems, where moving picture signals are transmitted over long distances, the picture signal is compression encoded by exploiting the line correlation and/or the inter-frame correlation of the picture signal so that the transmission path is used efficiently.
When line correlation is utilized, the picture signal is compressed using transform processing such as DCT (Discrete Cosine Transform) processing or wavelet transform processing.
When inter-frame correlation is utilized, the picture signal can be compressed and encoded further. For example, as shown in FIG. 1, when frame pictures PC1, PC2 and PC3 are generated at times t1, t2 and t3, respectively, the difference between the picture signals of frame pictures PC1 and PC2 is calculated to generate PC12, and the difference between frame pictures PC2 and PC3 is calculated to generate PC23. Frame pictures that are adjacent in time generally do not vary dramatically, so these difference signals have only small values. Compressing the difference signals therefore reduces the amount of code.
However, if only the difference signals are transmitted, the original picture cannot be decoded. The picture signal of each frame is therefore compression encoded as one of three picture types: I pictures, P pictures or B pictures.
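As an illustration only, the differencing of FIG. 1 can be sketched in Python. The 2×3 "frames" PC1 to PC3 below are made-up values rather than data from the figure, and the helper names are hypothetical.

```python
# Minimal sketch of inter-frame differencing on tiny frames held as nested lists.
# PC1, PC2 and PC3 are made-up 2x3 luminance arrays used only for illustration.

def frame_difference(current, previous):
    """Return the per-pixel difference current - previous (e.g. PC12 = PC2 - PC1)."""
    return [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(current, previous)]

def reconstruct(previous, difference):
    """Recover the current frame by adding the difference back to the previous frame."""
    return [[p + d for p, d in zip(row_p, row_d)]
            for row_p, row_d in zip(previous, difference)]

PC1 = [[100, 101, 102], [103, 104, 105]]
PC2 = [[101, 101, 103], [103, 105, 105]]
PC3 = [[101, 102, 103], [104, 105, 106]]

PC12 = frame_difference(PC2, PC1)  # small values: [[1, 0, 1], [0, 1, 0]]
PC23 = frame_difference(PC3, PC2)  # small values: [[0, 1, 0], [1, 0, 1]]
print(reconstruct(PC1, PC12) == PC2)  # True
```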
As shown in FIGS. 2A and 2B, the picture signal for 17 frames, F1 to F17, is processed as a single unit called a group of pictures (GOP). The picture signal for the leading frame F1 is encoded as an I picture, the second frame F2 as a B picture and the third frame F3 as a P picture. From the fourth frame F4 to the seventeenth frame F17, frames are then processed alternately as B pictures and P pictures.
The picture signal for one frame of an I picture is transmitted as the I picture signal without modification. As shown in FIG. 2A, the P picture signal is basically the difference between the P picture and the I picture or P picture that precedes it in time. As shown in FIG. 2B, the B picture signal is basically the difference between the B picture and the average of the frames that precede and follow it in time; this difference is encoded and then transmitted.
FIG. 3 shows the principle of this method of encoding a moving picture signal. As shown in FIG. 3, the first frame F1 is processed as an I picture, so its picture signal is transmitted along the transmission path as transmission data F1X (intra-picture encoding). The second frame F2 is processed as a B picture, so the difference between frame F2 and the average of the preceding frame F1 and the following frame F3 is calculated and transmitted as transmission data F2X.
Four types of processing are available for this B picture. In the first process, the picture signal data for the original frame F2 is transmitted without modification as the transmission data F2X (SP1) (intra encoding), exactly as for I pictures. In the second process, the difference (SP2) from the picture signal of frame F3, one frame later, is calculated and transmitted (backward estimation encoding). In the third process, the difference (SP3) from the picture signal of the preceding frame F1 is transmitted (forward estimation encoding). In the fourth process, the difference (SP4) between frame F2 and the average of the preceding frame F1 and the following frame F3 is generated and transmitted as the transmission data F2X (bi-directional estimation encoding).
Of these four processes, the one that produces the least transmission data is adopted.
When difference data is transmitted, the motion vectors between the frame being encoded and the frames used in the difference calculation are transmitted together with it: a motion vector x1 between frames F1 and F2 for forward estimation, a motion vector x2 between frames F3 and F2 for backward estimation, or both x1 and x2 for bi-directional estimation.
For the third frame F3, processed as a P picture, the preceding frame F1 is taken as the estimation picture; the difference signal (SP3) from this frame and the motion vector x3 are calculated and transmitted as the transmission data F3X (forward estimation encoding). Alternatively, the data for frame F3 itself can be transmitted as the transmission data F3X (SP1) (intra encoding). As with the B picture, whichever process yields less transmission data is selected.
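The picture type pattern of FIGS. 2A and 2B can be written as a small helper. The function name `gop_picture_types` is hypothetical; the sketch simply encodes the I, B, P, B, P, ... assignment described above for frames in display order.

```python
def gop_picture_types(num_frames=17):
    """Assign I/B/P types to frames F1..F<num_frames> in display order.

    F1 is an I picture; after that, even-numbered frames are B pictures
    and odd-numbered frames are P pictures, as in FIGS. 2A and 2B.
    """
    types = {}
    for n in range(1, num_frames + 1):
        if n == 1:
            types[f"F{n}"] = "I"
        elif n % 2 == 0:
            types[f"F{n}"] = "B"
        else:
            types[f"F{n}"] = "P"
    return types

gop = gop_picture_types()
print("".join(gop.values()))  # IBPBPBPBPBPBPBPBP
```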
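A minimal sketch of the selection among the four processes for frame F2 follows. It assumes that "least transmission data" can be approximated by the sum of absolute values of what would be sent; an actual encoder would compare coded bit counts, and motion vectors are ignored here. Frames are flat lists of made-up pixel values and the helper names are hypothetical.

```python
def abs_sum(values):
    """Proxy for the amount of transmission data (an assumption)."""
    return sum(abs(v) for v in values)

def choose_b_mode(f1, f2, f3):
    """Pick the B-picture process for frame F2 that yields the least data.

    f1, f2, f3 are flat lists of pixel values for frames F1, F2 and F3.
    """
    candidates = {
        "SP1 intra": list(f2),                                    # F2 as is
        "SP2 backward": [b - c for b, c in zip(f2, f3)],          # F2 - F3
        "SP3 forward": [b - a for b, a in zip(f2, f1)],           # F2 - F1
        "SP4 bidirectional": [b - (a + c) / 2                     # F2 - mean(F1, F3)
                              for a, b, c in zip(f1, f2, f3)],
    }
    mode = min(candidates, key=lambda name: abs_sum(candidates[name]))
    return mode, candidates[mode]

F1 = [10, 20, 30]
F2 = [12, 22, 32]
F3 = [14, 24, 34]
mode, residual = choose_b_mode(F1, F2, F3)
print(mode)  # SP4 bidirectional
```

Here F2 lies exactly midway between F1 and F3, so the bi-directional residual is zero and that process wins.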
FIG. 4 shows an example structure of a device that encodes and transmits a moving picture signal and then receives and decodes it based on the principle described above. The encoding device 1 encodes an inputted picture signal and transmits it to a transmission path or records it on a recording medium 3. The decoding device 2 plays back the signal from the transmission path or the recording medium 3, decodes it and outputs it.
At the encoding device 1, the inputted picture signal is supplied to a processing circuit 11 and divided into a luminance signal and a chrominance signal (in this example, a color difference signal), which are then analog-to-digital (A/D) converted by A/D converters 12 and 13, respectively. The digital picture signals (picture data) from the A/D converters 12 and 13 are filtered by a pre-filter 19 and then stored in a frame memory 14. The frame memory 14 stores the luminance signal in a luminance signal frame memory 15 and the color difference signal in a color difference signal frame memory 16. An encoder 17 reads the moving picture signals (picture data) stored in the frame memory 14 and encodes them. This operation is described in detail later with reference to FIG. 5.
Signals encoded by the encoder 17 are transmitted as a bitstream via the transmission path 3 or recorded on the recording medium 3.
Data received from the transmission path or the recording medium 3 is then provided to a decoder 31 of the decoding device 2 and decoded. The details of the decoder 31 are described later with reference to FIG. 9.
Data decoded by the decoder 31 is supplied to a frame memory 33. The luminance signal is stored in a luminance signal frame memory 34 of the frame memory 33 and the color difference signal in a color difference signal frame memory 35. The luminance and color difference signals read from the frame memories 34 and 35 are filtered by a post filter 39, D/A converted by D/A converters 36 and 37, and supplied to a processing circuit 38, where they are combined. The combined signal is then output to and displayed on a display such as a cathode ray tube (CRT), which is not shown in the drawings.
Next, the operation of the encoder is described using, as an example, processing according to the MPEG (Moving Picture Experts Group) 2 method. This method was put forward as a standard proposal discussed in ISO/IEC JTC1/SC29/WG11 and is a hybrid standard that combines motion-compensated estimation encoding with DCT (Discrete Cosine Transform) encoding. The details of this moving picture encoding method are disclosed in IS 13818-2.
An example structure of an encoder 17 for encoding moving images conforming to the above MPEG2 method is shown in FIG. 5.
Picture data to be encoded is inputted to a motion vector detector 50. The motion vector detector 50 processes each frame of picture data as an I picture, P picture or B picture in accordance with a preset prescribed sequence. Processing of each of the sequentially inputted frame pictures as an I, P or B picture is preset (for example, the group of pictures comprising the frames F1 to F17 is processed as I, B, P, B, P, . . . , as shown in FIG. 2A and FIG. 2B).
Picture data for frames processed as I pictures (for example, frame F1) is transferred from the motion vector detector 50 to a forward source picture part 51a of a frame memory 51 and stored there. Picture data for frames processed as B pictures (for example, frame F2) is transferred to a source picture part 51b and stored, and picture data for frames processed as P pictures (for example, frame F3) is transferred to a backward source picture part 51c and stored.
When frame pictures to be processed as a B picture (frame F4) and a P picture (frame F5) are subsequently inputted, the picture data for the first P picture (frame F3), stored until then in the backward source picture part 51c, is transferred to the forward source picture part 51a, the picture data for the next B picture (frame F4) is written to the source picture part 51b, and the picture data for the next P picture (frame F5) is written to the backward source picture part 51c. This process is then repeated sequentially.
The motion vector detector 50 reads each of the picture signals stored in the frame memory 51 in macroblocks of a fixed size. Macroblocks of the related art are described here. As shown in FIG. 6, a picture signal stored in the frame memory of the encoding device has a data format of V lines, each consisting of H dots. One frame is divided into N slices of 16 lines each, and each slice is divided into M macroblocks. Each macroblock comprises a luminance signal of 16×16 pixels (dots), which is further divided into four blocks [1] to [4] of 8×8 dots. This 16×16-dot luminance signal corresponds to an 8×8-dot Cb signal and an 8×8-dot Cr signal. These macroblocks are inputted to the motion vector detector 50.
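The effect of this buffering is that each P picture is processed before the B picture that precedes it in display order. The hypothetical helper below sketches that reordering, assuming that B pictures are simply held until the next I or P picture has been handled.

```python
def coding_order(display_types):
    """Reorder frames so each B picture follows the anchor (I/P) after it.

    display_types: list of 'I'/'P'/'B' in display order (F1, F2, ...).
    Returns frame names in the order the encoder processes them.
    """
    order, pending_b = [], []
    for n, t in enumerate(display_types, start=1):
        name = f"F{n}"
        if t == "B":
            pending_b.append(name)   # wait until the following anchor is processed
        else:
            order.append(name)       # anchor (I or P) is processed first
            order.extend(pending_b)  # then the B pictures that precede it in time
            pending_b.clear()
    # Trailing B pictures (none for FIG. 2A) are simply appended here.
    order.extend(pending_b)
    return order

print(coding_order(list("IBPBP")))  # ['F1', 'F3', 'F2', 'F5', 'F4']
```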
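The macroblock partitioning of FIG. 6 can be sketched as follows. The 720×480 frame size in the usage line is only an illustration, the helper names are hypothetical, and the assignment of Y[1] to Y[4] to the four quadrants is an assumption consistent with the field split described for FIG. 7B.

```python
def split_macroblocks(width, height):
    """Return (N slices, M macroblocks per slice) for V=height lines of H=width dots."""
    if width % 16 or height % 16:
        raise ValueError("frame size assumed to be a multiple of 16")
    return height // 16, width // 16

def luminance_blocks(mb):
    """Split a 16x16 luminance macroblock (list of 16 rows) into Y[1]..Y[4].

    Assumed numbering: Y[1] top-left, Y[2] top-right,
    Y[3] bottom-left, Y[4] bottom-right, each 8x8 dots.
    """
    def block(r0, c0):
        return [row[c0:c0 + 8] for row in mb[r0:r0 + 8]]
    return [block(0, 0), block(0, 8), block(8, 0), block(8, 8)]

print(split_macroblocks(720, 480))  # (30, 45)
```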
Returning to FIG. 5, frame estimation mode processing or field estimation mode processing is carried out at an estimation mode switcher 52 on the macroblocks outputted from the frame memory 51. A computation part 53 then carries out the calculations for intra-picture estimation, forward estimation, backward estimation or bi-directional estimation under the control of an estimation discriminator 54. Which of these processes is used is decided according to the estimation error signal, i.e. the difference between the reference picture being processed and its estimation picture. The motion vector detector 50 generates the absolute value sum (or squared sum) of this error that is used in the decision.
The frame estimation mode and the field estimation mode of the estimation mode switcher 52 are now described.
When the frame estimation mode is set, the estimation mode switcher 52 outputs the four luminance blocks Y[1] to Y[4] supplied by the motion vector detector 50 to the following computation part 53 without modification. In this case, as shown in FIG. 7A, the lines of the odd field (first field) and the even field (second field) are interleaved within each luminance block. In frame estimation mode, estimation is carried out in units of four luminance blocks (one macroblock), so one motion vector corresponds to the four luminance blocks.
In field estimation mode, by contrast, the estimation mode switcher 52 converts the signal inputted from the motion vector detector 50 in the structure shown in FIG. 7A into the structure shown in FIG. 7B before outputting it. That is, of the four luminance blocks, the luminance blocks Y[1] and Y[2] consist only of odd-field lines and the other two luminance blocks Y[3] and Y[4] consist only of even-field lines when they are outputted to the computation part 53. In this case, one motion vector corresponds to the two luminance blocks Y[1] and Y[2] and another motion vector corresponds to the other two luminance blocks Y[3] and Y[4].
The motion vector detector 50 outputs the absolute value sum of the estimation error in frame estimation mode and the absolute value sum of the estimation error in field estimation mode to the estimation mode switcher 52. The estimation mode switcher 52 compares the two sums, carries out the processing for the estimation mode with the smaller value, and outputs the data to the computation part 53. The estimation mode switcher 52 also outputs a flag (estimation flag) indicating the selected mode to a variable length encoder 58 and a motion compensator 64.
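The line rearrangement of FIG. 7B and the frame/field decision can be sketched in Python. The sketch assumes that even-indexed rows of a 16-row macroblock are odd-field (first-field) lines; the tie-breaking rule in favor of frame mode is also an assumption, and both function names are hypothetical.

```python
def frame_to_field(mb):
    """Reorder a 16-row macroblock from frame (interleaved) to field structure.

    Rows 0, 2, 4, ... (odd-field lines) go to the top half, i.e. Y[1] and Y[2];
    rows 1, 3, 5, ... (even-field lines) go to the bottom half, i.e. Y[3] and Y[4].
    """
    return mb[0::2] + mb[1::2]

def choose_estimation_mode(sad_frame, sad_field):
    """Return the mode whose estimation-error absolute value sum is smaller."""
    return "frame" if sad_frame <= sad_field else "field"

mb = [[r] * 16 for r in range(16)]           # row r filled with the value r
print([row[0] for row in frame_to_field(mb)])
# [0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15]
```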
However, the process corresponding to the estimation mode is, in reality, carried out at the motion vector detector 50, i.e. the motion vector detector 50 outputs a signal of a structure corresponding to the decided mode to the estimation mode switcher 52 and the estimation mode switcher 52 then outputs this signal to the following stage computation part 53 without modification.
In frame estimation mode, as shown in FIG. 7A, the color difference signal is supplied to the computation part 53 with the odd-field and even-field lines interleaved. In field estimation mode, as shown in FIG. 7B, the upper half (four lines) of each of the color difference blocks Cb and Cr is taken as the odd-field color difference signal corresponding to the luminance blocks Y[1] and Y[2], and the lower half (four lines) as the even-field color difference signal corresponding to the luminance blocks Y[3] and Y[4].
The motion vector detector 50 also generates, as follows, the absolute value sums of the estimation error that the estimation discriminator 54 uses to decide which of intra-picture estimation, forward estimation, backward estimation and bi-directional estimation to carry out.
As the absolute value sum of the estimation error for intra-picture estimation, the difference between the sum $\sum_{i,j}|A_{ij}|$ of the absolute values of the signal $A_{ij}$ of the reference picture macroblock (where $(i, j)$ are the coordinates of the pixels in the macroblock) and the absolute value $\left|\sum_{i,j} A_{ij}\right|$ of its sum, namely $\sum_{i,j}|A_{ij}| - \left|\sum_{i,j} A_{ij}\right|$, is obtained. As the absolute value sum of the estimation error for forward estimation, the sum $\sum_{i,j}|A_{ij} - B_{ij}|$ of the absolute values of the differences between the reference picture macroblock signal $A_{ij}$ and the estimation picture macroblock signal $B_{ij}$ is obtained. The absolute value sums of the estimation errors for backward estimation and bi-directional estimation are obtained in the same way as for forward estimation, using the estimation picture appropriate to each case instead of the forward one.
These absolute value sums (the absolute value sums of the residual difference ME) are supplied to the estimation discriminator 54. The estimation discriminator 54 selects the smallest of the absolute value sums for forward, backward and bi-directional estimation as the absolute value sum of the estimation error for inter estimation. It then compares this inter-estimation sum with the absolute value sum for intra-picture estimation, selects the smaller, and sets the corresponding mode as the estimation mode. That is, if the intra-picture sum is smaller, the intra-picture estimation mode is set; if the inter-estimation sum is smaller, whichever of the forward, backward and bi-directional estimation modes has the smallest sum is set.
In this way, the motion vector detector 50, set to frame estimation mode or field estimation mode according to the mode selected by the estimation mode switcher 52, supplies the reference picture macroblock signal to the computation part 53 via the estimation mode switcher 52. The motion vector detector 50 also detects the motion vector between the reference picture and the estimation picture corresponding to the estimation mode that the estimation discriminator 54 selected from the four estimation modes, and outputs this motion vector to the variable length encoder 58 and the motion compensator 64. As described above, the motion vector giving the smallest absolute value sum of the estimation error is selected as this motion vector.
When the motion vector detector 50 reads I picture data from the forward source picture part 51a, the estimation discriminator 54 selects the intra-frame (intra-picture) estimation mode, in which motion compensation is not carried out, and switches a switch 53d of the computation part 53 to connection point "a". The I picture data is thus inputted to a DCT mode switcher 55.
The DCT mode switcher 55 outputs the four luminance blocks to a DCT circuit 56 either with the odd-field (first field) and even-field (second field) lines interleaved (frame DCT mode) or with the odd and even fields separated (field DCT mode), as shown in FIG. 8A and FIG. 8B.
That is, the DCT mode switcher 55 compares the encoding efficiency of DCT processing with the odd-field and even-field data interleaved against that with the data separated, and selects the mode with the better encoding efficiency.
For example, with the input signal configured with odd-field and even-field lines interleaved, as shown in FIG. 8A, the differences between vertically adjacent lines are calculated and their absolute value sum (or squared sum) is obtained. With the input signal configured with odd-field and even-field lines separated, as shown in FIG. 8B, the differences between vertically adjacent lines within the odd field and within the even field are calculated, and the absolute value sums (or squared sums) for the two fields are obtained and added. The two results are compared and the DCT mode with the smaller value is set: frame DCT mode if the former is smaller, field DCT mode if the latter is smaller.
Data in the configuration of the selected DCT mode is outputted to the DCT circuit 56, and a DCT flag indicating the selected DCT mode is outputted to the variable length encoder 58 and a DCT block line replacer 65.
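The two error measures and the decision rule of the estimation discriminator 54 can be sketched as follows. Because the sum of absolute values is never smaller than the absolute value of the sum, the intra measure is never negative; for non-negative samples it would always be zero, so the sketch assumes signed samples $A_{ij}$ (for example, level-shifted pixel values), which the text does not specify. Macroblocks are flattened to lists and the function names are hypothetical.

```python
def intra_error(a):
    """Intra measure from the text: sum|Aij| - |sum Aij| over the macroblock."""
    return sum(abs(v) for v in a) - abs(sum(a))

def inter_error(a, b):
    """Inter measure: sum|Aij - Bij| against an estimation macroblock B."""
    return sum(abs(x - y) for x, y in zip(a, b))

def decide_mode(a, forward, backward, bidirectional):
    """Pick intra or the best inter mode, following the comparison order in the text."""
    errors = {
        "forward": inter_error(a, forward),
        "backward": inter_error(a, backward),
        "bidirectional": inter_error(a, bidirectional),
    }
    best_inter = min(errors, key=errors.get)   # smallest inter-estimation sum
    if intra_error(a) < errors[best_inter]:
        return "intra"
    return best_inter

a = [10, -10, 20, -20]                          # signed samples (assumption)
print(decide_mode(a, [9, -10, 20, -20], [0, 0, 0, 0], [10, -10, 19, -21]))  # forward
```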
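How the motion vector with the smallest absolute value sum might be found is sketched below as an exhaustive (full-search) block match at integer-pixel precision. The search strategy, search range and names are assumptions, since the text does not specify them; MPEG-2 motion vectors can also have half-pixel precision, which is not modelled here.

```python
def block_sad(cur, ref, top, left, dy, dx, size):
    """Absolute value sum of the error between a block of cur and a displaced block of ref."""
    return sum(abs(cur[top + y][left + x] - ref[top + dy + y][left + dx + x])
               for y in range(size) for x in range(size))

def full_search(cur, ref, top, left, size=16, search_range=4):
    """Return ((dy, dx), sad) minimizing the SAD over a +/- search_range window."""
    best = None
    height, width = len(ref), len(ref[0])
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > height or c + size > width:
                continue  # skip candidates that fall outside the reference picture
            s = block_sad(cur, ref, top, left, dy, dx, size)
            if best is None or s < best[1]:
                best = ((dy, dx), s)
    return best

# Synthetic test pictures: cur is ref displaced by (dy, dx) = (2, -1).
ref = [[(7 * y + 13 * x) % 251 for x in range(32)] for y in range(32)]
cur = [[ref[y + 2][x - 1] if y + 2 < 32 and x - 1 >= 0 else 0
        for x in range(32)] for y in range(32)]
print(full_search(cur, ref, 8, 8, size=8, search_range=4))  # ((2, -1), 0)
```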
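The comparison performed by the DCT mode switcher 55 can be sketched as follows, using absolute value sums rather than squared sums. The sketch assumes the macroblock arrives in frame order with even-indexed rows belonging to the odd field; the tie rule and the function names are hypothetical.

```python
def vertical_activity(rows):
    """Sum of absolute differences between vertically adjacent rows."""
    return sum(abs(a - b)
               for upper, lower in zip(rows, rows[1:])
               for a, b in zip(upper, lower))

def choose_dct_mode(mb):
    """Pick frame or field DCT for a 16-row luminance macroblock in frame order.

    Frame mode: adjacent lines as stored (odd and even lines interleaved).
    Field mode: adjacent lines within the odd field plus within the even field.
    Ties go to frame mode here (an assumption).
    """
    frame_cost = vertical_activity(mb)
    field_cost = vertical_activity(mb[0::2]) + vertical_activity(mb[1::2])
    return "frame" if frame_cost <= field_cost else "field"

# Strong interlace artefact: odd-field lines 0, even-field lines 100.
combed = [[0] * 16 if r % 2 == 0 else [100] * 16 for r in range(16)]
print(choose_dct_mode(combed))  # field
```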
As becomes clear from comparing the estimation mode (FIG. 7A, B) at the estimation mode switcher 52 and the DCT mode (FIG. 8A, B) at this DCT mode switcher 55, the data structure occurring in each of the modes is practically the same with regards to the luminance block.
When the estimation mode switcher 52 selects frame estimation mode (the mode in which odd and even lines are interleaved), the DCT mode switcher 55 is also likely to select frame DCT mode (the mode in which odd and even lines are interleaved). Likewise, when the estimation mode switcher 52 selects field estimation mode (the mode in which odd-field and even-field data are separated), the DCT mode switcher 55 is likely to select field DCT mode (the mode in which odd-field and even-field data are separated).
This is, however, not always the case, because the estimation mode switcher 52 decides its mode so that the absolute value sum of the estimation error is small, whereas the DCT mode switcher 55 decides its mode so that the encoding efficiency is high.
The I picture data outputted from the DCT mode switcher 55 is inputted to the DCT circuit 56, DCT (discrete cosine transform) processed and transformed into DCT coefficients. These DCT coefficients are inputted to a quantizer 57, quantized with a quantization step corresponding to the amount of data accumulated (buffer accumulation amount) in a transmission buffer 59, and inputted to the variable length encoder 58.
The variable length encoder 58 converts the picture data (in this case, I picture data) into a variable length code such as a Huffman code, in accordance with the quantization step (scale) supplied by the quantizer 57.
The variable length encoder 58 also receives and variable length encodes the quantization step (scale) from the quantizer 57, the estimation mode from the estimation discriminator 54 (indicating which of intra-picture, forward, backward or bi-directional estimation is set), the motion vectors from the motion vector detector 50, the estimation flag from the estimation mode switcher 52 (indicating whether frame estimation mode or field estimation mode is set), and the DCT flag from the DCT mode switcher 55 (indicating whether frame DCT mode or field DCT mode is set).
The transmission buffer 59 temporarily stores the inputted data and outputs data corresponding to the stored amount to the quantizer 57.
When the amount of remaining data rises to the allowable upper limit, the transmission buffer 59 increases the quantization scale of the quantizer 57 using a quantization control signal (buffer feedback), reducing the amount of quantized data. Conversely, when the amount of remaining data falls to the allowable lower limit, the transmission buffer 59 decreases the quantization scale of the quantizer 57 using the quantization control signal, increasing the amount of quantized data. This prevents the transmission buffer 59 from overflowing or underflowing.
Data stored in the transmission buffer 59 is read out at a prescribed timing and outputted to the transmission path or, for example, recorded on the recording medium 3 shown in FIG. 4.
Meanwhile, the I picture data outputted from the quantizer 57 is inputted to a de-quantizer 60 and de-quantized in accordance with the quantization step supplied by the quantizer 57. The output of the de-quantizer 60 is inputted to an IDCT (inverse DCT) circuit 61 and inverse DCT processed. The DCT block line replacer 65 then returns the lines of data to their original arrangement in accordance with the DCT flag from the DCT mode switcher 55, and the data is supplied via a calculator 62 to a forward estimation picture part 63a of a frame memory 63.
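For reference, the 8×8 DCT and a simple quantization step can be sketched in Python. The sketch uses the textbook orthonormal DCT-II and a single uniform quantization step; MPEG-2 additionally applies quantization weighting matrices, which are omitted here, and the function names are hypothetical.

```python
import math

def dct_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (list of 8 rows of 8 values)."""
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coeffs, step):
    """Uniform quantization of every coefficient by one step (no weighting matrix)."""
    return [[round(f / step) for f in row] for row in coeffs]

flat = [[8] * 8 for _ in range(8)]
levels = quantize(dct_8x8(flat), 16)
print(levels[0][0])  # 4
```

A flat block of value 8 gives a DC coefficient of 64 and zero AC coefficients, so with a step of 16 only the DC level 4 survives.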
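The following sketch builds a generic Huffman code from symbol frequencies. It only illustrates the principle: MPEG-2 itself uses fixed variable length code tables defined by the standard rather than a code built for each picture, and the function name is hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code (symbol -> bit string) from a sequence of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: code so far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

levels = [0] * 10 + [1] * 4 + [-1] * 3 + [2]   # made-up quantized levels
code = huffman_code(levels)
print(sum(len(code[s]) for s in levels))  # 30
```

The most frequent level (0) receives a one-bit code, so the 18 levels need 30 bits instead of the 36 a fixed two-bit code would use.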
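The buffer feedback rule can be expressed as a small update function. The increment of 1 and the quantization scale range of 1 to 31 are assumptions made for illustration; the text only states the direction of the adjustment, and the function name is hypothetical.

```python
def update_quant_scale(scale, occupancy, upper, lower, min_scale=1, max_scale=31):
    """Buffer-feedback sketch for the quantizer scale.

    At or above the upper limit: coarser quantization (larger scale, fewer bits).
    At or below the lower limit: finer quantization (smaller scale, more bits).
    """
    if occupancy >= upper:
        return min(scale + 1, max_scale)
    if occupancy <= lower:
        return max(scale - 1, min_scale)
    return scale

print(update_quant_scale(10, 900, upper=800, lower=200))  # 11
```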
When picture data for each of the frames inputted sequentially is, for example, processed as I, B, P, B, P, B . . . pictures, the motion vector detector 50 processes picture data for the frame inputted first as an I picture. Then, before processing the picture for the frame inputted next as a B picture, the picture data for a further subsequently inputted frame is processed as a P picture.
Because B pictures may be encoded with backward estimation, a B picture cannot be decoded unless the following P picture has been prepared beforehand as its backward estimation picture. The motion vector detector 50 therefore starts processing the P picture data stored in the backward source picture part 51c after processing the I picture. As before, the absolute value sums of the inter-frame differences (estimation errors) are supplied from the motion vector detector 50 to the estimation mode switcher 52 and the estimation discriminator 54 in macroblock units. The estimation mode switcher 52 and the estimation discriminator 54 then set the frame/field estimation mode and the intra-picture, forward, backward or bi-directional estimation mode according to the absolute value sums of the estimation errors for these P picture macroblocks.
When the intra-frame estimation mode is set, the computation part 53 switches the switch 53d to connection point a, as described above. The data is then outputted via the DCT mode switcher 55, the DCT circuit 56, the quantizer 57, the variable length encoder 58 and the transmission buffer 59 in the same way as the I picture data. It is also supplied to, and stored in, a backward estimation picture part 63b of the frame memory 63 via the de-quantizer 60, the IDCT 61, the DCT block line replacer 65 and the calculator 62.
In forward estimation mode, the switch 53d is switched to connection point b, and the picture stored in the forward estimation picture part 63a of the frame memory 63 (in this case, the I picture) is read and motion compensated by the motion compensator 64 in accordance with the motion vector outputted by the motion vector detector 50. That is, when the estimation discriminator 54 instructs forward estimation mode, the motion compensator 64 shifts the read address of the forward estimation picture part 63a from the position of the macroblock currently being outputted by the motion vector detector 50 by an amount corresponding to the motion vector, reads out the data, and generates estimation picture data.
The estimation picture data outputted by the motion compensator 64 is supplied to a calculator 53a. The calculator 53a subtracts the estimation picture data for each macroblock supplied by the motion compensator 64 from the corresponding reference picture macroblock data supplied by the estimation mode switcher 52 and outputs the difference (estimation error). This difference data is outputted via the DCT mode switcher 55, the DCT circuit 56, the quantizer 57, the variable length encoder 58 and the transmission buffer 59. It is also locally decoded by the de-quantizer 60, the IDCT 61 and the DCT block line replacer 65 and inputted to the calculator 62.
The same estimation picture data supplied to the calculator 53a by the motion compensator 64 is also supplied to the calculator 62. The calculator 62 adds this estimation picture data to the difference data outputted by the DCT block line replacer 65, obtaining the (decoded) P picture data. This P picture data is then supplied to and stored in the backward estimation picture part 63b of the frame memory 63.
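The local decoding loop formed by the calculators 53a and 62 can be sketched on flattened macroblocks. To keep the example short, the DCT and IDCT are omitted (treated as identity transforms, an assumption) so that only quantization loss remains; the function names and pixel values are hypothetical.

```python
def encode_macroblock(reference, prediction, step):
    """Calculator 53a, then a stand-in for DCT + quantizer 57 (identity transform assumed)."""
    residual = [r - p for r, p in zip(reference, prediction)]
    return [round(d / step) for d in residual]

def local_decode(levels, prediction, step):
    """Stand-in for de-quantizer 60 / IDCT 61, then calculator 62 adds the prediction."""
    return [p + q * step for p, q in zip(prediction, levels)]

reference = [120, 130, 140, 150]   # P picture macroblock (made-up values)
prediction = [118, 133, 140, 141]  # motion-compensated estimation picture (made-up)
levels = encode_macroblock(reference, prediction, step=4)
decoded = local_decode(levels, prediction, step=4)
print(decoded)  # [118, 129, 140, 149]
```

Because the decoder receives the same quantized levels and forms the same estimation picture, its reconstruction matches the P picture data stored by the encoder, which is why the encoder stores locally decoded data rather than the original picture.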
The motion vector detector 50 executes the processing for the next B picture after storing data for the I picture and the P picture in the forward estimation picture part 63a and the backward estimation picture part 63b. The estimation mode switcher 52 and the estimation discriminator 54 set the frame/field mode to correspond to the magnitude of the absolute value sum of the difference between the frames in macroblock units and set the estimation mode to be one of either the estimation within a frame mode, forward estimation mode, backward estimation mode or bi-directional estimation mode.
As described above, during the estimation within a frame mode or the forward estimation mode, the switch 53d is switched over to connection point a or the connection point b accordingly. At this time, processing is carried out in the same way as the case for P pictures and the data is outputted.
When the backward estimation mode or the bi-directional estimation mode is set, on the other hand, the switch 53d is switched to connection point c or d, respectively.
When the switch 53d is switched to connection point c in backward estimation mode, the picture stored in the backward estimation picture part 63b (in this case, a P picture) is read out and motion compensated by the motion compensator 64 in accordance with the motion vector outputted by the motion vector detector 50. That is, when the estimation discriminator 54 instructs backward estimation mode, the motion compensator 64 shifts the read address of the backward estimation picture part 63b from the position of the macroblock currently being outputted by the motion vector detector 50 by an amount corresponding to the motion vector, reads the data, and generates estimation picture data.
Estimation picture data outputted by the motion compensator 64 is provided to a calculator 53b. The calculator 53b then subtracts estimation picture data supplied by the motion compensator 64 from the reference picture macroblock data provided from the estimation mode switcher 52 and outputs the difference. This difference data is then outputted via the DCT mode switcher 55, the DCT circuit 56, the quantizer 57, the variable length encoder 58 and the transmission buffer 59.
When the switch 53d is switched to connection point d in bi-directional estimation mode, both the picture stored in the forward estimation picture part 63a (in this case, the I picture) and the picture stored in the backward estimation picture part 63b (in this case, a P picture) are read out, and motion compensation is carried out by the motion compensator 64 in accordance with the motion vectors outputted by the motion vector detector 50. That is, when the estimation discriminator 54 instructs bi-directional estimation mode, the motion compensator 64 shifts the read addresses of the forward estimation picture part 63a and the backward estimation picture part 63b from the positions of the macroblock currently being outputted by the motion vector detector 50 by amounts corresponding to the respective motion vectors (in this case there are two motion vectors, one for the forward estimation picture and one for the backward estimation picture), reads the data, and generates estimation picture data.
Estimation data outputted from the motion compensator 64 is supplied to a calculator 53c. The calculator 53c then subtracts the average value of the estimation picture data supplied by the motion compensator 64 from the macroblock data for the reference image provided by the motion vector detector 50 and outputs the difference. This difference data is then outputted via the DCT mode switcher 55, the DCT circuit 56, the quantizer 57, the variable length encoder 58 and the transmission buffer 59.
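The read-address shift of the motion compensator 64 and the averaging of the calculator 53c can be sketched as follows. The sketch assumes integer-pixel motion vectors given as (vertical, horizontal) offsets, performs no bounds checking, and uses a plain arithmetic mean rather than the exact rounding rule of the standard; the function names are hypothetical.

```python
def fetch_block(picture, top, left, mv, size):
    """Read a size x size block whose address is shifted by motion vector mv = (dy, dx)."""
    dy, dx = mv
    return [row[left + dx:left + dx + size]
            for row in picture[top + dy:top + dy + size]]

def bidirectional_prediction(fwd_pic, bwd_pic, top, left, mv_fwd, mv_bwd, size):
    """Average of the forward and backward estimation blocks (plain mean)."""
    f = fetch_block(fwd_pic, top, left, mv_fwd, size)
    b = fetch_block(bwd_pic, top, left, mv_bwd, size)
    return [[(x + y) / 2 for x, y in zip(rf, rb)] for rf, rb in zip(f, b)]

fwd = [[y * 10 + x for x in range(6)] for y in range(6)]         # stand-in I picture
bwd = [[100 + y * 10 + x for x in range(6)] for y in range(6)]   # stand-in P picture
print(bidirectional_prediction(fwd, bwd, 2, 2, (-1, 0), (1, 0), 2))
# [[72.0, 73.0], [82.0, 83.0]]
```

The calculator 53c would then subtract this averaged block from the reference macroblock to form the bi-directional difference data.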
B picture data are not stored in the frame memory 63, because a B picture is never used as the reference for generating the estimation pictures of other pictures.
In the frame memory 63, the forward estimation picture part 63a and the backward estimation picture part 63b can be bank switched as necessary, so that the picture stored in either part can be outputted as the forward estimation picture or as the backward estimation picture for a given reference picture.
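Bank switching means the two memory parts exchange roles rather than copying picture data between them. The following is a minimal sketch of one plausible switching policy, assuming that each newly decoded I or P picture becomes the backward estimation picture and the previous backward picture becomes the forward one; the class `ReferenceBanks` and its methods are hypothetical.

```python
class ReferenceBanks:
    """Two frame-memory banks whose forward/backward roles can be swapped."""

    def __init__(self):
        self.banks = [None, None]
        self.fwd = 0  # index of the bank currently acting as forward estimation picture

    @property
    def forward(self):
        return self.banks[self.fwd]

    @property
    def backward(self):
        return self.banks[1 - self.fwd]

    def store_anchor(self, picture):
        """Store a new I or P picture as the backward estimation picture.

        The old backward picture becomes the forward picture simply by
        toggling the role index; no picture data is moved.
        """
        self.fwd = 1 - self.fwd              # previous backward bank now serves as forward
        self.banks[1 - self.fwd] = picture   # overwrite the bank that was forward
```

Toggling one index is the software analogue of switching a bank-select line: the cost of a role change is constant regardless of picture size.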
The above description centers on the luminance blocks, but the color difference blocks can be processed in the same way, taking the macroblocks shown in FIG. 7A and FIG. 7B, and FIG. 8A and FIG. 8B as units. The motion vectors used when processing the color difference blocks are the corresponding luminance block motion vectors halved in both the vertical and horizontal directions.
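As a small illustration, assuming color difference blocks subsampled by two both horizontally and vertically relative to luminance (which is what halving in both directions implies), the chroma vector can be derived as below. Exact fractions are kept so that a half-sample result is visible; how it is rounded to the available interpolation precision is left open here. The function name is hypothetical.

```python
from fractions import Fraction


def chroma_motion_vector(luma_mv):
    """Halve a luminance motion vector in both directions for color difference blocks."""
    dx, dy = luma_mv
    return (Fraction(dx, 2), Fraction(dy, 2))
```

For example, a luminance vector of (6, -3) gives a color difference vector of (3, -3/2), i.e. a half-sample vertical displacement.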
Next, the operation of the decoder 31 of FIG. 4 is described. FIG. 9 is a block diagram showing an example configuration of the decoder 31 of FIG. 4. Encoded picture data transmitted via the transmission path 3 of FIG. 4 or recorded on the recording medium 3 is received by a receiving circuit (not shown) or played back by a playback device, temporarily stored in a signal receiving buffer 81, and then supplied to a variable length decoder 82 of a decoder 90. The variable length decoder 82 variable-length decodes the data supplied from the signal receiving buffer 81, outputs the motion vector, estimation mode and estimation flag to a motion compensator 87, and outputs the quantization step and the decoded picture data to a de-quantizer 83. The variable length decoder 82 also supplies a DCT flag to a DCT block line replacer 88.
The de-quantizer 83 de-quantizes the picture data supplied from the variable length decoder 82 in accordance with the quantization step supplied by the same variable length decoder 82 and outputs the result to an IDCT circuit 84. The data (DCT coefficients) outputted from the de-quantizer 83 are inverse DCT processed at the IDCT circuit 84, undergo line replacement at the DCT block line replacer 88 based on the DCT flag, in the same way as at the DCT block line replacer 65 of FIG. 5, and are then supplied to a calculator 85.
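The de-quantization and inverse DCT steps can be sketched for one 8x8 block as follows. This assumes a simplified uniform de-quantizer (coefficient = level × quantization step); MPEG-2 additionally applies weighting matrices and special intra-DC rules, which are omitted. The IDCT is the orthonormal 2-D inverse DCT computed separably, rows first and then columns; the function names are hypothetical.

```python
import math

N = 8  # block size


def dequantize(levels, qstep):
    """Simplified uniform de-quantization: coefficient = level * qstep (assumed)."""
    return [[lv * qstep for lv in row] for row in levels]


def idct_1d(coeffs):
    """Orthonormal 8-point inverse DCT (DCT-III)."""
    out = []
    for n in range(N):
        s = 0.0
        for k in range(N):
            ck = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
            s += ck * coeffs[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(s)
    return out


def idct_2d(block):
    """Separable 2-D IDCT: transform each row, then each column."""
    rows = [idct_1d(r) for r in block]
    cols = [idct_1d([rows[j][i] for j in range(N)]) for i in range(N)]
    # cols[i][j] is the sample at row j, column i; transpose back to row-major.
    return [[cols[i][j] for i in range(N)] for j in range(N)]
```

A block whose only nonzero level is the DC term decodes to a flat block: a level of 10 with step 8 gives a coefficient of 80, and every sample becomes 80 / 8 = 10.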
Picture data supplied from the DCT block line replacer 88 is, in the case of I picture data, outputted from the calculator 85 and then supplied to and stored in a forward estimation picture part 86a of the frame memory 86 for generating estimation picture data for the picture data (P or B picture data) to be inputted to the calculator 85 afterwards. This data is then outputted to the frame memory 33 of FIG. 4.
When the data is P picture data in the forward estimation mode, in which the picture data of one frame previous is taken as the estimation picture data, the picture data of one frame previous (I picture data) stored in the forward estimation picture part 86a of the frame memory 86 is read out, and motion compensation corresponding to the motion vector outputted from the variable length decoder 82 is carried out at the motion compensator 87. Then, at the calculator 85, this data is added to the picture data (difference data) supplied by the DCT block line replacer 88 and outputted. This sum, i.e. the decoded P picture data, is then supplied to and stored in the backward estimation picture part 86b of the frame memory 86 in order to generate estimation picture data for the picture data (B picture or P picture data) inputted to the calculator 85 afterwards.
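The decoding step for a forward-estimated block is the mirror image of the encoder's subtraction. A minimal sketch, assuming integer-pel motion vectors that keep the block inside the reference frame and clipping of the sum to the 8-bit range (the clipping is an assumption, not stated in the text); the function name is hypothetical.

```python
def reconstruct_p_block(ref, residual, x, y, mv):
    """Add the decoded difference data to the motion-compensated reference block.

    ref is the stored I (or P) picture as a list of rows; residual is the
    square difference block for the block at (x, y).
    """
    dx, dy = mv
    size = len(residual)
    out = []
    for j in range(size):
        row = []
        for i in range(size):
            pred = ref[y + dy + j][x + dx + i]  # read address shifted by the motion vector
            row.append(min(255, max(0, pred + residual[j][i])))  # clip to 0..255
        out.append(row)
    return out
```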
Even for P picture data, data in the mode of estimation within a picture is not processed at the calculator 85, in the same way as I picture data, and is stored in the backward estimation picture part 86b without modification.
This P picture is not outputted to the frame memory 33 of FIG. 4 at this time because it is to be displayed after the next B picture (as described above, a P picture inputted after B pictures is processed and transmitted before those B pictures).
When the picture data supplied from the DCT block line replacer 88 is B picture data, the I picture data stored in the forward estimation picture part 86a of the frame memory 86 (in forward estimation mode), the P picture data stored in the backward estimation picture part 86b (in backward estimation mode), or both (in bi-directional estimation mode) are read out in accordance with the estimation mode supplied from the variable length decoder 82. Motion compensation corresponding to the motion vector outputted by the variable length decoder 82 is then performed at the motion compensator 87 and an estimation picture is generated. No estimation picture is generated, however, when motion compensation is not necessary (in the mode of estimation within a picture).
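The selection of the estimation picture for a B picture block can be summarized as a dispatch on the decoded estimation mode. In this sketch the mode is a hypothetical string label, the input blocks are assumed to be already motion compensated, and the rounded average used for the bi-directional case is an assumption; `None` stands for "no estimation picture".

```python
def estimation_for_b(mode, fwd_block, bwd_block):
    """Return the estimation picture block for a B picture, or None for
    estimation within a picture (no motion compensation needed)."""
    if mode == "intra":
        return None
    if mode == "forward":
        return fwd_block
    if mode == "backward":
        return bwd_block
    if mode == "bidirectional":
        return [[(f + b + 1) // 2 for f, b in zip(fr, br)]
                for fr, br in zip(fwd_block, bwd_block)]
    raise ValueError(f"unknown estimation mode: {mode}")
```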
Data having undergone motion compensation at the motion compensator 87 is added to the output of the DCT block line replacer 88 at the calculator 85. This sum is then outputted to the frame memory 33 shown in FIG. 4.
This sum is B picture data, however, and is not stored in the frame memory 86 because it is not used to generate estimation pictures for other pictures.
After the B picture has been outputted, the P picture data stored in the backward estimation picture part 86b is read out, supplied to the calculator 85 via the motion compensator 87, and outputted without modification to the frame memory 33 of FIG. 4.
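The net effect of holding each I or P picture until the following B pictures have been outputted is a reordering from transmission order to display order. A minimal sketch of the ordering only (not the buffer timing), assuming pictures are represented as hypothetical `(type, index)` tuples:

```python
def coding_to_display_order(coded):
    """Reorder decoded pictures from transmission order to display order.

    I and P pictures are held back until the next I or P picture arrives;
    B pictures are output as soon as they are decoded.
    """
    display = []
    held = None  # most recent I or P picture not yet output
    for pic in coded:
        if pic[0] == "B":
            display.append(pic)
        else:
            if held is not None:
                display.append(held)
            held = pic
    if held is not None:
        display.append(held)  # flush the last anchor at the end of the sequence
    return display
```

For the transmission order I0 P3 B1 B2 P6 B4 B5 this yields I0 B1 B2 P3 B4 B5 P6.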
At the decoder 31, although a circuit corresponding to the estimation mode switcher 52 of the encoder 17 of FIG. 5 is not shown in the drawings, the corresponding processing, i.e. the processing needed to return the separated odd-field and even-field line signals to their original interleaved configuration, is executed by the motion compensator 87 (at the encoder 17 it is likewise carried out by the motion compensator 64).
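Returning field-separated lines to frame order is a simple interleave. The sketch assumes the upper half of the block holds the odd-field lines (frame lines 0, 2, 4, ... counting from zero) and the lower half the even-field lines (1, 3, 5, ...); the actual layout follows FIG. 7 and FIG. 8, and the function name is hypothetical.

```python
def fields_to_frame(block):
    """Restore frame line order from a field-separated block (list of rows)."""
    half = len(block) // 2
    top, bottom = block[:half], block[half:]
    frame = []
    for odd_line, even_line in zip(top, bottom):
        frame.extend([odd_line, even_line])  # interleave one line from each field
    return frame
```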
The above describes the processing of the luminance signal, but the color difference signal can be processed in the same way. In this case, however, the motion vector used is halved in both the vertical and horizontal directions relative to that used for the luminance signal, as in the encoder 17.
In related image signal encoding methods such as MPEG 2, information is compressed by applying a DCT and then allotting many bits to components with large signal power (low frequency components) and fewer bits to components with little signal power (high frequency components).
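The concentration of power in the low frequency coefficients can be checked numerically. This sketch applies an orthonormal 8-point DCT-II to a smooth ramp (an arbitrary test signal chosen for illustration) and measures the share of total power in the two lowest coefficients; for this ramp it is over 99%. Because the transform is orthonormal, the total power equals that of the input samples.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence of samples."""
    size = len(x)
    coeffs = []
    for k in range(size):
        ck = math.sqrt(1 / size) if k == 0 else math.sqrt(2 / size)
        coeffs.append(ck * sum(v * math.cos(math.pi * (2 * n + 1) * k / (2 * size))
                               for n, v in enumerate(x)))
    return coeffs


ramp = [0, 1, 2, 3, 4, 5, 6, 7]          # smooth, slowly varying samples
power = [c * c for c in dct_1d(ramp)]    # signal power per coefficient
low_band_fraction = sum(power[:2]) / sum(power)
print(f"share of power in the two lowest coefficients: {low_band_fraction:.3f}")
```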
However, the following problems occur in the related art because DCT transform processing is carried out on picture data divided into macroblocks of a fixed size.
(1) Block distortions.
(2) "Mosquito" noise.
(1) Block distortions occur when the encoding bit rate (the number of bits allotted to quantization) is insufficient, and appear as discontinuities at the boundaries between neighboring macroblocks. They are caused by dividing the picture signal into blocks without considering the continuity of the signal between blocks.
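An extreme case makes the mechanism visible. If coarse quantization leaves only the DC term of each block, the reconstruction of each block is its mean (the orthonormal DCT's DC coefficient is proportional to the block mean). In the sketch below, a ramp that rises by 1 per sample across two 8-sample blocks is reconstructed as two flat levels, and the whole change is concentrated into a single step of 8 at the block boundary. The test signal and function name are illustrative choices.

```python
def dc_only_blocks(signal, block=8):
    """Keep only each block's DC term (its mean), as very coarse quantization would."""
    out = []
    for start in range(0, len(signal), block):
        seg = signal[start:start + block]
        mean = sum(seg) / len(seg)
        out.extend([mean] * len(seg))
    return out


ramp = list(range(16))   # continuous across the boundary between samples 7 and 8
rec = dc_only_blocks(ramp)
steps = [rec[i + 1] - rec[i] for i in range(len(rec) - 1)]
```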
(2) Mosquito noise is deterioration occurring in the vicinity of edges within the blocks. It is caused by reflection distortions arising from the loss of frequency components when blocks containing edges are DCT transformed and encoded. When this kind of reflection distortion occurs in a macroblock, the whole of the macroblock deteriorates. Further, the decoded picture looks unnatural because the deterioration has no correlation in the time axis direction.
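The reflection distortion around an edge can be reproduced with a one-dimensional block. In this sketch an 8-sample step edge is DCT transformed, the four highest-frequency coefficients are discarded (standing in for their loss through quantization), and the block is inverse transformed. The reconstruction overshoots above 100 and undershoots below 0 on both sides of the edge, and the error is spread over the whole block rather than confined to the edge. The step values and the number of retained coefficients are illustrative choices.

```python
import math

N = 8


def _c(k):
    """Orthonormal DCT scale factor for coefficient k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct(x):
    return [_c(k) * sum(v * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                        for n, v in enumerate(x)) for k in range(N)]


def idct(X):
    return [sum(_c(k) * X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for k in range(N)) for n in range(N)]


edge = [0, 0, 0, 0, 100, 100, 100, 100]   # a block containing an edge
coeffs = dct(edge)
truncated = coeffs[:4] + [0.0] * 4         # high frequency components lost
rec = idct(truncated)
print([round(v, 1) for v in rec])
```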
The influence of this deterioration can be alleviated by using transforms whose basis functions have a short tap number, which narrows the range over which the aforementioned reflection distortions spread. However, with short-tap bases the signal power is concentrated less efficiently, so the encoding efficiency of such transforms deteriorates. There are also methods in which the basis tap number is changed for each frequency component, but changing the tap number is difficult when DCT is used.
Wavelet transforms also exist as transforms for concentrating the signal power. In wavelet transforms, a filter bank decomposes the signal into its frequency components (the DCT can also be described as a kind of filter bank). FIG. 10 shows an example configuration of a system for encoding/decoding pictures using wavelet transforms. This system comprises two types of filter bank: a band-dividing filter bank and a band synthesis filter bank. The band-dividing filter bank comprises a low-pass filter and a high-pass filter, which are digital filters, and a down-sampling circuit. The band synthesis filter bank comprises an up-sampling circuit, a low-pass filter and a high-pass filter, which are digital filters, and a synthesis circuit.
In the band-dividing filter bank, the picture is filtered by the low-pass filter and the high-pass filter, and these outputs are thinned out by the down-sampling circuit, dividing the picture into M frequency bands. In band synthesis filtering, the picture for each frequency band is interpolated by the up-sampling circuit and filtered by the low-pass filter and the high-pass filter. The filtering results are then synthesized at the synthesis circuit and the original signal is reconstructed. This method of encoding/decoding an image signal using band-dividing filter banks and band synthesis filter banks is referred to as sub-band encoding/decoding.
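The filter coefficients are not specified here. As a minimal sketch, assuming the two-tap Haar pair as the low-pass and high-pass filters (the simplest choice, not necessarily the one intended) and M = 2 bands, band division and band synthesis for a 1-D signal of even length can be written as follows. Filtering and down-sampling are combined by evaluating the filters only at even positions, and up-sampling, filtering and synthesis are likewise combined in `synthesis`; the function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)  # Haar filter gain (orthonormal normalization)


def analysis(x):
    """Band division: low-pass and high-pass filter, then keep every second output."""
    lo = [S * (x[2 * i] + x[2 * i + 1]) for i in range(len(x) // 2)]
    hi = [S * (x[2 * i] - x[2 * i + 1]) for i in range(len(x) // 2)]
    return lo, hi


def synthesis(lo, hi):
    """Band synthesis: up-sample, filter and sum the two bands."""
    out = []
    for l_val, h_val in zip(lo, hi):
        out.extend([S * (l_val + h_val), S * (l_val - h_val)])
    return out
```

With this filter pair the synthesis exactly inverts the division (perfect reconstruction), so any loss comes only from how the band data are quantized.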
Usually, a number of band-dividing filter banks (and likewise band synthesis filter banks) are combined in a tree-shaped structure, and wavelet transforms can be realized using such tree-shaped filter banks. A method in which, after band division at an upper-level filter bank, the low-band component is successively band-divided by lower-level filter banks is referred to as octave dividing.
For example, as shown in FIG. 11, the picture is band-divided into four bands at the uppermost filter bank; the output of this filter bank is referred to as layer 0, and in the case of FIG. 11, layer 0 consists of the four bands LL, LH, HL and HH. In octave dividing, the low-band component of layer 0 is further divided into four by a filter bank, and the low-band component obtained each time is divided again, as many times as required. The output obtained by the nth division is referred to as layer (n-1).
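Octave dividing of a picture can be sketched by applying a two-band split to the rows and then to the columns, keeping all four bands of each level and dividing only LL again. Assumptions: Haar filters, picture dimensions divisible by 2 at every level, and the band naming convention "first letter = horizontal band, second letter = vertical band" (conventions differ between sources). All function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)


def haar_pairs(v):
    """Two-band Haar split of a 1-D sequence with down-sampling by 2."""
    lo = [S * (v[2 * i] + v[2 * i + 1]) for i in range(len(v) // 2)]
    hi = [S * (v[2 * i] - v[2 * i + 1]) for i in range(len(v) // 2)]
    return lo, hi


def _filter_columns(rows):
    """Apply haar_pairs down every column; return (low, high) images row-major."""
    lo_cols, hi_cols = zip(*(haar_pairs(list(col)) for col in zip(*rows)))
    return [list(r) for r in zip(*lo_cols)], [list(r) for r in zip(*hi_cols)]


def split_2d(img):
    """One filter-bank level: filter the rows, then the columns."""
    row_lo, row_hi = zip(*(haar_pairs(r) for r in img))
    ll, lh = _filter_columns(row_lo)
    hl, hh = _filter_columns(row_hi)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


def octave_divide(img, levels):
    """Repeatedly divide the low band; the nth division yields layer n - 1."""
    layers = []
    low = img
    for _ in range(levels):
        bands = split_2d(low)
        layers.append(bands)   # layers[n - 1] holds the output of the nth division
        low = bands["LL"]      # only the low band is divided again
    return layers
```

For an 8x8 picture and two divisions, layer 0 holds four 4x4 bands and layer 1 holds four 2x2 bands obtained from the layer-0 LL band.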
Problems that could not be resolved using DCT transforms may be resolvable with new transform methods such as wavelet transforms. However, it is well known that wavelet transforms give rise to a deterioration different from that of DCT transforms, known as ringing (although ringing occurs for substantially the same reason as mosquito noise, namely the loss of high-band components). How wavelet transforms should be applied in the time axis direction also remains an unresolved problem.
The problem of how to encode the band-divided data obtained using wavelet transforms in an efficient manner is also yet to be resolved.
The present invention sets out to resolve these kinds of problems. Its object is to ease detriments such as block distortion and mosquito noise, which could not be prevented in related moving picture encoding methods, thereby improving the encoding efficiency and allowing pictures to be encoded at a lower bit rate.