Field of the Invention
This invention relates to a motion picture encoding method and a motion picture encoding apparatus preferably used to record a motion picture signal in a recording medium such as an optical disk or magnetic tape and to reproduce and display it, or used for a video conference system, video telephone system, or broadcasting unit which transmits a motion picture signal from a transmitting side to a receiving side through a transmission line and receives and displays it at the receiving side.
For example, in the case of a system for transmitting a motion picture signal to a remote place, such as a video conference system or video telephone system, the picture signal is compressed and encoded by using the line correlation or interframe correlation of the video signal in order to use the transmission line efficiently. FIG. 4 shows the structures of motion picture encoding and decoding apparatuses for encoding, transmitting, and decoding a motion picture signal. An encoding apparatus 1 encodes an input video signal VD and transmits the encoded signal to a recording medium 3 serving as a transmission line. A decoding apparatus 2 reproduces the signal recorded in the recording medium 3 and decodes and outputs it.
In the encoding apparatus 1, the input video signal VD is supplied to a preprocessing circuit 11, where the signal VD is divided into a luminance signal and a color signal (in this case, a chrominance signal), and these signals are converted to digital form by analog-to-digital (A/D) converters 12 and 13 respectively. The digitized video signal is inputted to and filtered by a prefilter 19 and thereafter supplied to and stored in a frame memory 14. The frame memory 14 stores the luminance signal in a luminance-signal frame memory 15 and the chrominance signal in a chrominance-signal frame memory 16 respectively.
The prefilter 19 performs processing for improving the encoding efficiency and the picture quality. It is, for example, a filter for removing noise or limiting the bandwidth. A two-dimensional low-pass filter can be used as the prefilter 19. FIG. 11B shows the 3×3 pixel block serving as the input of the prefilter 19. In this case, the 3×3 pixel block around an object pixel "e" is used to obtain the filter output value corresponding to the pixel "e". FIG. 11A shows the filter coefficients applied to each of the pixels "a" to "i". In the prefilter 19, the following equation is actually computed:

(1/16)a + (1/8)b + (1/16)c + (1/8)d + (1/4)e + (1/8)f + (1/16)g + (1/8)h + (1/16)i   (1)
and the result of this operation is taken as the filter output value corresponding to the pixel "e". FIG. 10 shows the structure of a two-dimensional low-pass filter which performs this operation. In FIG. 10, DL1 and DL2 are one-line delays, D1 to D6 are one-pixel delays, AD1 to AD4 are adders, and BF1 to BF4 are multipliers which multiply an input value by 1/2.
In practice, the filtered value is outputted from an output OUT2, and the original unfiltered pixel value is outputted from an output OUT1 after a predetermined delay. The filter always performs uniform filtering regardless of the input video signal or the state of the encoder.
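By way of illustration only, the following Python sketch (not part of the disclosed apparatus; the function name and array handling are hypothetical) computes equation (1) for every interior pixel of a frame. Note that the coefficients are all powers of two, which is why the hardware of FIG. 10 can realize them with cascaded 1/2 multipliers and adders.

    import numpy as np

    # 3x3 low-pass kernel of FIG. 11A / equation (1)
    KERNEL = np.array([[1, 2, 1],
                       [2, 4, 2],
                       [1, 2, 1]], dtype=np.float64) / 16.0

    def prefilter(frame: np.ndarray) -> np.ndarray:
        """Apply the 3x3 low-pass filter; border pixels pass through unchanged."""
        out = frame.astype(np.float64).copy()
        h, w = frame.shape
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                # weighted sum of pixels "a" to "i" around the object pixel "e"
                out[y, x] = np.sum(KERNEL * frame[y - 1:y + 2, x - 1:x + 2])
        return out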
Returning to FIG. 4, a format conversion circuit 17 converts the video signal stored in the frame memory 14 to the input format of an encoder 18. The data converted to the predetermined format is supplied from the format conversion circuit 17 to the encoder 18, where it is encoded. Though any encoding algorithm may be used, an example of the algorithm is described later in detail by referring to FIG. 6. The signal encoded by the encoder 18 is outputted to a transmission line as a bit stream and stored in, for example, the recording medium 3.
The data reproduced from the recording medium 3 is supplied to and decoded by a decoder 31 of the decoding apparatus 2. Though any decoding algorithm may be used for the decoder 31, it must be paired with the encoding algorithm of the encoder 18. An example of the decoding algorithm is described later in detail by referring to FIG. 9. The data decoded by the decoder 31 is inputted to a format conversion circuit 32 and converted to an output format.
Then, a luminance signal with a frame format is supplied to and stored in a luminance-signal frame memory 34 of a frame memory 33 and a chrominance signal is supplied to and stored in a chrominance-signal frame memory 35. The luminance signal and the chrominance signal read out of the luminance-signal frame memory 34 and the chrominance-signal frame memory 35 are supplied to and filtered by a postfilter 39, thereafter digital-to-analog-converted by digital-to-analog (D/A) converters 36 and 37 respectively, and supplied to and synthesized by a postprocessing circuit 38. Then, they are outputted to and displayed on a not-illustrated display such as a CRT as output video signals.
The postfilter 39 performs filtering for improving the picture quality. The postfilter 39 is used to moderate deterioration caused by encoding a picture; it is a filter for removing, for example, block distortion, noise generated in the vicinity of a steep edge, or quantization noise. Though various types of postfilter exist, it is possible to use the same two-dimensional low-pass filter as the prefilter 19, shown in FIG. 10.
Then, highly efficient encoding of a motion picture is described below. Because motion picture data such as a video signal contains a very large amount of information, a recording medium with a very high data transmission rate has so far been required in order to record and reproduce it for a long time, which in turn requires a large magnetic tape or optical disk. Moreover, when transmitting motion picture data through a transmission line or using it for broadcasting, there has been the problem that the data cannot be transmitted directly through an existing transmission line because the amount of data is too large.
Therefore, to record video signals in a smaller recording medium for a long time, or to use them for broadcasting, it is indispensable to encode the video signals with high efficiency before recording and to decode the read signals with high efficiency. To meet these requirements, highly efficient encoding methods using the correlation between video signals have been proposed. The MPEG (Moving Picture Experts Group) method is one of them. This method was discussed in ISO-IEC/JTC1/SC2/WG11 and proposed as a draft standard; it is a hybrid method combining motion-compensated prediction encoding with discrete cosine transform (DCT) encoding.
Motion-compensated prediction encoding is a method using the correlation between video signals in the time-base direction, in which the amount of information necessary for encoding is compressed by predicting the presently inputted picture from an already-decoded and reproduced known signal and transmitting only the resulting prediction error. DCT encoding compresses the amount of information by using the in-frame two-dimensional correlation of a video signal, thereby concentrating the signal power on certain frequency components and encoding only the concentratedly distributed coefficients. For example, at a portion with a flat pattern and high autocorrelation of the video signal, the DCT coefficients are concentrated in the low-frequency components. Therefore, in this case, the amount of information can be compressed by encoding only the coefficients concentrated in the low-frequency band. Though a case of using the MPEG2 method for the encoder is described below in detail, the encoding method is not restricted to MPEG2; any encoding method can be applied.
The MPEG2 method is described below in detail. By using the line correlation, it is possible to compress a video signal by means of, for example, DCT. Moreover, by using the inter-frame correlation, it is possible to compress and encode the video signal further. For example, as shown in FIGS. 1A and 1B, when frame pictures PC1, PC2, and PC3 are generated at times t1, t2, and t3 respectively, a frame picture PC12 is generated by computing the difference of the video signal between the frame pictures PC1 and PC2, and a frame picture PC23 is generated by computing the difference between the frame pictures PC2 and PC3. Because temporally adjacent frame pictures normally do not change very much, the differential signal obtained by computing the difference between them has small values. Therefore, by encoding the differential signal, the number of codes can be compressed.
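As a minimal numerical illustration (hypothetical values, not taken from the specification), the smallness of such a differential signal can be seen in a few lines of Python:

    import numpy as np

    rng = np.random.default_rng(0)
    pc1 = rng.integers(0, 256, size=(240, 352)).astype(np.int16)  # frame at t1
    pc2 = pc1.copy()
    pc2[100:120, 150:180] += 5        # small local change between t1 and t2
    pc12 = pc2 - pc1                  # differential picture PC12
    print(np.abs(pc12).mean())        # far smaller than the mean level of pc2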
However, an original picture cannot be restored from a differential signal alone. Therefore, a video signal is compressed and encoded by processing each frame picture as one of three picture types: an I-picture, a P-picture, and a B-picture. That is, as shown in FIGS. 2A and 2B, the video signals of seventeen frames from F1 to F17 are taken as a group of pictures (GOP) to be used as one processing unit. Then, the video signal of the first frame F1 is encoded as an I-picture, that of the second frame F2 is processed as a B-picture, and that of the third frame F3 is processed as a P-picture. The fourth and subsequent frames, F4 to F17, are alternately processed as B- and P-pictures.
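Purely as an illustrative sketch (the function name is hypothetical), the assignment of picture types within such a 17-frame GOP can be written as:

    def picture_type(n: int) -> str:
        """Picture type of frame Fn (1-indexed) in the GOP described above."""
        if n == 1:
            return "I"
        return "B" if n % 2 == 0 else "P"

    print([picture_type(n) for n in range(1, 18)])
    # ['I', 'B', 'P', 'B', 'P', ..., 'B', 'P']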
The video signal of an I-picture is transmitted directly for one frame. In the case of the video signal of a P-picture, however, the difference from the video signal of the temporally preceding I- or P-picture is basically transmitted, as shown in FIG. 2A. Moreover, as for the video signal of a B-picture, the difference from the average value of the preceding and following frames is basically obtained and encoded, as shown in FIG. 2B.
FIGS. 3A and 3B show the principle of the method for thus encoding a motion picture signal. Because the first frame F1 is processed as an I-picture, it is directly transmitted to a transmission line as transmission data F1X (in-picture encoding). However, because the second frame F2 is processed as a B-picture, the difference from the average value of the temporally preceding frame F1 and the temporally following frame F3 is computed and transmitted as transmission data F2X.
The processing as a B-picture is further classified into four types. The first processing directly transmits the data for the original frame F2 as the transmission data F2X (SP1) (intra-encoding), which is the same processing as that for an I-picture. The second processing computes the difference from the temporally following frame F3 and transmits it (SP2) (backward prediction encoding). The third processing transmits the difference from the temporally preceding frame F1 (SP3) (forward prediction encoding). The fourth processing generates the difference (SP4) from the average value of the temporally preceding frame F1 and the following frame F3 and transmits it as transmission data F2X (bidirectional prediction encoding).
Among these four methods, the method that minimizes the amount of transmission data is adopted macroblock by macroblock. When transmitting differential data, the motion vector between the reference frame and the frame with respect to which the difference is computed (the prediction picture) is transmitted together with the differential data: the motion vector x1 between the frames F1 and F2 in the case of forward prediction, the motion vector x2 between the frames F3 and F2 in the case of backward prediction, or both motion vectors x1 and x2 in the case of bidirectional prediction.
As for the frame F3 of the P-picture, using the temporally preceding frame F1 as a prediction picture, the differential signal (SP3) between it and the frame F3 and the motion vector x3 are computed and transmitted as transmission data F3X (forward prediction encoding). Alternatively, the data for the original frame F3 is directly transmitted as transmission data F3X (SP1) (intra-encoding). Out of these two methods, the method that minimizes the transmission data is selected macroblock by macroblock, similarly to the case of the B-picture.
Then, the structure of the encoder 18 is described below by referring to FIG. 6. Picture data BD to be encoded is inputted to a motion vector detection circuit (MV-Det) 50 in macroblock units. The motion vector detection circuit 50 processes the picture data for each frame as an I-picture, P-picture, or B-picture in accordance with a preset sequence. It is predetermined as which of the I-, P-, and B-pictures the picture of each sequentially inputted frame is processed (as shown in FIGS. 2A and 2B, for example, the group of pictures comprising the frames F1 to F17 is processed as I, B, P, B, P, . . . , B, P).
The picture data for a frame to be processed as an I-picture (e.g. frame F1) is transferred from the motion vector detection circuit 50 to and stored in a forward original picture section 51a of a frame memory 51, the picture data for a frame to be processed as a B-picture (e.g. frame F2) is transferred to and stored in an original picture section 51b, and the picture data for a frame to be processed as a P-picture (e.g. frame F3) is transferred to and stored in a backward original picture section 51c.
Moreover, when the picture of a frame to be processed as a B-picture (frame F4) or P-picture (frame F5) is inputted at the next timing, the picture data for the first P-picture (frame F3) having been stored in the backward original picture section 51c so far is transferred to the forward original picture section 51a, the picture data for the next B-picture (frame F4) is stored (overwritten) in the original picture section 51b, and the picture data for the next P-picture (frame F5) is stored (overwritten) in the backward original picture section 51c. These operations are repeated sequentially.
The signal of each picture stored in the frame memory 51 is read out of the memory 51 and subjected to frame prediction mode processing or field prediction mode processing by a prediction mode switching circuit (Mode-SW) 52. Moreover, in-picture prediction, forward prediction, backward prediction, or bidirectional prediction is performed by an arithmetic section 53 under the control of a prediction and decision circuit 54. Which of these processes is performed is determined in accordance with a prediction error signal (the difference between a reference picture to be processed and the prediction picture corresponding to it). For this purpose, the motion vector detection circuit 50 generates the sum of absolute values (a square sum can also be used) of the prediction error signals used for this decision.
The frame prediction mode and field prediction mode of the prediction mode switching circuit 52 are described below. When the frame prediction mode is set, the prediction mode switching circuit 52 directly outputs the four luminance blocks Y[1] to Y[4] supplied from the motion vector detection circuit 50 to the rear-stage arithmetic section 53. That is, in this case, line data of the odd field and line data of the even field are mixed in each luminance block, as shown in FIG. 7A. In this frame prediction mode, prediction is performed in units of four luminance blocks (a macroblock), and one motion vector corresponds to the four luminance blocks.
In the field prediction mode, however, the prediction mode switching circuit 52 outputs the signal with the structure shown in FIG. 7A supplied from the motion vector detection circuit 50 to the arithmetic section 53 after reconstituting it as shown in FIG. 7B, so that the luminance blocks Y[1] and Y[2] among the four luminance blocks consist only of dots of lines of the odd field and the two other luminance blocks Y[3] and Y[4] consist of data for lines of the even field. In this case, one motion vector corresponds to the two luminance blocks Y[1] and Y[2] and another motion vector corresponds to the two other luminance blocks Y[3] and Y[4].
The motion vector detection circuit 50 outputs the sum of absolute values of prediction errors in the frame prediction mode and that in the field prediction mode to the prediction mode switching circuit 52. The prediction mode switching circuit 52 compares the two sums and selects the prediction mode with the smaller sum, outputting correspondingly structured data to the arithmetic section 53. However, this processing is actually performed by the motion vector detection circuit 50. That is, the motion vector detection circuit 50 outputs a signal with the structure corresponding to the determined mode to the prediction mode switching circuit 52, and the prediction mode switching circuit 52 directly outputs the signal to the rear-stage arithmetic section 53.
In the frame prediction mode, as shown in FIG. 7A, a chrominance signal is supplied to the arithmetic section 53 in the state in which data for lines of the odd field is mixed with data for lines of the even field. In the field prediction mode, as shown in FIG. 7B, the upper half (4 lines) of each of the chrominance blocks Cb and Cr is used for the chrominance signal of the odd field corresponding to the luminance blocks Y[1] and Y[2], and the lower half (4 lines) is used for the chrominance signal of the even field corresponding to the luminance blocks Y[3] and Y[4].
Moreover, the motion vector detection circuit 50 generates the sums of absolute values of prediction errors used by the prediction and decision circuit 54 to determine which of in-picture prediction, forward prediction, backward prediction, and bidirectional prediction to perform, as follows. That is, as the sum of absolute values of prediction errors of in-picture prediction, the circuit 50 obtains the difference between the absolute value |ΣAij| of the sum ΣAij of the signals Aij of the macroblocks of a reference picture and the sum Σ|Aij| of the absolute values |Aij| of those signals. Moreover, as the sum of absolute values of prediction errors of forward prediction, the circuit 50 obtains the sum Σ|Aij - Bij| of the absolute values |Aij - Bij| of the differences Aij - Bij between the signals Aij of the macroblocks of the reference picture and the signals Bij of the macroblocks of a prediction picture.
The sums of absolute values of prediction errors in backward prediction and in bidirectional prediction are obtained similarly to the case of forward prediction (that is, by changing the prediction picture to one different from that of forward prediction). These sums of absolute values are supplied to the prediction and decision circuit 54. The prediction and decision circuit 54 selects the smallest of the sums of absolute values of prediction errors in forward prediction, backward prediction, and bidirectional prediction as the sum of absolute values of prediction errors of inter-prediction. Moreover, the circuit 54 compares the sum of absolute values of prediction errors of inter-prediction with that of in-picture prediction, selects the smaller one, and selects the mode corresponding to the selected sum as the prediction mode (P-mode). That is, if the sum of absolute values of prediction errors of in-picture prediction is smaller, the in-picture prediction mode is set. If the sum of absolute values of prediction errors of inter-prediction is smaller, the mode with the smallest corresponding sum is set among the forward, backward, and bidirectional prediction modes.
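The decision just described can be summarized in the following sketch (the function names are hypothetical; the error measures follow the sums of absolute values defined above):

    import numpy as np

    def intra_measure(a: np.ndarray) -> float:
        # sum|Aij| minus |sum Aij|: the in-picture prediction measure
        return float(np.abs(a).sum() - np.abs(a.sum()))

    def inter_measure(a: np.ndarray, b: np.ndarray) -> float:
        # sum|Aij - Bij| between reference and prediction macroblocks
        return float(np.abs(a.astype(np.int64) - b).sum())

    def select_prediction_mode(ref, fwd, bwd) -> str:
        candidates = {
            "forward": inter_measure(ref, fwd),
            "backward": inter_measure(ref, bwd),
            "bidirectional": inter_measure(ref, (fwd.astype(np.int64) + bwd) // 2),
        }
        best = min(candidates, key=candidates.get)   # inter-prediction measure
        return "in-picture" if intra_measure(ref) < candidates[best] else best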
Thus, the motion vector detection circuit 50 generates the signals of the macroblocks of a reference picture in the structure corresponding to the mode selected by the prediction mode switching circuit 52 out of the frame and field prediction modes and supplies them to the arithmetic section 53 through the prediction mode switching circuit 52. It also detects the motion vector between the reference picture and the prediction picture corresponding to the prediction mode (P-mode) selected by the prediction and decision circuit 54 among the four prediction modes, and outputs the motion vector to a variable-length encoding circuit (VLC) 58 and a motion compensation circuit (M-comp) 64. As described above, the motion vector with the minimum corresponding sum of absolute values of prediction errors is selected.
When the motion vector detection circuit 50 reads picture data for an I-picture from the forward original picture section 51a, the prediction and decision circuit 54 sets the in-frame (in-picture) prediction mode (the mode not performing motion compensation) as the prediction mode and switches the contact of a switch 53d of the arithmetic section 53 to "a". Thereby, the picture data for the I-picture is inputted to a DCT mode switching circuit (DCT CTL) 55. The DCT mode switching circuit 55, as shown in FIG. 8A or 8B, puts the data for the four luminance blocks into the state in which lines of the odd field and lines of the even field are mixed (frame DCT mode) or the state in which they are separated from each other (field DCT mode) and outputs the data to a DCT circuit 56.
That is, the DCT mode switching circuit 55 compares the encoding efficiency of performing DCT processing with the odd-field data and the even-field data mixed with the encoding efficiency of performing it with them separated, and selects the mode with the higher encoding efficiency. For example, as shown in FIG. 8A, the circuit 55 arranges an inputted signal into the structure in which lines of the odd field are mixed with those of the even field, computes the differences between the signals of the vertically adjacent odd-field and even-field lines, and obtains the sum (or square sum) of their absolute values.
Moreover, as shown in FIG. 8B, the circuit 55 arranges the inputted signal into the structure in which lines of the odd field are separated from those of the even field, computes the differences between signals of vertically adjacent odd-field lines and between signals of vertically adjacent even-field lines, and obtains the respective sums of the absolute values (or square sums) of the differences. The circuit 55 then compares the two sums and sets the DCT mode corresponding to the smaller sum. That is, the circuit 55 sets the frame DCT mode if the former is smaller and the field DCT mode if the latter is smaller. Furthermore, the circuit 55 outputs the data with the structure corresponding to the selected DCT mode to the DCT circuit 56 and also outputs a DCT flag (DCT-FLG) showing the selected DCT mode to the variable-length encoding circuit 58 and to the motion compensation circuit 64 or a block rearrangement circuit 65.
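A sketch of this frame/field DCT decision for a 16×16 luminance macroblock (the function name is hypothetical; the measures are the sums of absolute vertical differences described above):

    import numpy as np

    def dct_mode(mb: np.ndarray) -> str:
        m = mb.astype(np.int64)
        # mixed structure: vertically adjacent lines alternate between fields
        frame_sum = np.abs(m[1:, :] - m[:-1, :]).sum()
        # separated structure: differences within each field, taken separately
        odd, even = m[0::2, :], m[1::2, :]
        field_sum = (np.abs(odd[1:, :] - odd[:-1, :]).sum()
                     + np.abs(even[1:, :] - even[:-1, :]).sum())
        return "frame" if frame_sum < field_sum else "field"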
As is clear from comparing the prediction modes (FIGS. 7A and 7B) of the prediction mode switching circuit 52 with the DCT modes (FIGS. 8A and 8B) of the DCT mode switching circuit 55, the data structures of the two are practically the same for the luminance blocks. When the frame prediction mode (the mode in which odd and even lines are mixed) is selected in the prediction mode switching circuit 52, the frame DCT mode (the mode in which odd and even lines are mixed) is likely to be selected in the DCT mode switching circuit 55 as well. Moreover, when the field prediction mode (the mode in which odd-field data is separated from even-field data) is selected in the prediction mode switching circuit 52, the field DCT mode (the mode in which odd-field data is separated from even-field data) is likely to be selected in the DCT mode switching circuit 55 as well.
However, these selections are not always made: a mode is determined in the prediction mode switching circuit 52 so that the sum of absolute values of prediction errors decreases, and a mode is determined in the DCT mode switching circuit 55 so that the encoding efficiency improves. The picture data for the I-picture outputted from the DCT mode switching circuit 55 is inputted to the DCT circuit 56, where the data is subjected to DCT processing and converted to DCT coefficients. The DCT coefficients are inputted to a quantization circuit (Q) 57, quantized with a quantization scale (QS) corresponding to the stored data amount (quantization control signal (B-full)) of a transmission buffer (Buffer) 59, and thereafter inputted to the variable-length encoding circuit 58.
The variable-length encoding circuit 58 converts the picture data (in this case, data for an I-picture) supplied from the quantization circuit 57 to a variable-length code such as a Huffman code in accordance with the quantization scale (QS) and outputs the code to the transmission buffer 59. Moreover, the variable-length encoding circuit 58 receives the quantization scale (QS) from the quantization circuit 57, the prediction mode (the mode (P-mode) showing which of in-picture prediction, forward prediction, backward prediction, and bidirectional prediction is set) from the prediction and decision circuit 54, the motion vector (MV) from the motion vector detection circuit 50, the prediction flag (the flag (P-FLG) showing which of the frame prediction mode and the field prediction mode is set) from the prediction mode switching circuit 52, and the DCT flag (the flag (DCT-FLG) showing which of the frame DCT mode and the field DCT mode is set) from the DCT mode switching circuit 55. These inputs are also variable-length-encoded.
The transmission buffer 59 temporarily stores the inputted data and feeds a signal corresponding to the stored data amount back to the quantization circuit 57. When the remaining data amount increases to the allowable upper limit, the transmission buffer 59 decreases the amount of quantized data by increasing the quantization scale (QS) of the quantization circuit 57 in accordance with the quantization control signal (B-full). Conversely, when the remaining data amount decreases to the allowable lower limit, the transmission buffer 59 increases the amount of quantized data by decreasing the quantization scale (QS) of the quantization circuit 57 in accordance with the quantization control signal (B-full). Thus, overflow and underflow of the transmission buffer 59 are prevented. The data stored in the transmission buffer 59 is then read out at a predetermined timing, outputted to a transmission line, and recorded in, for example, the recording medium 3.
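The feedback from the transmission buffer 59 to the quantization circuit 57 thus amounts to a control loop of roughly the following form (a sketch only; the thresholds, the step size, and the 1-31 scale range are assumptions, not taken from the specification):

    def adjust_quantization_scale(qs: int, fullness: float,
                                  upper: float = 0.9, lower: float = 0.1) -> int:
        """B-full feedback: coarser quantization near overflow, finer near underflow."""
        if fullness > upper:
            return min(qs + 1, 31)   # larger scale -> fewer quantized bits
        if fullness < lower:
            return max(qs - 1, 1)    # smaller scale -> more quantized bits
        return qs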
The data for the I-picture outputted from the quantization circuit 57 is inputted to an inverse quantization circuit (IQ) 60 and inversely quantized in accordance with the quantization scale (QS) supplied from the quantization circuit 57. The output of the inverse quantization circuit 60 is inputted to an inverse-DCT (IDCT) circuit 61 and subjected to inverse DCT processing, and thereafter the blocks are rearranged for each DCT mode (frame/field) by the block rearrangement circuit (Block Change) 65. The output of the block rearrangement circuit 65 is supplied through a computing unit 62 to, and stored in, a forward prediction picture section (F-P) 63a of a frame memory 63.
When the motion vector detection circuit 50 processes the picture data for each sequentially inputted frame as an I-, B-, P-, B-, P-, . . . , or B-picture, it processes the picture data for the first inputted frame as an I-picture and thereafter processes the picture data for the third inputted frame as a P-picture before processing the picture of the second inputted frame as a B-picture. This is because the B-picture involves backward prediction and thus cannot be decoded unless the P-picture serving as its backward prediction picture has already been prepared.
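This reordering from display order to coding order can be sketched as follows (the names are hypothetical; a simplification that holds each B-picture until the reference following it has been coded):

    def display_to_coding_order(types):
        held_b, coded = [], []
        for n, t in enumerate(types, start=1):
            if t == "B":
                held_b.append((n, t))      # wait for the backward reference
            else:
                coded.append((n, t))       # code the I- or P-picture first
                coded.extend(held_b)       # then the held B-pictures
                held_b = []
        return coded

    print(display_to_coding_order(["I", "B", "P", "B", "P"]))
    # [(1, 'I'), (3, 'P'), (2, 'B'), (5, 'P'), (4, 'B')]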
Therefore, after processing the I-picture, the motion vector detection circuit 50 starts processing the picture data for the P-picture stored in the backward original picture section 51c. Then, similarly to the case described above, the sum of absolute values of inter-frame differences (prediction errors) for each macroblock is supplied from the motion vector detection circuit 50 to the prediction mode switching circuit 52 and the prediction and decision circuit 54. The prediction mode switching circuit 52 and the prediction and decision circuit 54 set the frame/field prediction mode and the in-picture-prediction, forward-prediction, backward-prediction, or bidirectional-prediction mode in accordance with the sums of absolute values of prediction errors of the macroblocks of the P-picture.
When the in-frame prediction mode is set, the arithmetic section 53 switches the contact of the switch 53d to "a" as described above. Therefore, the data is transmitted to a transmission line through the DCT mode switching circuit 55, DCT circuit 56, quantization circuit 57, variable-length encoding circuit 58, and transmission buffer 59, in the same way as the data for the I-picture. Moreover, the data is supplied to and stored in a backward prediction picture section (B-P) 63b of the frame memory 63 through the inverse quantization circuit 60, inverse DCT circuit 61, block rearrangement circuit 65, and computing unit 62.
In the forward prediction mode, the contact of the switch 53d is switched to "b", and the picture data (in this case, for the I-picture) is read from the forward prediction picture section 63a of the frame memory 63 and motion-compensated by the motion compensation circuit 64 in accordance with the motion vector outputted from the motion vector detection circuit 50. That is, when the forward prediction mode is commanded by the prediction and decision circuit 54, the motion compensation circuit 64 generates prediction picture data by reading the data of the forward prediction picture section 63a with the read address shifted from the position corresponding to the macroblock currently being outputted by the motion vector detection circuit 50 by a distance corresponding to the motion vector.
The prediction picture data outputted from the motion compensation circuit 64 is supplied to the computing unit 53a. The computing unit 53a subtracts prediction picture data corresponding to the macroblock supplied from the motion compensation circuit 64 from the data for the macroblock of a reference picture supplied from the prediction mode switching circuit 52 and outputs the difference (prediction error). The differential data is transmitted to a transmission line through the DCT mode switching circuit 55, DCT circuit 56, quantization circuit 57, variable-length encoding circuit 58, and transmission buffer 59. Moreover, the differential data is locally decoded by the inverse quantization circuit 60 and inverse DCT circuit 61 and inputted to the computing unit 62 through the block rearrangement circuit 65.
The same data as the prediction picture data supplied to the computing unit 53a is also supplied to the computing unit 62. The computing unit 62 adds the prediction picture data outputted by the motion compensation circuit 64 to the differential data outputted by the inverse DCT circuit 61. Thereby, the picture data for the original (decoded) P-picture is obtained. The picture data for the P-picture is supplied to and stored in the backward prediction picture section 63b of the frame memory 63.
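Putting the forward-prediction path together, the roles of the motion compensation circuit 64 and the computing units 53a and 62 can be sketched as follows (the names are hypothetical; DCT, quantization, and their inverses are elided, so the residual here stands in for its locally decoded version):

    import numpy as np

    def motion_compensate(ref, mb_y, mb_x, mv, size=16):
        """Read a prediction block with the read address shifted by the
        motion vector (assumes the shifted block stays inside the picture)."""
        dy, dx = mv
        return ref[mb_y + dy:mb_y + dy + size, mb_x + dx:mb_x + dx + size]

    def encode_forward_mb(cur_mb, ref, mb_y, mb_x, mv):
        pred = motion_compensate(ref, mb_y, mb_x, mv)
        residual = cur_mb.astype(np.int32) - pred       # computing unit 53a
        # ... DCT -> quantization -> inverse quantization -> inverse DCT ...
        local_decoded = pred + residual                 # computing unit 62
        return residual, local_decoded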
After the data for the I-picture and the data for the P-picture are stored in the forward prediction picture section 63a and the backward prediction picture section 63b respectively, the motion vector detection circuit 50 processes the B-picture. The prediction mode switching circuit 52 and the prediction and decision circuit 54 set the frame/field mode in accordance with the sum of absolute values of inter-frame differences for each macroblock, and moreover set the prediction mode to one of the in-frame prediction mode, forward prediction mode, backward prediction mode, and bidirectional prediction mode. As described above, in the in-frame prediction mode or forward prediction mode, the contact of the switch 53d is switched to "a" or "b" respectively. In these cases, the same processing as that for the P-picture is performed and the data is transmitted.
However, when the backward prediction mode or bidirectional prediction mode is set, the contact of the switch 53d is switched to "c" or "d" respectively. In the backward prediction mode, in which the contact of the switch 53d is switched to "c", the picture data (in this case, for the P-picture) is read from the backward prediction picture section 63b and motion-compensated by the motion compensation circuit 64 in accordance with the motion vector outputted from the motion vector detection circuit 50. That is, when the backward prediction mode is commanded by the prediction and decision circuit 54, the motion compensation circuit 64 generates prediction picture data by reading the data of the backward prediction picture section 63b with the read address shifted from the position corresponding to the macroblock currently being outputted by the motion vector detection circuit 50 by a distance corresponding to the motion vector.
The prediction picture data outputted from the motion compensation circuit 64 is supplied to the computing unit 53b. The computing unit 53b subtracts prediction picture data supplied from the motion compensation circuit 64 from the data for the macroblock of a reference picture supplied from the prediction mode switching circuit 52 and outputs the difference. The differential data is transmitted to a transmission line through the DCT mode switching circuit 55, DCT circuit 56, quantization circuit 57, variable-length encoding circuit 58, and transmission buffer 59.
In the bidirectional prediction mode, in which the contact of the switch 53d is switched to "d", the picture data (in this case, for the I-picture) is read from the forward prediction picture section 63a, the picture data (in this case, for the P-picture) is read from the backward prediction picture section 63b, and both are motion-compensated by the motion compensation circuit 64 in accordance with the motion vectors outputted from the motion vector detection circuit 50. That is, when the bidirectional prediction mode is commanded by the prediction and decision circuit 54, the motion compensation circuit 64 generates prediction picture data by reading the data of the forward prediction picture section 63a and the backward prediction picture section 63b with the read addresses shifted from the position corresponding to the macroblock currently being outputted by the motion vector detection circuit 50 by distances corresponding to the two motion vectors (one for the forward prediction picture and the other for the backward prediction picture).
The prediction picture data outputted from the motion compensation circuit 64 is supplied to a computing unit 53c. The computing unit 53c subtracts the average value of the prediction picture data supplied by the motion compensation circuit 64 from the data for the macroblock of a reference picture supplied by the motion vector detection circuit 50 and outputs the difference. The differential data is transmitted to a transmission line through the DCT mode switching circuit 55, DCT circuit 56, quantization circuit 57, variable-length encoding circuit 58, and transmission buffer 59. The picture data of a B-picture is not stored in the frame memory 63 because it is not used as a prediction picture for other pictures.
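In the bidirectional case the prediction block is therefore the average of the two motion-compensated blocks; a minimal sketch (the rounding convention is an assumption):

    import numpy as np

    def bidirectional_prediction(fwd_block: np.ndarray,
                                 bwd_block: np.ndarray) -> np.ndarray:
        """Average the forward and backward motion-compensated blocks;
        the encoder transmits the difference from this average."""
        s = fwd_block.astype(np.int32) + bwd_block.astype(np.int32)
        return (s + 1) // 2   # rounding choice is an assumption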
In the frame memory 63, the forward prediction picture section 63a and the backward prediction picture section 63b are bank-switched as necessary, so that a reference picture stored in one section or the other can be outputted as a forward prediction picture or a backward prediction picture. The above description has focused on the luminance blocks. However, the chrominance blocks are also processed and transmitted for each of the macroblocks shown in FIGS. 7A, 7B, 8A, and 8B, using motion vectors obtained by halving the motion vector of the corresponding luminance block in the vertical and horizontal directions respectively.
FIG. 9 shows the structure of the decoder 31 in FIG. 4. The encoded picture data transmitted through a transmission line (the recording medium 3) is received by a not-illustrated receiving circuit or reproduced by a replaying unit, temporarily stored in a receiving buffer (Buffer) 81, and thereafter supplied to a variable-length decoding circuit (IVLC) 82 of a decoding circuit 90. The variable-length decoding circuit 82 variable-length-decodes the data supplied from the receiving buffer 81, supplies the motion vector (MV), prediction mode (P-mode), and prediction flag (P-FLG) to a motion compensation circuit (M-comp) 87, outputs the DCT flag (DCT-FLG) to an inverse block rearrangement circuit (Block Change) 88 and the quantization scale (QS) to an inverse quantization circuit (IQ) 83 respectively, and outputs the decoded picture data to the inverse quantization circuit 83.
The inverse quantization circuit 83 inversely quantizes the picture data supplied from the variable-length decoding circuit 82 in accordance with the quantization scale (QS) likewise supplied from the variable-length decoding circuit 82 and outputs the inversely quantized picture data to an inverse DCT circuit (IDCT) 84. The data (DCT coefficients) outputted from the inverse quantization circuit 83 is subjected to inverse DCT processing by the inverse DCT circuit 84 and supplied to a computing unit 85 through the inverse block rearrangement circuit 88. When the picture data supplied from the inverse DCT circuit 84 is the data for an I-picture, the data is outputted from the computing unit 85 and supplied to and stored in a forward prediction picture section (F-P) 86a of a frame memory 86 in order to generate prediction picture data for picture data (data for a P- or B-picture) to be inputted to the computing unit 85 later. Moreover, the data is outputted to the format conversion circuit 32 (FIG. 4).
When the picture data supplied from the inverse DCT circuit 84 is the data for a P-picture which uses the picture data one frame before as its prediction picture data and is in the forward prediction mode, the picture data one frame before (the data for the I-picture) is read from the forward prediction picture section 86a of the frame memory 86 and motion-compensated by the motion compensation circuit 87 in accordance with the motion vector outputted from the variable-length decoding circuit 82. Then, in the computing unit 85, it is added to the picture data (differential data) supplied from the inverse DCT circuit 84 and the sum is outputted from the unit 85. The added data, that is, the decoded P-picture data, is supplied to and stored in a backward prediction picture section (B-P) 86b of the frame memory 86 in order to generate prediction picture data for the picture data (data for a B- or P-picture) to be inputted to the computing unit 85 later.
Data in the in-picture prediction mode, even if it is data for a P-picture, is directly stored in the backward prediction picture section 86b without being processed by the computing unit 85, similarly to the data for an I-picture. Because the P-picture is a picture to be displayed after the following B-picture, it is not outputted to the format conversion circuit 32 at this point of time (as described above, the P-picture inputted after the B-picture is processed and transmitted before the B-picture).
When the picture data supplied from the inverse DCT circuit 84 is the data for a B-picture, the picture data for the I-picture (in the case of the forward prediction mode), the picture data for the P-picture (in the case of the backward prediction mode), or both (in the case of the bidirectional prediction mode) are read from the forward prediction picture section 86a and/or the backward prediction picture section 86b of the frame memory 86 in accordance with the prediction mode supplied from the variable-length decoding circuit 82, motion-compensated by the motion compensation circuit 87 in accordance with the motion vector outputted from the variable-length decoding circuit 82, and a prediction picture is generated. However, when no motion compensation is necessary (in the case of the in-picture prediction mode), no prediction picture is generated.
The data thus motion-compensated by the motion compensation circuit 87 is added to the output of the inverse DCT circuit 84 in the computing unit 85. The added output is supplied to the format conversion circuit 32. However, the added output is not stored in the frame memory 86, because it is the data for a B-picture and is not used to generate a prediction picture for other pictures. After the picture of the B-picture is outputted, the picture data for the P-picture is read from the backward prediction picture section 86b and supplied to the computing unit 85 through the motion compensation circuit 87. In this case, however, no motion compensation is performed.
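The operation of the computing unit 85 thus reduces to adding the inverse-DCT output to the prediction, when one exists; a minimal sketch (the names are hypothetical):

    import numpy as np

    def reconstruct_mb(residual: np.ndarray, pred=None) -> np.ndarray:
        """Computing unit 85: intra blocks pass through; otherwise the
        motion-compensated prediction is added to the decoded residual."""
        if pred is None:          # in-picture prediction: no motion compensation
            return residual
        return pred + residual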
In the decoder 31, circuits corresponding to the prediction mode switching circuit 52 and the DCT mode switching circuit 55 of the encoder 18 in FIG. 6 are not illustrated. The motion compensation circuit 87 executes the processing corresponding to these circuits, that is, the processing for returning the structure in which signals of odd-field lines are separated from signals of even-field lines to the original structure in which they are mixed, as necessary. Though the processing of a luminance signal is described above, the processing of a chrominance signal is performed similarly. In that case, however, the motion vectors used are those obtained by halving the motion vector for the luminance signal in the vertical and horizontal directions respectively.
In the motion picture encoding apparatus 1 described above and shown in FIG. 4, the prefilter 19 is used to remove noise from the input video signal, improve the encoding efficiency of the encoding apparatus 1, and decrease the amount of information to a predetermined value. Though prefilters of various types with various characteristics exist, which filter is optimum depends on the characteristics of the input video signal, the state of the motion picture encoding apparatus performing the encoding, the recording medium, or the transmission line.
For example, when the encoding bit rate is low, the decoded picture is extremely deteriorated, and it is therefore necessary to apply a strong filter for improving the encoding efficiency in order to compensate for the deterioration of the picture. However, when the encoding bit rate is high, the decoded picture maintains a high picture quality, and if the same filter as that used for the low bit rate is applied, the picture quality is degraded instead. Therefore, the existing motion picture filtering and encoding methods have the problem that the prefilter cannot always be the optimum filter.