As an example of a conventional moving picture encoding system, a moving picture encoding device and a moving picture decoding device will be described based on an “H. 26L encoding system” described in “ITU-T SG16 VCEG-M81, H. 26L Test Model Long Term Number 8 (TML-8)”. FIG. 1 shows a configuration of the aforementioned moving picture encoding device 20, and FIG. 2 shows a configuration of the aforementioned moving picture decoding device 50.
The moving picture encoding device shown in FIG. 1 reduces a redundancy present in a time direction by motion compensation inter-frame prediction, and further reduces a redundancy left in a space direction by orthogonal transformation, so as to execute information compression of a moving picture (an input video signal). FIG. 3 shows an explanatory diagram of the motion compensation inter-frame prediction.
Hereinafter, an operation of the moving picture encoding device 20 shown in FIG. 1 will be described with reference to these drawings.
An input video signal 1 is constituted of a time sequence of frame pictures. Here, it is assumed that the frame picture to be encoded is divided into square areas (macro-blocks) of 16×16 pixels, and an encoding process in the moving picture encoding device 20 and a decoding process in the moving picture decoding device 50 are carried out by units of these macro-blocks. Additionally, the frame picture which is divided into the macro-block units is defined as “a frame picture signal 2”.
According to the “H. 26L encoding system”, what are available as “prediction modes” are an “INTRA prediction mode” for executing space prediction which uses pixel values of encoded neighboring areas on the same frame picture (e.g., pixel values adjacent to the upper and left sides of a frame picture signal 2 to be encoded), and a plurality of “INTER prediction modes” for executing motion compensation inter-frame prediction which uses encoded frame pictures (reference frame pictures 5) different with time.
The “H. 26L encoding system” is configured such that efficient information compression can be carried out by switching the “prediction mode” by a macro-block unit, in accordance with a local nature of the input video signal 1.
The “motion compensation inter-frame prediction” is a technology for searching for a picture signal pattern similar to a picture signal pattern in the frame picture signal 2 within a predetermined search range of a reference frame picture 5, for detecting a spatial displacement amount between both picture signal patterns as a “motion vector 3”, and for encoding and transmitting “motion compensation related information” containing the “motion vector 3,” the “prediction mode” and a “reference frame number,” as well as a “predicted residual signal 9” calculated in accordance with the motion vector 3.
According to the “H. 26L encoding system”, as shown in FIG. 3, 7 kinds of “INTER prediction modes” are available. More exactly, in addition to these INTER prediction modes, available is a “skip mode” useful when a video is static, i.e., a prediction mode for directly copying a pixel in the same position of the reference frame picture 5 (the encoded frame picture) as it is.
As shown in FIG. 3, the motion vector 3 is detected by a unit of 16×16 pixels on a “mode 1”, by a unit of 8×16 pixels on a “mode 2”, by a unit of 16×8 pixels on a “mode 3”, by a unit of 8×8 pixels on a “mode 4”, by a unit of 4×8 pixels on a “mode 5”, by a unit of 8×4 pixels on a “mode 6”, and by a unit of 4×4 pixels on a “mode 7”.
That is, these 7 kinds of prediction modes enable subdivision of motion detection units in the macro-block, and are disposed for the purpose of accurately grasping various motions that can be present in the macro-block.
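The relation between the 7 modes and the number of motion vectors per macro-block can be sketched as follows. This is a minimal illustration only: the table name `INTER_MODE_BLOCK_SIZE`, the function name, and the reading of each size as (width, height) are assumptions introduced here and are not taken from the TML-8 text.

```python
# Motion detection unit sizes of the 7 INTER prediction modes listed for FIG. 3.
# Interpreting each pair as (width, height) is an assumption of this sketch;
# the vector count below does not depend on that choice.
INTER_MODE_BLOCK_SIZE = {
    1: (16, 16),
    2: (8, 16),
    3: (16, 8),
    4: (8, 8),
    5: (4, 8),
    6: (8, 4),
    7: (4, 4),
}

MACROBLOCK_SIZE = 16  # a macro-block is 16x16 pixels


def motion_vectors_per_macroblock(mode):
    """Return how many motion detection units (one vector each) tile a macro-block."""
    width, height = INTER_MODE_BLOCK_SIZE[mode]
    return (MACROBLOCK_SIZE // width) * (MACROBLOCK_SIZE // height)


if __name__ == "__main__":
    for mode in sorted(INTER_MODE_BLOCK_SIZE):
        print(mode, motion_vectors_per_macroblock(mode))
```

The finest subdivision (mode 7) yields 16 units, which agrees with the upper limit of 16 motion vectors per macro-block mentioned later in this description.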
First, an input section 31 transmits the frame picture signal 2 to a motion detection section 32 and a space prediction section 35.
Subsequently, the motion detection section 32 detects the number of motion vectors 3 corresponding to a predetermined prediction mode 4 for the received frame picture signal 2, by referring to the reference frame picture 5 sent from a frame memory 34.
Meanwhile, the space prediction section 35 carries out space prediction that uses pixel values of encoded neighboring areas on the same frame picture sent from the frame memory 34. The space prediction section 35 may execute space prediction by a plurality of methods.
Second, the motion detection section 32 transmits motion vectors 3 detected for all the “INTER prediction modes” shown in FIG. 3, and the prediction modes (e.g., modes 1 to 7) 4 corresponding to the motion vectors 3, to a motion compensation section 33.
Subsequently, the motion compensation section 33 generates a predicted picture signal (a macro-block unit) 6, by motion compensation which uses the reference frame picture 5 sent from the frame memory 34 and a combination of the plurality of motion vectors 3 and prediction modes 4 sent from the motion detection section 32.
Third, the motion compensation section 33 transmits information regarding the predicted picture signal 6 generated by the motion compensation, the prediction mode 4, the motion vectors 3 and encoding efficiency, to a prediction mode determining section 36. On the other hand, the space prediction section 35 transmits information regarding a predicted picture signal 7 generated by space prediction, the prediction mode (if there are a plurality of kinds of space prediction) 4 and encoding efficiency, to the prediction mode determining section 36.
Fourth, the prediction mode determining section 36 evaluates all the “INTER prediction modes” shown in FIG. 3 by a macro-block unit, so as to select an “INTER prediction mode” which is determined to be highest in encoding efficiency.
Additionally, the prediction mode determining section 36 similarly evaluates the “INTRA prediction modes”, and selects the “INTRA prediction mode” if the “INTRA prediction mode” is higher in encoding efficiency than the “INTER prediction mode”.
Then, the prediction mode determining section 36 transmits a predicted picture signal (a macro-block unit) 8 generated by the selected prediction mode 4, to a subtracter 37.
Additionally, when the “INTER prediction mode” is selected as the prediction mode 4, the prediction mode determining section 36 transmits “motion compensation related information” containing the number (up to 16 per macro-block) of motion vectors 3 or the like set on the selected “INTER prediction mode”, to a variable length encoding section 40. On the other hand, when the “INTRA prediction mode” is selected as the prediction mode 4, the prediction mode determining section 36 transmits no motion vectors 3.
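The selection carried out in the fourth step can be sketched as follows. The text does not state how “encoding efficiency” is measured, so this sketch assumes that each candidate mode has already been given a numeric cost in which a lower value means higher efficiency; the function name and the dictionary layout are hypothetical.

```python
def select_prediction_mode(inter_costs, intra_costs):
    """Pick the prediction mode with the best (lowest) cost for one macro-block.

    inter_costs and intra_costs map a mode identifier to a cost value.
    The cost measure itself is an assumption of this sketch.
    """
    best_inter = min(inter_costs, key=inter_costs.get)
    best_intra = min(intra_costs, key=intra_costs.get)
    # INTRA is chosen only when it is strictly more efficient than the best INTER mode.
    if intra_costs[best_intra] < inter_costs[best_inter]:
        return ("INTRA", best_intra)
    return ("INTER", best_inter)


if __name__ == "__main__":
    print(select_prediction_mode({1: 100, 4: 80, 7: 95}, {0: 90}))
    print(select_prediction_mode({1: 100, 4: 80, 7: 95}, {0: 70}))
```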
Fifth, an orthogonal transformation section 38 generates an orthogonal transformation coefficient 10, by applying orthogonal transformation to a difference value (a predicted residual signal 9) between the frame picture signal 2 and the predicted picture signal 8 sent from the subtracter 37.
Sixth, a quantization section 39 generates a quantized orthogonal transformation coefficient 11, by quantizing the orthogonal transformation coefficient 10 sent from the orthogonal transformation section 38.
Seventh, the variable length encoding section 40 carries out entropy encoding for the quantized orthogonal transformation coefficient 11 sent from the quantization section 39 and the prediction mode 4 (and motion vectors 3) sent from the prediction mode determining section 36, so as to multiplex them into a compressed stream 12.
The variable length encoding section 40 may transmit the compressed stream 12 to a moving picture decoding device 50 by a macro-block unit, or transmit the compressed stream 12 by a frame picture unit.
Additionally, an inverse quantization section 41 generates an orthogonal transformation coefficient 13, by carrying out inverse quantization for the quantized orthogonal transformation coefficient 11 sent from the quantization section 39. Then, an inverse orthogonal transformation section 42 generates a predicted residual signal 14, by carrying out inverse orthogonal transformation for the orthogonal transformation coefficient 13 sent from the inverse quantization section 41.
Next, at an adder 43, the predicted residual signal 14 sent from the inverse orthogonal transformation section 42 and the predicted picture signal 8 sent from the prediction mode determining section 36 are added together to generate a frame picture signal 15.
This frame picture signal 15 of a macro-block unit is stored in the frame memory 34. In the frame memory 34, there have been stored a reference frame picture 5 of a frame picture unit used for a subsequent encoding process, and information (a pixel value or a motion vector) of an encoded macro-block of a frame picture which is currently being encoded.
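The local decoding path described above (subtracter 37, quantization section 39, inverse quantization section 41 and adder 43) can be sketched as follows. This is a simplified illustration under stated assumptions: the orthogonal transformation and its inverse are omitted (treated as identity), the quantizer is a plain uniform quantizer with a hypothetical step size, and all function names are introduced here for illustration only.

```python
def quantize(coeffs, step):
    """Map each value to an integer level (stand-in for quantization section 39)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Restore approximate values (stand-in for inverse quantization section 41)."""
    return [q * step for q in levels]


def encode_and_reconstruct(block, predicted, step):
    """Return the quantized levels and the locally decoded block."""
    residual = [s - p for s, p in zip(block, predicted)]           # subtracter 37
    levels = quantize(residual, step)                              # sent to the decoder
    restored = dequantize(levels, step)                            # predicted residual signal 14
    reconstructed = [p + r for p, r in zip(predicted, restored)]   # adder 43
    return levels, reconstructed


def decode(levels, predicted, step):
    """Decoder-side reconstruction from the same levels and prediction."""
    return [p + r for p, r in zip(predicted, dequantize(levels, step))]


if __name__ == "__main__":
    levels, recon = encode_and_reconstruct([109, 93, 103, 100], [100] * 4, 4)
    print(levels, recon, decode(levels, [100] * 4, 4))
```

Because the encoder stores the reconstructed block rather than the original one, the picture held in the frame memory 34 matches the picture that the decoder obtains from the same quantized data.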
Next, an operation of the moving picture decoding device 50 shown in FIG. 2 will be described.
First, after reception of the compressed stream 12, a variable length decoding section 71 detects a synchronous word indicating a head of each frame, and restores the motion vector 3, the prediction mode 4 and the quantized orthogonal transformation coefficient 11 for each macro-block unit.
Then, the variable length decoding section 71 transmits the quantized orthogonal transformation coefficient 11 to an inverse quantization section 76, and transmits the prediction mode 4 to a switch 75.
Additionally, the variable length decoding section 71 transmits the motion vector 3 and the prediction mode 4 to a motion compensation section 72 when the prediction mode 4 is an “INTER prediction mode”, and transmits the prediction mode 4 to a space prediction section 74 when the prediction mode 4 is an “INTRA prediction mode”.
Next, when the prediction mode 4 is the “INTER prediction mode”, the motion compensation section 72 generates a predicted picture signal 6, by using the motion vector 3 and the prediction mode 4 sent from the variable length decoding section 71 and referring to a reference frame picture 5 sent from a frame memory 73.
On the other hand, when the prediction mode 4 is the “INTRA prediction mode”, the space prediction section 74 generates a predicted picture signal 7, by referring to an encoded picture signal of a neighboring area sent from the frame memory 73.
Next, the switch 75 chooses one of the predicted picture signals 6 and 7, in accordance with the prediction mode 4 sent from the variable length decoding section 71, so as to determine a predicted picture signal 8.
Meanwhile, the quantized orthogonal transformation coefficient 11 decoded by the variable length decoding section 71 is subjected to inverse quantization by the inverse quantization section 76, so as to be restored as an orthogonal transformation coefficient 10. And the orthogonal transformation coefficient 10 is subjected to inverse orthogonal transformation by an inverse orthogonal transformation section 77, so as to be restored as a predicted residual signal 9.
Then, at an adder 78, the predicted picture signal 8 sent from the switch 75 and the predicted residual signal 9 sent from the inverse orthogonal transformation section 77 are added together, and the frame picture signal 2 is thereby restored to be sent to an output section 80. The output section 80 outputs the signal to a display device (not shown) with predetermined timing, so as to reproduce an output video signal (a moving picture) 1A.
Additionally, the restored frame picture signal 2 is stored in the frame memory 73, so as to be used for a decoding process thereafter.
In the “TML-8”, motion compensation which uses a concept of a “funny position” is realized. FIG. 4 shows this “funny position” together with an integer pixel position, a ½ pixel position, and a ¼ pixel position. Incidentally, in the “TML-8”, motion compensation of ¼ pixel accuracy is realized.
In FIG. 4, it is assumed that the motion vector 3 detected by the motion detection section 32 indicates an integer pixel position (the pixel position of (1 pixel, 1 pixel)) “D” in the reference frame picture 5 in relation to an integer pixel position “A” in the frame picture signal 2 to be encoded. In this case, a pixel value of the pixel position “D” in the reference frame picture 5 becomes a “motion compensation value” in relation to the pixel position “A” in the frame picture signal 2 to be encoded.
Next, it is assumed that the motion vector 3 indicates a ½ pixel position (the pixel position of (½ pixel, ½ pixel)) “E” in the reference frame picture 5 in relation to the integer pixel position “A” in the frame picture signal 2 to be encoded. In this case, an interpolation value obtained by independently operating 6 tap filters (1, −5, 20, 20, −5, 1)/32 vertically and horizontally for the pixel value of the integer pixel position in the reference frame picture 5 becomes a “motion compensation value” in relation to the pixel position “A” in the frame picture signal 2 to be encoded.
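The ½ pixel interpolation with the 6 tap filter (1, −5, 20, 20, −5, 1)/32 can be sketched as follows. Only the tap values and the divisor come from the description above. The rounding offset, the clipping to an 8-bit range, the rounding of the intermediate horizontal results, and the function names are assumptions of this sketch, and picture boundaries are not handled.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6 tap filter; the weighted sum is divided by 32


def half_pel_1d(samples, i):
    """Interpolate the value halfway between samples[i] and samples[i + 1].

    Requires 2 <= i <= len(samples) - 4 so that all 6 taps fall inside the list.
    Rounding (+16) and clipping to 0..255 are assumptions of this sketch.
    """
    window = samples[i - 2 : i + 4]
    acc = sum(t * s for t, s in zip(TAPS, window))
    return min(255, max(0, (acc + 16) >> 5))


def half_pel_center(ref, y, x):
    """Interpolate the (x + 1/2, y + 1/2) position of a 2-D list ref[row][column].

    The filter is applied horizontally on 6 rows, then vertically on the results.
    """
    column = [half_pel_1d(row, x) for row in ref[y - 2 : y + 4]]
    return half_pel_1d(column, 2)


if __name__ == "__main__":
    ramp = [0, 10, 20, 30, 40, 50, 60, 70]
    print(half_pel_1d(ramp, 3))
    flat = [[50] * 8 for _ in range(8)]
    print(half_pel_center(flat, 3, 3))
```

The negative taps give the filter a sharper frequency response than a plain two-point average, which is why the ½ pixel value is not simply the mean of its two neighbors.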
Next, it is assumed that the motion vector 3 indicates a ¼ pixel position (a pixel position of (¼ pixel, ¼ pixel)) “F” or “G” in the reference frame picture 5 in relation to the integer pixel position “A” in the frame picture signal 2 to be encoded. In this case, a linear interpolation value of a pixel value of a neighboring integer pixel position and a pixel value of a neighboring ½ pixel position becomes a “motion compensation value” in relation to the pixel position “A” in the frame picture signal 2 to be encoded.
For example, when the motion vector 3 indicates the pixel position “F” in the reference frame picture 5 in relation to the pixel position “A” in the frame picture signal 2 to be encoded, an average of 4 points of the pixel value of the neighboring integer pixel position and the pixel values of the neighboring ½ pixel positions which surround the pixel position “F” becomes a “motion compensation value” in relation to the pixel position A in the frame picture signal 2 to be encoded.
Additionally, when the motion vector 3 indicates the pixel position “G” in the reference frame picture 5 in relation to the integer pixel position A in the frame picture signal 2 to be encoded, an average of 2 points of the pixel values of the ½ pixel positions which horizontally sandwich the pixel position “G” becomes a “motion compensation value” in relation to the pixel position A in the frame picture signal 2 to be encoded.
Further, when the motion vector indicates a pixel position of (N+¾ pixel, M+¾ pixel: N and M are given integers) in the reference frame picture 5 in relation to an integer pixel position in the frame picture signal 2 to be encoded, a “motion compensation value” in relation to the integer pixel position in the frame picture signal 2 to be encoded becomes an average of a pixel value of (N, M), a pixel value of (N, M+1), a pixel value of (N+1, M) and a pixel value of (N+1, M+1) in the reference frame picture 5. Here, (N+¾ pixel, M+¾ pixel: N and M are given integers) in the reference frame picture 5 is the aforementioned “funny position”.
For example, when the motion vector 3 indicates a pixel position “H” (i.e., a “funny position”) in the reference frame picture 5 in relation to the integer pixel position “A” in the frame picture signal 2 to be encoded, a “motion compensation value” in relation to the pixel position “A” in the frame picture signal 2 to be encoded is not a value calculated in the aforementioned case of the ¼ pixel position (e.g., the pixel position “F”), but a value obtained by calculation of (A+B+C+D)/4.
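The calculation at the “funny position” can be sketched as follows. The averaging of the four surrounding integer pixel values comes from the description above. The rounding offset, the indexing of the reference picture as `ref[m][n]` (row, then column), and the function name are assumptions of this sketch.

```python
def funny_position_value(ref, n, m):
    """Motion compensation value for the position (n + 3/4, m + 3/4).

    ref is a 2-D list of integer pixel values; the value is the average of the
    pixels at (n, m), (n, m + 1), (n + 1, m) and (n + 1, m + 1).
    Rounding to nearest (+2 before dividing by 4) is an assumption.
    """
    total = ref[m][n] + ref[m][n + 1] + ref[m + 1][n] + ref[m + 1][n + 1]
    return (total + 2) // 4


if __name__ == "__main__":
    print(funny_position_value([[10, 20], [30, 40]], 0, 0))
```

Since the four points are weighted equally, the order of the coordinates does not affect the result as long as the same 2×2 neighborhood is addressed.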
As described above, in the “H. 26L encoding system”, many “INTER prediction modes” are available to enable elaborate motion compensation. Additionally, motion compensation based on the integer pixel position, the ½ pixel position, the ¼ pixel position and the funny position is available. By the foregoing configuration, while the configuration for prediction is elaborated, a mechanism is introduced to prevent breakage of the predicted picture signal 8 even if a frame picture signal 2 for which prediction fails is inputted.
The calculation of ¼ pixel accuracy is carried out by linear interpolation of the pixel values of the neighboring pixel positions. Thus, a low-pass type operation is provided in a frequency space, so as to generate a smoothed predicted picture signal 6.
Additionally, when motion compensation based on the funny position is used, a “motion compensation value” is calculated based on an average of pixel values of 4 neighboring integer pixel positions, so as to generate a further smoothed predicted picture signal. If Gaussian noise is superimposed on the predicted picture signal, the smoothing has an effect of reducing a prediction error when this noise component is large.
Thus, in the “H. 26L encoding system” defined by the “TML-8”, if noise is superimposed on the reference frame picture 5, or if many high-pass components are contained in the reference frame picture 5 and an error in prediction is flagrant, encoding efficiency is improved by using the calculation of ¼ pixel accuracy and the motion compensation based on the funny position.
However, the following problems conceivably occur in the conventional “H. 26L encoding system”.
First, when a pixel position in the frame picture signal 2 to be encoded has a motion vector which indicates a pixel position (N+¾ pixel, M+¾ pixel: N and M are given integers) equal to the “funny position,” a calculated “motion compensation value” is always subjected to strong smoothing, and especially it has been a problem that elaborate motion compensation is hindered at a high rate (a first problem).
That is, in the conventional “H. 26L encoding system”, the “funny position” is defined by an absolute value of the motion vector 3. Thus, as shown in FIG. 5, for example, when blocks A, B, C, D and E translate in parallel toward the lower right by (¾ pixel, ¾ pixel), smoothed motion compensation is carried out based on a motion vector MV=(MVx, MVy)=(¾, ¾). Alternatively, motion compensation is carried out by using a motion vector different from the real motion, i.e., a motion vector MV=(MVx, MVy)=(½, ¾) or (¾, 1). Here, MVx indicates an X component of the motion vector, and MVy indicates a Y component of the motion vector.
Specifically, as shown in FIG. 5, in the conventional “H. 26L encoding system”, when a block to be encoded is E, and a motion vector MV of the block E is (MVxE, MVyE), an area expressed by “MVxE%4=3” and “MVyE%4=3” is always a “funny position,” and a smoothed pixel value is chosen as a “motion compensation value” for the block E. Here, “%” is a remainder (modulo) operator, and a unit for expressing the motion vector MV is a ¼ pixel.
Thus, in the “H. 26L encoding system”, since the motion vector (¾, ¾) indicates a smoothed pixel value present in a real (½, ½) pixel position, it has been a problem that a pixel value of a pixel position (N+¾ pixel, M+¾ pixel: N and M are given integers) equal to the “funny position” cannot be expressed.
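The condition “MVxE%4=3” and “MVyE%4=3” can be sketched as follows, with the motion vector components given as integers in ¼ pixel units. The function name is hypothetical. The handling of negative components is an assumption: Python's `%` returns a non-negative remainder, so a component of −1 (i.e., −1+¾ pixel) is treated as a position of the form N+¾.

```python
def is_funny_position(mvx, mvy):
    """Return True when a motion vector in 1/4 pixel units points to (N + 3/4, M + 3/4)."""
    return mvx % 4 == 3 and mvy % 4 == 3


if __name__ == "__main__":
    print(is_funny_position(3, 3))    # (3/4, 3/4)
    print(is_funny_position(7, 11))   # (1 + 3/4, 2 + 3/4)
    print(is_funny_position(2, 3))    # (1/2, 3/4)
```

The test depends only on the motion vector itself and not on the block or its neighbors, which is why every block moving by (¾, ¾) is necessarily given the smoothed value.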
Second, in the generating of a predicted picture signal by ¼ pixel accuracy, effects of elaboration of prediction and smoothing of prediction are respectively expected at a high rate and a low rate. However, with regard to the smoothing of prediction at the low rate, motion compensation of ¼ pixel accuracy is not necessary but realization of motion compensation of ½ pixel accuracy is sufficient. Consequently, it has been a problem that detection of the motion vector of ¼ pixel accuracy which occupies a half of a parameter space of the motion vector for smoothing prediction is redundant.
The present invention, therefore, has been made with the foregoing problems in mind, and an object of the invention is to express a predicted picture signal with lighter overheads, and to provide motion compensation of different degrees of pixel accuracy.