The present invention relates to a video signal encoding method and system, and in particular to a video signal encoding method and system with motion compensated prediction.
High-efficiency video signal encoding systems employ a hybrid encoding method that combines inter-picture prediction encoding using motion compensation with intra-picture encoding.
FIG. 1 is a block diagram showing an encoding system using a conventional hybrid encoding method described in ISO-IEC/JTC1/SC29/WG11 MPEG 92/N0245 Test Model 2. As illustrated, a digital video signal 101 received at an input terminal 1 is supplied to a first input of a subtractor 10, a first input of a motion compensated prediction circuit 17, and a second input of a quantizer 12. The output of the subtractor 10 is supplied to a DCT (discrete cosine transform) circuit 11, whose output is supplied to a first input of the quantizer 12. The output 102 of the quantizer 12 is supplied to a first input of a variable-length encoder 19 and to an inverse quantizer 13. The output of the inverse quantizer 13 is supplied to an IDCT (inverse discrete cosine transform) circuit 14, whose output is supplied to a first input of an adder 15. The output of the adder 15 is supplied to a memory 16, and data (reference image signal 103) read from the memory 16 is supplied to a second input of the motion compensated prediction circuit 17 and to a first input of a selector 18. A first output 104 of the motion compensated prediction circuit 17 is supplied to the memory 16.
A zero signal (data representing the value "0") is supplied to a second input of the selector 18, and a second output 105 (selection mode) of the motion compensated prediction circuit 17 is supplied to a third input of the selector 18. The output 106 of the selector 18 is supplied to a second input of the subtractor 10 and to a second input of the adder 15. A third output 107 of the motion compensated prediction circuit 17 is supplied to a second input of the variable-length encoder 19. The output of the variable-length encoder 19 is input to a transmitting buffer 20, and a first output of the transmitting buffer 20 is output via an output terminal 2. A second output 108 of the transmitting buffer 20 is supplied to a third input of the quantizer 12.
FIG. 2 is a block diagram showing an example configuration of a conventional motion compensated prediction circuit 17. The digital video signal 101 is supplied to a first input of a motion vector search circuit 3a. A reference image signal 103 input from the memory 16 is supplied to a second input of the motion vector search circuit 3a. The motion vector 109 output from the motion vector search circuit 3a is supplied to a first input of a selector 4a. A zero vector ("0") is supplied to a second input of the selector 4a.
The prediction image 110 output from the motion vector search circuit 3a is supplied to a first input of a distortion calculator 5a. Applied to a second input of the distortion calculator 5a is the video signal 101 from the input terminal 1. A distortion output 111 from the distortion calculator 5a is supplied to a first input of a comparing and selecting circuit 7a.
The video signal 101 is also supplied to a first input of a distortion calculator 5b. The reference image signal 103 is also supplied to a second input of the distortion calculator 5b. A distortion output 112 from the distortion calculator 5b is supplied to a second input of the comparing and selecting circuit 7a. A selection mode 113 output from the comparing and selecting circuit 7a is supplied to a first input of a comparing and selecting circuit 7b. A distortion output 114 from the comparing and selecting circuit 7a is supplied to a second input of the comparing and selecting circuit 7b.
The selection mode output 113 from the comparing and selecting circuit 7a is also supplied to a third input of the selector 4a. A motion vector 107 output from the selector 4a is supplied to the variable-length encoder 19.
The prediction image 110 output from the motion vector search circuit 3a is supplied to a first input of a selector 4b. The reference image signal 103 from the input terminal 1a is also supplied to a second input of the selector 4b. The selection mode 113 from the comparing and selecting circuit 7a is supplied to a third input of the selector 4b.
The prediction image 104 from the selector 4b is supplied to the memory 16. The video signal 101 from the input terminal 1 is also input to a variance calculator 9.
An output 115 of the variance calculator 9 is supplied to a third input of the comparing and selecting circuit 7b. The selection mode 105 from the comparing and selecting circuit 7b is supplied to the selector 18.
The operation is described next. The digital input signal 101 is supplied to the subtractor 10, where the difference between the input picture (frame or field) and the prediction picture from the motion compensated prediction circuit 17 is taken to reduce the temporal redundancy (redundancy along the time axis), and DCT is then performed along the spatial axes. The coefficients obtained are quantized, variable-length encoded, and then transmitted via the transmitting buffer 20.
Motion compensated prediction is schematically illustrated in FIG. 3. The picture that is to be encoded is divided into matching blocks each consisting of 16 pixels by 16 lines. For each matching block, examination is made as to which part of the reference picture, if used as a prediction image, minimizes the distortion. For instance, in the case of a still picture, if the 16 pixels by 16 lines at the same position as the matching block are used as the prediction image, the distortion will be zero. In the case of a motion picture, it may be that the block shifted leftward by 8 pixels and downward by 17 lines for instance yields the minimum distortion. Then, this block at the shifted position is regarded as a block corresponding to the matching block in question, and used as the prediction image, and (-8, 17) is transmitted as the motion vector.
The motion compensated prediction is further explained with reference to FIG. 2. First, in the motion vector search circuit 3a, the motion vector is determined on the basis of the input image 101 and the reference image 103. This is done by finding, for each matching block, the block in the reference picture that minimizes the distortion, as explained in connection with FIG. 3. The block thus found is used as the prediction image, and its position relative to the matching block is used as the motion vector. The distortion may be defined as the sum of the absolute values of the differences.
In the distortion calculator 5a, the distortion, defined as the sum of the squares of the differences between the input image 101 and the prediction image 110 output from the motion vector search circuit 3a, is calculated for each matching block. This distortion 111 is also denoted by SEmc. In the distortion calculator 5b, the distortion, defined as the sum of the squares of the differences between the input image 101 and the reference image 103 at the same position, is calculated for each matching block. This distortion 112 is also denoted by SEnomc. SEnomc is therefore the particular value of the sum-of-squares distortion obtained when the vector representing the relative position between the input image 101 and the prediction image is zero.
For the purpose of the following explanation, it is assumed that the whole picture consists of I pixels by J lines, and that the input picture is represented by F(i, j), where i is the pixel number in the horizontal direction (0 ≤ i < I) and j is the line number in the vertical direction (0 ≤ j < J). The matching blocks are defined so as not to overlap each other. Each matching block is then represented by F(n*16+i, m*16+j), where 0 ≤ i ≤ 15 and 0 ≤ j ≤ 15, and (n, m) represents the position of the matching block ((n*16, m*16) is the upper-left corner of the matching block). The (n, m)-th matching block is denoted by:

$$M(i,j) = F(16n+i,\ 16m+j), \qquad 0 \le i \le 15,\ 0 \le j \le 15 \qquad \text{(F1)}$$
The reference image is represented by G(i, j) (0 ≤ i < I, 0 ≤ j < J). If the vector between the input image and the reference image is represented by (H, V), the prediction image PH,V(i, j) is given by:

$$P_{H,V}(i,j) = G(16n+i+H,\ 16m+j+V) \qquad \text{(F2)}$$
The distortion S is evaluated using the following evaluation function:

$$S = \sum_{i=0}^{15} \sum_{j=0}^{15} \left| M(i,j) - P_{H,V}(i,j) \right| \qquad \text{(F3)}$$
The motion vector search circuit 3a finds the vector (H, V) that minimizes the distortion S given by the evaluation function (F3), regards this vector (H, V) as the motion vector, and outputs the motion vector (H, V) and the prediction image PH,V(i, j).
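The search defined by (F1) to (F3) can be sketched as a brute-force full search. This is a minimal illustration, not the circuit 3a itself: it assumes pictures stored as Python lists indexed `picture[line][pixel]`, a square search range of ±R in both directions, and it simply skips vectors that would point outside the reference picture (the text does not say how the picture edge is handled). The names `sad` and `full_search` are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # img[j][i]: j = line, i = pixel


def sad(F: Image, G: Image, n: int, m: int, H: int, V: int, size: int = 16) -> int:
    """Distortion S of (F3): sum of |M(i, j) - P_{H,V}(i, j)| over one block."""
    total = 0
    for j in range(size):
        for i in range(size):
            cur = F[m * size + j][n * size + i]          # M(i, j), from (F1)
            ref = G[m * size + j + V][n * size + i + H]  # P_{H,V}(i, j), from (F2)
            total += abs(cur - ref)
    return total


def full_search(F: Image, G: Image, n: int, m: int, R: int,
                size: int = 16) -> Optional[Tuple[Tuple[int, int], int]]:
    """Return ((H, V), S) with minimum S over |H| <= R, |V| <= R."""
    height, width = len(G), len(G[0])
    best = None
    for V in range(-R, R + 1):
        for H in range(-R, R + 1):
            x0, y0 = n * size + H, m * size + V
            # Assumption: candidates leaving the reference picture are skipped.
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            s = sad(F, G, n, m, H, V, size)
            if best is None or s < best[1]:
                best = ((H, V), s)
    return best


# Toy check: 12x12 pictures, 4x4 blocks, true displacement (H, V) = (2, -1).
G = [[7 * x + 13 * y for x in range(12)] for y in range(12)]
F = [[7 * (x + 2) + 13 * (y - 1) for x in range(12)] for y in range(12)]
result = full_search(F, G, n=1, m=1, R=2, size=4)
print(result)  # ((2, -1), 0)
```

A full search is the simplest way to realize "find the vector that minimizes S"; its cost grows with the square of the search range, which is the hardware concern raised later in this description.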
When SEmc < SEnomc, the comparing and selecting circuit 7a outputs a signal 113 indicating the motion compensation (MC) mode, together with the distortion SEmc (111). When SEmc ≥ SEnomc, the comparing and selecting circuit 7a outputs a signal 113 indicating the no-motion-compensation (NOMC) mode, together with the distortion SEnomc (112). When the mode selected by the comparing and selecting circuit 7a is the MC mode, the selector 4a outputs the motion vector 109 found by the motion vector search circuit 3a, and the selector 4b selects the prediction image 110 output by the motion vector search circuit 3a.
When the mode selected by the comparing and selecting circuit 7a is the NOMC mode, the selector 4a outputs the zero vector, and the selector 4b selects the reference image 103.
The variance calculator 9 calculates the variance of each matching block of the input image signal 101. The comparing and selecting circuit 7b compares the distortion 114 from the comparing and selecting circuit 7a with the variance 115 from the variance calculator 9, and selects either the intra mode for intra-picture encoding or the selection mode output from the comparing and selecting circuit 7a.
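The two-stage mode decision of circuits 5a, 5b, 7a and 7b can be sketched per matching block as follows. The MC/NOMC rule follows the text (MC only when SEmc < SEnomc). The text does not give the exact comparison made by circuit 7b, so the intra rule here is a HYPOTHETICAL one: intra is chosen when the block's sum of squared deviations from its mean (block size times its variance, so on the same scale as SE) is smaller than the selected distortion. All function names are illustrative.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]
Vector = Tuple[int, int]


def sum_sq_diff(a: Block, b: Block) -> int:
    """Sum of squared differences, as used for SEmc and SEnomc."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def sum_sq_dev(a: Block) -> float:
    """Sum of squared deviations from the block mean (block size times variance)."""
    vals = [x for row in a for x in row]
    mean = sum(vals) / len(vals)
    return sum((x - mean) ** 2 for x in vals)


def select_mode(cur: Block, mc_pred: Block, same_pos_ref: Block,
                mv: Vector) -> Tuple[str, Optional[Vector]]:
    se_mc = sum_sq_diff(cur, mc_pred)         # distortion 111 (SEmc)
    se_nomc = sum_sq_diff(cur, same_pos_ref)  # distortion 112 (SEnomc)
    # Circuit 7a: MC mode only when it is strictly better than NOMC.
    if se_mc < se_nomc:
        mode, dist, vec = "MC", se_mc, mv
    else:
        mode, dist, vec = "NOMC", se_nomc, (0, 0)
    # Circuit 7b, HYPOTHETICAL rule: intra when the block's own spread
    # is smaller than the best prediction distortion.
    if sum_sq_dev(cur) < dist:
        return "INTRA", None
    return mode, vec


cur = [[10, 20], [30, 40]]
print(select_mode(cur, cur, [[0, 0], [0, 0]], (3, -1)))                # ('MC', (3, -1))
print(select_mode([[5, 5], [5, 5]], [[9, 9], [9, 9]], [[1, 1], [1, 1]], (1, 1)))  # ('INTRA', None)
```

Ties between SEmc and SEnomc fall to NOMC, matching the "SEmc ≥ SEnomc" condition, which avoids spending bits on a vector that gives no gain.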
The motion vector output from the motion compensated prediction circuit 17 is encoded at the variable-length encoder 19, an example of which is shown in FIG. 4. Referring to FIG. 4, the motion vector 107 output from the motion compensated prediction circuit 17 is supplied to a first input of a subtractor 30. An output of the subtractor 30 is input to a variable-length code selector 31, and is also supplied via a memory 32 to a first input of a selector 33. A zero vector is applied to a second input of the selector 33. The output 102 of the quantizer 12 is variable-length encoded by an encoder 34. The outputs of the variable-length code selector 31 and of the encoder 34 are multiplexed at a multiplexer 35 and supplied to the transmitting buffer 20.
As shown in FIG. 4, the difference between the motion vector of each matching block and the motion vector of the preceding matching block is determined at the subtractor 30, and the variable-length code for this difference vector is output. When the current matching block is in the intra mode or the NOMC mode, its motion vector is not encoded. When the preceding matching block is in the intra mode or the NOMC mode, or in the initial state of the encoding, the zero vector is used in place of the preceding motion vector. The closer the difference vector is to the zero vector, the shorter the variable-length code assigned to it.
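The differential coding rule of FIG. 4 can be sketched as below. The variable `prev` stands in for the memory 32 and the zero-vector input of selector 33; the actual variable-length code table is not given in the text, so this sketch stops at the difference vectors handed to the code selector 31. The function name and the mode strings are hypothetical.

```python
from typing import List, Optional, Tuple

Vector = Tuple[int, int]


def mv_differences(blocks: List[Tuple[str, Optional[Vector]]]) -> List[Optional[Vector]]:
    """Difference vectors passed to the variable-length code selector 31.

    blocks: (mode, motion vector) per matching block in encoding order,
    mode being "MC", "NOMC" or "INTRA". None means no vector is encoded.
    """
    prev: Vector = (0, 0)  # role of memory 32 / selector 33; zero in the initial state
    out: List[Optional[Vector]] = []
    for mode, mv in blocks:
        if mode != "MC" or mv is None:
            out.append(None)  # intra or NOMC block: its vector is not encoded
            prev = (0, 0)     # the next block is predicted from the zero vector
            continue
        out.append((mv[0] - prev[0], mv[1] - prev[1]))  # subtractor 30
        prev = mv
    return out


blocks = [("MC", (3, 1)), ("MC", (4, 1)), ("NOMC", None),
          ("MC", (4, 2)), ("INTRA", None), ("MC", (-1, 0))]
diffs = mv_differences(blocks)
print(diffs)  # [(3, 1), (1, 0), None, (4, 2), None, (-1, 0)]
```

When neighboring blocks move alike, the differences cluster around (0, 0), which is exactly where the shortest codes are assigned.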
In conventional motion compensated prediction for image signal encoding, the transmission efficiency of the motion vectors is low. Moreover, because the motion vector is selected depending on the magnitude of the prediction distortion, when similar patterns are present over a wide area of the picture, or where the picture is flat and featureless, the differences in prediction distortion may be small, and a block other than the truly corresponding block may erroneously be found as the corresponding block. If the block found lies farther away from the truly corresponding block, an unnecessarily large motion vector is transmitted, and the picture is distorted.
Another problem with the conventional system is that the motion vectors of adjacent blocks sometimes differ greatly, which degrades picture quality. Moreover, because the selection of the vector depends on the magnitude of the distortion, the transmission efficiency of the motion vectors is low.
A further problem with the conventional system is that if the motion vector search range is expanded, the amount of code information for the vectors increases. If, on the other hand, the search range is narrowed, rapid motion cannot be tracked.
FIG. 5 is another way of presenting the conventional image signal encoding system shown in the previously mentioned publication, ISO-IEC/JTC1/SC29/WG11 MPEG 92/N0245 Test Model 2. Reference numerals identical to those in FIG. 1 denote identical or corresponding elements. The memory 16 and the selector 18 in FIG. 1 are not shown, but instead a memory 21 is added. The digital video signal 101a received at the input terminal 1 is input to and stored in the memory 21, and the video signal 101b read out of the memory 21 is supplied to the first input of the subtractor 10 and to the motion compensated prediction circuit 17. The output of the motion compensated prediction circuit 17 is supplied to the second input of the subtractor 10, and to the second input of the adder 15. The rest of the configuration is similar to that of FIG. 1.
FIG. 6 is a schematic diagram showing the concept of motion compensated prediction in the prior art image signal encoding system. FIG. 7 is a schematic diagram showing the operation of the memory 21.
FIG. 8 shows an example of the motion compensated prediction circuit 17 used in the system of FIG. 5. The output 103 of the adder 15 (FIG. 5) is supplied via an input terminal 21a to a switching circuit 23. A first output of the switching circuit 23 is supplied to a first frame memory 24a, and a second output of the switching circuit 23 is supplied to a second frame memory 24b. Reference images stored in and read out from the frame memories 24a and 24b are supplied to first inputs of motion vector detectors 25a and 25b, respectively. The image signal 101b from the memory 21 is supplied via a second input terminal 21b to second inputs of the motion vector detectors 25a and 25b. Outputs of the motion vector detectors 25a and 25b are supplied to first and second inputs of a prediction mode selector 26. The image signal 101b from the memory 21 is also supplied to a third input of the prediction mode selector 26. A first output of the prediction mode selector 26 is input to a first input of a selector 27, a zero signal ("0") is supplied to a second input of the selector 27, and a second output of the prediction mode selector 26 is supplied to a third input of the selector 27. The output of the selector 27 is output as the prediction image 106.
Referring now to FIG. 6, the pictures are classified into intra-picture encoded pictures (called I-pictures), one-way predictive-encoded pictures (called P-pictures), and bidirectionally predictive-encoded pictures (called B-pictures). For instance, assume that one out of every N pictures is to be an I-picture, and one out of every M pictures is to be a P-picture or an I-picture. If n and m are integers with 1 ≤ m ≤ N/M, then the (N*n+M)-th pictures are made I-pictures, the (N*n+M*m)-th pictures (m ≠ 1) are made P-pictures, and the (N*n+M*m+1)-th to (N*n+M*m+M-1)-th pictures are made B-pictures. The set of (N*n+1)-th to (N*n+N)-th pictures is called a group of pictures (GOP).
FIG. 6 shows the case where N=15, and M=3.
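The I/P/B assignment rule can be checked mechanically for N = 15 and M = 3. The sketch below is an illustration that assumes M divides N (as in FIG. 6); it reduces the picture number modulo N and applies the three cases of the rule. The function name is hypothetical.

```python
def picture_type(k: int, N: int = 15, M: int = 3) -> str:
    """Type of the k-th picture under the I/P/B rule; assumes M divides N."""
    r = k % N        # position of picture k within its period of N pictures
    if r == M:       # k = N*n + M
        return "I"
    if r % M == 0:   # k = N*n + M*m with m != 1 (r == 0 covers m = N/M)
        return "P"
    return "B"       # k = N*n + M*m + 1 ... N*n + M*m + M - 1


sequence = "".join(picture_type(k) for k in range(16))
print(sequence)  # PBBIBBPBBPBBPBBP
```

Pictures 0 to 15 come out as the zero-th P-picture, the third I-picture, P-pictures at 6, 9, 12 and 15, and B-pictures in between, consistent with the pictures referred to in the following paragraphs.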
I-pictures are intra-picture encoded, without inter-picture prediction. Each P-picture is predicted from the immediately preceding I- or P-picture. For instance, the sixth picture in FIG. 6 is a P-picture and is predicted from the third, I-picture. The ninth, P-picture is predicted from the sixth, P-picture. Each B-picture is predicted from both the preceding and the succeeding I- or P-pictures. For instance, the fourth and fifth, B-pictures are predicted from the third, I-picture and the sixth, P-picture. Accordingly, the fourth and fifth pictures are encoded after the sixth picture has been encoded.
Next, the operation of the encoding system, shown in FIG. 5, using the hybrid encoding method will be described.
The digital image signal received via the input terminal 1 is input to the memory 21, rearranged into encoding order, and output, as shown in FIG. 7, in which "OI" indicates the order of input and "OE" indicates the order of encoding. The order of the image signals is changed from that shown at the top of FIG. 7 into that shown at the bottom of FIG. 7. This is because the first, B-picture in FIG. 6, for instance, cannot be encoded until after the third, I-picture has been encoded, as described above.
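The reordering performed by the memory 21 can be sketched as follows, assuming (as the prediction structure of FIG. 6 requires) that each I- or P-picture is moved ahead of the B-pictures that precede it in input order, and that nothing else changes. FIG. 7 itself is not reproduced here, so this is an illustration of the principle rather than a transcription of the figure; the function name is hypothetical.

```python
from typing import List


def encoding_order(types: str, first_index: int = 1) -> List[int]:
    """Reorder pictures so each I- or P-picture precedes the B-pictures before it.

    types: picture types in input order, e.g. "BBIBBP..."; picture numbers start
    at first_index. B-pictures still waiting for a later reference are omitted.
    """
    order: List[int] = []
    pending_b: List[int] = []
    for offset, t in enumerate(types):
        idx = first_index + offset
        if t == "B":
            pending_b.append(idx)    # must wait for the next I- or P-picture
        else:
            order.append(idx)        # the reference picture is encoded first
            order.extend(pending_b)  # then the B-pictures that depend on it
            pending_b = []
    return order


order_1_to_15 = encoding_order("BBIBBPBBPBBPBBP")
print(order_1_to_15)  # [3, 1, 2, 6, 4, 5, 9, 7, 8, 12, 10, 11, 15, 13, 14]
```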
The image signals 101b output from the memory 21 are supplied to the subtractor 10, where the difference between each image signal 101b and the prediction image 106 from the motion compensated prediction circuit 17 is obtained, and the difference is subjected to DCT (discrete cosine transform) at the DCT circuit 11 along the spatial axes. The coefficients obtained by the DCT are quantized at the quantizer 12, variable-length encoded at the variable-length encoder 19, and output via the transmitting buffer 20.
The quantized transform coefficients are inverse-quantized at the inverse-quantizer 13, and are subjected to IDCT (inverse DCT) at the IDCT circuit 14, and are then added at the adder 15 to the prediction image 106 to produce a decoded image 103. The decoded image 103 is input to the motion compensated prediction circuit 17, for the purpose of encoding the next image.
The operation of the motion compensated prediction circuit 17 will next be described with reference to FIG. 8. The motion compensated prediction circuit 17 uses the two reference images stored in the frame memories 24a and 24b to perform motion compensated prediction of the image signal 101b and produce the prediction image 106.
First, when the decoded image 103 is an I- or P-picture, the image 103 is written into the frame memory 24a or 24b for use in encoding subsequent pictures. Whichever of the frame memories 24a and 24b was updated less recently is selected by the switching circuit 23 for writing the newly input image 103, so the frame memories 24a and 24b are selected alternately. With this alternate selection, when the first and second, B-pictures in FIG. 6 are to be encoded, the zero-th, P-picture and the third, I-picture are stored in the frame memories 24a and 24b, respectively. When the sixth, P-picture is encoded and decoded, the frame memory 24a is updated with the decoded sixth, P-picture. Accordingly, when the fourth and fifth, B-pictures are to be encoded, the sixth, P-picture and the third, I-picture are stored in the frame memories 24a and 24b, respectively. When the ninth, P-picture is encoded and decoded, the frame memory 24b is updated with the decoded ninth, P-picture. Accordingly, when the seventh and eighth, B-pictures are to be encoded, the sixth and ninth, P-pictures are stored in the frame memories 24a and 24b, respectively.
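The alternating update of the two frame memories can be modelled with a small class. This sketch assumes the first reference picture (the zero-th P-picture) goes to memory 24a, which matches the sequence described above; only picture numbers are stored, not image data. The class and method names are hypothetical.

```python
from typing import Dict, Optional


class ReferenceStore:
    """Sketch of switching circuit 23 with frame memories 24a and 24b."""

    def __init__(self) -> None:
        self.slots: Dict[str, Optional[int]] = {"24a": None, "24b": None}
        self.next_slot = "24a"  # least recently updated memory (assumed to start at 24a)

    def store(self, picture: int) -> Dict[str, Optional[int]]:
        """Write a decoded I- or P-picture and return a snapshot of both memories."""
        self.slots[self.next_slot] = picture
        self.next_slot = "24b" if self.next_slot == "24a" else "24a"
        return dict(self.slots)


refs = ReferenceStore()
history = [refs.store(p) for p in (0, 3, 6, 9)]
for snapshot in history:
    print(snapshot)
# {'24a': 0, '24b': None}
# {'24a': 0, '24b': 3}
# {'24a': 6, '24b': 3}
# {'24a': 6, '24b': 9}
```

Because writes simply alternate, the memory that is overwritten is always the one holding the older reference, which is the picture no longer needed by the next B-pictures.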
When the image signal 101b output from the memory 21 is input to the motion compensated prediction circuit 17, the two motion vector detectors 25a and 25b detect motion vectors using the reference pictures stored in the frame memories 24a and 24b, respectively, and output motion compensated prediction pictures.
That is, the image signal 101b for one picture is divided into a plurality of blocks; for each block, the reference block that minimizes the prediction distortion is selected, its relative position is output as the motion vector, and the selected block is output as the motion compensated prediction image. From among the two motion compensated prediction images from the motion vector detectors 25a and 25b and their average, the prediction mode selector 26 selects the one that gives the minimum prediction distortion and outputs it as the prediction image. If the image signal 101b is an I-picture or a P-picture, the motion compensated prediction image obtained from the more recent reference picture (the immediately preceding I- or P-picture) is selected and output. That is, when the image signal 101b is an I-picture or a P-picture, if the reference image stored in the frame memory 24b is more recent than the reference image stored in the frame memory 24a, the motion compensated prediction image from the motion vector detector 25b is selected and output; if the reference image stored in the frame memory 24a is more recent than that stored in the frame memory 24b, the motion compensated prediction image from the motion vector detector 25a is selected and output.
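For a B-picture block, the choice among the two motion compensated predictions and their average can be sketched as below. The text only says "minimum prediction distortion", so the use of the sum of absolute differences and the rounded-up averaging are assumptions; the function names and candidate labels are hypothetical.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences (assumed distortion measure)."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def average(a: Block, b: Block) -> Block:
    # Rounded-up mean of the two predictions (rounding rule is an assumption).
    return [[(x + y + 1) // 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def select_b_prediction(cur: Block, pred_a: Block, pred_b: Block) -> Tuple[str, Block]:
    candidates: Dict[str, Block] = {
        "24a": pred_a,                       # from motion vector detector 25a
        "24b": pred_b,                       # from motion vector detector 25b
        "average": average(pred_a, pred_b),  # interpolated prediction
    }
    # min() keeps the first candidate on ties, i.e. prefers 24a, then 24b.
    best = min(candidates, key=lambda name: sad(cur, candidates[name]))
    return best, candidates[best]


print(select_b_prediction([[10, 10]], [[8, 8]], [[12, 12]]))  # ('average', [[10, 10]])
```

Averaging the two predictions helps when the block is a blend of its neighbors in time, for example during a slow fade or when noise in the two references partly cancels.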
The prediction mode selector 26 also selects whichever of intra-picture encoding (which does not use prediction) and inter-picture prediction encoding using the selected prediction image yields the higher encoding efficiency. If the image signal 101b is an I-picture, intra-picture encoding is always selected. When intra-picture encoding is selected, a signal indicating intra-picture encoding is output as the prediction mode signal. When inter-picture encoding is selected, a signal indicating the selected prediction image is output as the prediction mode signal. When the prediction mode output from the prediction mode selector 26 is the intra-picture encoding mode, the selector 27 outputs a zero signal ("0"). Otherwise, the selector 27 outputs the prediction image from the prediction mode selector 26.
Thus, when the image signal 101b output from the memory 21 is an I-picture, the motion compensated prediction circuit 17 outputs a zero signal as the prediction image 106, so that no inter-picture prediction is performed for the I-picture and intra-picture transform encoding is conducted. When the image signal 101b output from the memory 21 is the sixth, P-picture in FIG. 6, the motion compensated prediction circuit 17 performs motion compensated prediction from the third, I-picture in FIG. 6 to produce the prediction image 106. When the image signal 101b output from the memory 21 is the fourth, B-picture in FIG. 6, the motion compensated prediction circuit 17 performs motion compensated prediction from the third, I-picture and the sixth, P-picture in FIG. 6 to produce the prediction image 106.
Because the conventional image signal encoding system is configured as described above, if the motion is 30 pixels per frame and the P-picture interval M is three, the motion vector spans 90 pixels, so the motion vector search range must be wide. That is, the temporal distance between pictures, in particular for P-picture prediction, is long, so the motion vector search range must be wide; the hardware is therefore large, and the amount of information in the motion vector codes is large. If the search range is narrow, the correct motion vector cannot be found and the prediction efficiency is low, so either the amount of code information increases or the picture quality is degraded.
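The growth in search effort can be made concrete with a little arithmetic. This sketch assumes a full search over ±R in both the horizontal and vertical directions and counts one absolute-difference operation per pixel per candidate; real hardware may search differently, so the numbers illustrate the trend rather than any particular implementation. The function name is hypothetical.

```python
from typing import Tuple


def search_cost(motion_per_frame: int, M: int, block: int = 16) -> Tuple[int, int, int]:
    """Search range, candidate count and per-block difference operations,
    assuming a full search over +/-R in both directions."""
    R = motion_per_frame * M          # displacement to cover across M frames
    candidates = (2 * R + 1) ** 2     # vectors tested per block
    ops = candidates * block * block  # one |difference| per pixel per candidate
    return R, candidates, ops


print(search_cost(30, 1))  # (30, 3721, 952576)
print(search_cost(30, 3))  # (90, 32761, 8386816)
```

Tripling the picture distance from M = 1 to M = 3 multiplies the work per 16 x 16 block by roughly 8.8, because the candidate count grows with the square of the range.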
Moreover, the conventional image signal encoding system configured as described above does not take scene changes into account. If a scene change occurs at a P-picture or a B-picture, motion compensated prediction has no effect, so that the amount of code information increases or the picture quality is degraded.
Further problems of the prior art system are described next. Let the input image signal 101b be represented by F(i, j), with i representing the pixel number in the horizontal direction and j the line number in the vertical direction, and let the reference picture stored in the frame memory 24a be represented by G(i, j). The whole picture is divided into blocks Bn,m(i, j), each of 16 pixels in the horizontal direction by 16 lines in the vertical direction, where n = 0, 1, 2, ... indicates the horizontal position of the block, m = 0, 1, 2, ... indicates its vertical position, 0 ≤ i ≤ 15 and 0 ≤ j ≤ 15. Each block is represented by:

$$B_{n,m}(i,j) = F(16n+i,\ 16m+j)$$
For each block, one of the reference blocks which minimizes the prediction distortion is selected by means of block matching, and the relative position of the selected reference block is output as representing the motion vector, and the block is output as the motion compensated prediction image.
When the input image signal 101 is an interlaced signal and each frame is treated as one picture, block matching is conducted both for each frame and for each field, and the result that yields the smaller prediction distortion is selected. When block matching is conducted for each frame, the prediction distortion E0(Vh, Vv) for the vector (Vh, Vv) is calculated by:

$$E_0(V_h,V_v) = \sum_{i=0}^{15} \sum_{j=0}^{15} \left| B_{n,m}(i,j) - G(16n+i+V_h,\ 16m+j+V_v) \right|$$
If the motion vector search range is ±Mh pixels in the horizontal direction and ±Mv lines in the vertical direction, the vector (Vh, Vv) = (Vh0, Vv0) within -Mh ≤ Vh ≤ +Mh and -Mv ≤ Vv ≤ +Mv that minimizes E0(Vh, Vv) is determined, and e0 is defined as E0(Vh0, Vv0).
If block matching is performed for each field, the block Bn,m(i, j) is divided into its first and second fields. For the first field of the block Bn,m(i, j), the prediction distortion E1(Vh, Vv, f) (f = 0, 1) for the vector (Vh, Vv) is calculated by:

$$E_1(V_h,V_v,f) = \sum_{i=0}^{15} \sum_{j=0}^{7} \left| B_{n,m}(i,2j) - G(16n+i+V_h,\ 16m+2j+f+V_v) \right|$$
If the motion vector search range is ±Nh pixels in the horizontal direction and ±Nv lines in the vertical direction, the vector (Vh, Vv) = (Vh1, Vv1) within -Nh ≤ Vh ≤ +Nh and -Nv ≤ Vv ≤ +Nv and the field f = f1 that in combination minimize E1(Vh, Vv, f) are determined, and e1 is defined as E1(Vh1, Vv1, f1). Here f indicates whether the reference image is the first field or the second field.
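For the second field of the block Bn,m(i, j), the prediction distortion E2(Vh, Vv, f) (f = 0, 1) for the vector (Vh, Vv) is calculated by:

$$E_2(V_h,V_v,f) = \sum_{i=0}^{15} \sum_{j=0}^{7} \left| B_{n,m}(i,2j+1) - G(16n+i+V_h,\ 16m+2j+f+V_v) \right|$$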
The vector (Vh, Vv) = (Vh2, Vv2) and the field f = f2 that in combination minimize E2(Vh, Vv, f) are determined, and e2 is defined as E2(Vh2, Vv2, f2).
Finally, e0 and e1+e2 are compared. If e0 is larger, the two vectors (Vh1, Vv1) and (Vh2, Vv2), the field indicators f1 and f2, and the corresponding motion compensated prediction image B'n,m(i, j), given by

$$B'_{n,m}(i,2j) = G(16n+i+V_{h1},\ 16m+2j+f_1+V_{v1})$$
$$B'_{n,m}(i,2j+1) = G(16n+i+V_{h2},\ 16m+2j+f_2+V_{v2})$$

are output.
If e0 ≤ e1+e2, the vector (Vh0, Vv0) and the motion compensated prediction image B'n,m(i, j), given by

$$B'_{n,m}(i,j) = G(16n+i+V_{h0},\ 16m+j+V_{v0})$$

are output.
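The frame/field decision can be sketched end to end on toy data. Because the original distortion equations were not legible, this sketch assumes absolute differences for E0, E1 and E2 and follows the row indexing of the prediction-image formulas above (the vertical vector component is added directly in frame lines, and f selects the reference field). Vectors pointing outside the reference picture are skipped, which is also an assumption. All function names are hypothetical, and the toy block is 4 x 4 instead of 16 x 16 to keep the example small.

```python
from itertools import product
from typing import List, Sequence

Picture = List[List[int]]  # picture[y][x]: y = line, x = pixel


def in_bounds(G: Picture, rows: Sequence[int], cols: Sequence[int]) -> bool:
    return (min(rows) >= 0 and max(rows) < len(G)
            and min(cols) >= 0 and max(cols) < len(G[0]))


def frame_match(F, G, n, m, s, Mh, Mv):
    """Minimise E0(Vh, Vv); returns (e0, (Vh0, Vv0))."""
    cands = []
    for vh, vv in product(range(-Mh, Mh + 1), range(-Mv, Mv + 1)):
        if not in_bounds(G, range(m * s + vv, m * s + s + vv),
                         range(n * s + vh, n * s + s + vh)):
            continue  # assumption: out-of-picture vectors are not tried
        e = sum(abs(F[m * s + j][n * s + i] - G[m * s + j + vv][n * s + i + vh])
                for j in range(s) for i in range(s))
        cands.append((e, (vh, vv)))
    return min(cands)


def field_match(F, G, n, m, s, Nh, Nv, parity):
    """parity 0: first field (block lines 2j, E1); parity 1: second field
    (block lines 2j+1, E2). Returns (e, (Vh, Vv, f))."""
    cands = []
    for vh, vv, f in product(range(-Nh, Nh + 1), range(-Nv, Nv + 1), (0, 1)):
        rows = [m * s + 2 * j + f + vv for j in range(s // 2)]
        if not in_bounds(G, rows, range(n * s + vh, n * s + s + vh)):
            continue
        e = sum(abs(F[m * s + 2 * j + parity][n * s + i]
                    - G[m * s + 2 * j + f + vv][n * s + i + vh])
                for j in range(s // 2) for i in range(s))
        cands.append((e, (vh, vv, f)))
    return min(cands)


def frame_or_field(F, G, n, m, s=16, Mh=2, Mv=2, Nh=2, Nv=2):
    e0, v0 = frame_match(F, G, n, m, s, Mh, Mv)
    e1, v1 = field_match(F, G, n, m, s, Nh, Nv, parity=0)
    e2, v2 = field_match(F, G, n, m, s, Nh, Nv, parity=1)
    if e0 > e1 + e2:
        return "field", e1 + e2, (v1, v2)
    return "frame", e0, v0


# Toy data: 12x12 pictures, one 4x4 block at (n, m) = (1, 1).
G = [[7 * x + 13 * y for x in range(12)] for y in range(12)]
F_frame = [[0] * 12 for _ in range(12)]
F_field = [[0] * 12 for _ in range(12)]
for y in range(4, 8):
    for x in range(4, 8):
        F_frame[y][x] = G[y + 1][x - 2]  # whole block moved by one vector
for j in range(2):
    for x in range(4, 8):
        F_field[4 + 2 * j][x] = G[5 + 2 * j][x + 1]  # first field
        F_field[5 + 2 * j][x] = G[4 + 2 * j][x - 1]  # second field, different motion

frame_result = frame_or_field(F_frame, G, 1, 1, s=4)
field_result = frame_or_field(F_field, G, 1, 1, s=4)
print(frame_result)  # ('frame', 0, (-2, 1))
print(field_result)  # ('field', 0, ((1, 0, 1), (-1, -1, 1)))
```

Even on this tiny block, each decision evaluates three full searches, which illustrates why widening the search range for fast motion inflates the calculation load.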
The operation of the motion vector detector 25b is identical to that of the motion vector detector 25a, except that the reference images used are those stored in the frame memory 24b.
Because the conventional image signal encoding system must perform the calculations of equations (F4) to (F8), widening the motion vector search range to cope with quickly moving pictures increases the amount of calculation, and the size of the hardware must increase accordingly.