1. Field of the Invention
This invention relates to a coding technique of a digital image, and more particularly to a motion prediction apparatus and method which is adapted to prevent the calculation redundancy occurring during predicting a motion in coding a digital image.
2. Description of the Related Art
An information compressing method has been required in order to process the large quantity of information resulting from the trend toward multimedia in recent communication media. Accordingly, various information compressing techniques have been developed. A typical information compressing method is MPEG (Moving Picture Experts Group)-2, which is an international standard for compressing moving pictures.
Generally, the macro block is a basic unit for performing a signal compression in a coder of MPEG-2 system. One macro block consists of a brightness signal (Y) block having 16×16 pixels and a color-difference signal (Cr and Cb) block having 8×8 pixels.
The first step for the image compression is extracting the macro block from a certain input image. To this end, three operations are required: the color space conversion, the chrominance component decimation and the block partitioning. The color space conversion is an operation for transforming the input image into the Y, Cr and Cb space so as to reduce the redundancy of the red (R), green (G) and blue (B) signals that are input from a camera and converted into a digital form. The chrominance component decimation refers to decimating the color-difference signals Cr and Cb in the horizontal and vertical directions, because the brightness signal Y has such a wide frequency band that it is well recognized visually, whereas the recognition factor of the color-difference signal Cr or Cb representing the contrast colors is lower than that of the brightness signal Y. For example, in the case of an image in the 4:2:0 format, the respective decimation factors become a ratio of 2:1. The block partitioning divides the Y, Cb and Cr images obtained through the color space conversion and the chrominance component decimation mentioned above into units suitable for coding them. For example, the brightness signal Y is divided into a 16×16 pixel unit, and each color-difference signal Cr and Cb is divided into an 8×8 pixel unit.
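The decimation and block partitioning operations described above can be illustrated with a minimal Python sketch. It assumes that the Y, Cb and Cr planes are already available as lists of rows, that the 2:1 decimation is done by averaging each 2×2 neighbourhood (the text does not specify the decimation filter), and that the plane dimensions are multiples of 16. The function names are hypothetical and are used for illustration only.

```python
def decimate_chroma_420(plane):
    """Halve a chroma plane in both directions (2:1 each way).

    Assumption: each output sample is the mean of a 2x2 neighbourhood;
    the text does not specify the decimation filter.
    """
    h, w = len(plane), len(plane[0])
    return [[(plane[r][c] + plane[r][c + 1] +
              plane[r + 1][c] + plane[r + 1][c + 1]) / 4.0
             for c in range(0, w - 1, 2)]
            for r in range(0, h - 1, 2)]


def partition_blocks(plane, size):
    """Split a plane into non-overlapping size x size blocks, raster order."""
    h, w = len(plane), len(plane[0])
    return [[row[c:c + size] for row in plane[r:r + size]]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]


def extract_macroblocks(y, cb, cr):
    """Return lists of Y 16x16, Cb 8x8 and Cr 8x8 blocks, one per macro block."""
    return (partition_blocks(y, 16),
            partition_blocks(decimate_chroma_420(cb), 8),
            partition_blocks(decimate_chroma_420(cr), 8))
```

Because the chroma planes are halved in each direction before partitioning, the k-th 8×8 Cb block and the k-th 8×8 Cr block cover the same picture area as the k-th 16×16 Y block.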
The second step for the image compression is to provide a motion prediction and a compensation for the macro blocks extracted from the entire image regularly. Such motion prediction and compensation are intended to compress an image effectively by omitting the redundancy among the adjacent video images in the time base. The conventional motion prediction and compensation process will be explained with reference to a coder of MPEG-2 system shown in FIG. 1 below.
FIG. 1 is a block diagram showing a typical coder of MPEG-2. In FIG. 1, the MPEG-2 system coder includes a frame memory 2 connected to an input line 1, a frame delay 18 for storing a decoded image, and a motion estimator 20 connected to the input line 1, the frame memory 2 and the frame delay 18 to perform an operation for predicting and compensating for a motion of an input image.
In the coder shown in FIG. 1, the frame memory 2 serves to store an image received over the input line 1 in the frame unit. The motion estimator 20 predicts and compensates a motion of the input image. To this end, the motion estimator 20 is comprised of a first motion estimator 22 connected to the input line 1 and the frame memory 2 commonly, a second motion estimator 24 connected to the input line, the first motion estimator 22 and the frame delay 18, and a motion compensator 26 connected to the second motion estimator 24 and the frame delay 18. The first motion estimator 22 detects a position of the most analogous block to the previous image stored in the frame memory 2 with respect to the brightness signal(Y) block in a certain macro block from the image signal received over the input line 1. The detected block position is employed as a reference position for the second motion estimator 24. The second motion estimator 24 receives the input image inputted over the input line 1 and a reconstructed image stored in the frame delay 18 to detect the most analogous block to the brightness signal(Y) block in the macro block with respect to a reference position inputted from the first motion estimator 22 from the reconstructed image. Then, the MPEG-2 system coder transfers the detected position to a decoder, so that the decoder can obtain an image identical to the reconstructed image referred in the coder on a basis of the received position information. The motion compensator 26 extracts the most analogous block to the macro block from the reconstructed image stored in the frame delay 18 on a basis of the final position information generated at the second motion estimator 24.
The MPEG-2 system coder further includes a subtractor 4 connected to the frame memory 2 and the motion compensator 26 commonly to generate a difference image between the previous image and the estimated reconstructed image, a coder 34 connected to the subtractor 4 to code the difference image, a decoder 36 connected to the coder 34 to reconstruct the coded difference image, and an adder 16 connected to the decoder 36 and the motion compensator 26 to add the reconstructed difference image and the estimated image and output the added image to the frame delay 18. Moreover, the MPEG-2 system coder includes a variable length coder (VCL) 30 and a buffer 32 that are connected, in series, to the coder 34, and a bit rate controller 10 for controlling a bit generation rate by adjusting quantizing step sizes Qp of a quantizer 8 and a dequantizer 12 with reference to a characteristic of an input image stored in the frame memory 2 and a data quantity of the buffer 32.
In such a configuration, the subtractor 4 generates a difference image between a macro block of the previous image stored in the frame memory 2 and a macro block of the estimated reconstructed image from the motion compensator 26 and outputs the difference image to the coder 34. In other words, the subtractor 4 outputs a difference image in which a redundancy between images adjacent to each other in the time base is eliminated. The coder 34 carries out the discrete cosine transform (DCT) processing for the difference image inputted from the subtractor 4 to code the difference image, thereby eliminating the spatial correlation existing in the difference image. To this end, the coder 34 includes a DCT circuit 6 for carrying out a DCT operation of the difference image in an 8×8 pixel unit, and a quantizer 8 for quantizing the DCT transformed signal. The VCL 30 is connected to the quantizer 8 to compress and output the coded difference image again in accordance with a value of code generation probability. The buffer 32 is connected to the VCL 30 to output a bit stream of the difference image in the first-in first-out system. The decoder 36 connected to the quantizer 8 reconstructs the coded difference image by carrying out an operation similar to the image reconstruction process performed at the coder. To this end, the decoder 36 includes a dequantizer 12 connected, in series, to the quantizer 8 to inverse-quantize the coded difference image, and an inverse discrete cosine transform (IDCT) circuit 14 for reconstructing the difference image by carrying out the IDCT operation. The adder 16 adds the difference image reconstructed at the IDCT circuit 14 to the estimated image from the motion compensator 26 and outputs the added image to the frame delay 18.
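The signal path through the subtractor 4, the DCT circuit 6, the quantizer 8, the dequantizer 12, the IDCT circuit 14 and the adder 16 can be sketched for one 8×8 block as follows. This is a simplified illustration and not the MPEG-2 quantizer itself: it assumes an orthonormal DCT-II and a uniform quantizer with a single step size qp, whereas an actual coder also applies weighting matrices. All function names are hypothetical.

```python
import math

N = 8  # block size handled by the DCT circuit 6


def _scale(k):
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


# Orthonormal DCT-II basis: T[k][n] = scale(k) * cos((2n + 1) * k * pi / (2N))
T = [[_scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(a):
    return [list(col) for col in zip(*a)]


def dct2(block):
    """Two-dimensional DCT: T * block * T^T."""
    return matmul(matmul(T, block), transpose(T))


def idct2(coef):
    """Inverse two-dimensional DCT: T^T * coef * T."""
    return matmul(matmul(transpose(T), coef), T)


def code_and_reconstruct(current, predicted, qp):
    """Return (quantized levels, reconstructed block) for one 8x8 block.

    Assumption: uniform quantization with step size qp.
    """
    # Subtractor 4: difference between the block to code and its prediction.
    residual = [[c - p for c, p in zip(cr, pr)]
                for cr, pr in zip(current, predicted)]
    # DCT circuit 6 and quantizer 8.
    levels = [[int(round(v / qp)) for v in row] for row in dct2(residual)]
    # Dequantizer 12 and IDCT circuit 14.
    recon_residual = idct2([[lv * qp for lv in row] for row in levels])
    # Adder 16: the result is what would be stored in the frame delay 18.
    reconstructed = [[p + r for p, r in zip(pr, rr)]
                     for pr, rr in zip(predicted, recon_residual)]
    return levels, reconstructed
```

The reconstructed block, rather than the original input block, is kept for later prediction because it is the only version a decoder can also produce from the transmitted levels.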
Accordingly, the frame delay 18 stores a new reconstructed image for estimating the image to be inputted next, and the stored image is utilized by the motion estimator 20 for predicting and compensating for a motion.
Since the above-mentioned conventional motion estimator 20 performs a redundant motion prediction at the first and second motion estimators 22 and 24, it has a disadvantage in that the complexity of the coder is increased. The problem will be described in detail with reference to FIG. 2 to FIG. 5 below.
FIG. 2 shows the configuration of a previous frame t−1 and a current frame t displayed by the interlaced format pattern. Generally, one frame displayed by the interlaced format pattern is constructed by combining a top field with a bottom field, which are alternately generated every 1/60 second. Accordingly, all operations are carried out separately for each field unit in order to perform an image compression effectively. The moving picture compressing method of MPEG-2 system predicts a motion for the field as well as the frame image so that it is applicable to an image in the interlaced format pattern effectively.
FIG. 3 is a detailed block diagram showing the configuration of the first and second motion estimators 22 and 24 in the motion estimator 20 of FIG. 1. Each of the first and second motion estimators 22 and 24 simultaneously carries out a motion prediction operation with respect to five paths, i.e., frame, top-to-top, bottom-to-top, top-to-bottom and bottom-to-bottom paths. The first motion estimator 22 makes use of the input image and the previous image to perform a motion prediction in a single pixel unit with respect to the five paths. In this case, an image corresponding to a retrieval area is the previous image stored in the frame memory 2. The first motion estimator 22 makes use of a block matching algorithm for each of the five paths to predict a motion in the single pixel unit, thereby detecting a motion vector MV. The block matching algorithm refers to a process in which the most analogous block to a specified block of the input image is found from the previous image. The second motion estimator 24 predicts a motion in a half pixel unit on a basis of the single-pixel motion vector MV inputted from the first motion estimator 22. To this end, the second motion estimator 24 includes a half-pixel motion vector detector 21, first and second multiplexors 23 and 25, a second adder 27 and a field/frame determining circuit 29. In such a second motion estimator 24, the half-pixel motion vector detector 21 detects a motion vector in the half pixel unit on a basis of each motion vector MV in the single pixel unit for the five paths inputted from the first motion estimator 22. In this case, the used retrieval area is a reconstructed image stored in the frame delay 18 in FIG. 1. The first multiplexor 23 selectively switches a motion vector in the top-to-top path and a motion vector in the bottom-to-top path which are detected at the half-pixel motion vector detector 21, and outputs it to the adder 27.
In this case, the switching of the first multiplexor 23 is determined by comparing a motion detection error in the top-to-top path with a motion detection error in the bottom-to-top path. The second multiplexor 25 selectively switches a motion vector in the top-to-bottom path and a motion vector in the bottom-to-bottom path which are detected at the half-pixel motion vector detector 21. The switching of the second multiplexor 25 is likewise determined by comparing a motion detection error in the top-to-bottom path with a motion detection error in the bottom-to-bottom path. Then, the adder 27 adds the half-pixel motion detection errors between the fields outputted from the first and second multiplexors 23 and 25 and outputs the added motion detection error to the field/frame determining circuit 29. The field/frame determining circuit 29 compares a half-pixel motion detection error between the frames outputted from the half-pixel motion vector detector 21 with that between the fields outputted from the adder 27 to select a motion vector having the smaller motion detection error value, and outputs the selected motion vector to the motion compensator 26 shown in FIG. 1. In other words, the second motion estimator 24 compares a motion compensation error generated when a motion is predicted in a frame unit with a motion compensation error generated when a motion is predicted in a field unit, and selects and outputs the one having the smaller error. For example, the case where a picture changes suddenly between the fields, as in a sports image, corresponds to the case where the motion compensation error in the field unit is smaller.
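The selection performed by the multiplexors 23 and 25, the adder 27 and the field/frame determining circuit 29 can be summarized in a short Python sketch. Each path is represented here as a hypothetical (motion_vector, error) pair. The handling of ties (the frame path is preferred when the errors are equal) is an assumption, since the text does not state it.

```python
def select_field_or_frame(frame, top_to_top, bottom_to_top,
                          top_to_bottom, bottom_to_bottom):
    """Each argument is a (motion_vector, error) pair for one path.

    Returns (mode, motion_vectors, error), where mode is "frame" or "field".
    """
    # First multiplexor 23: smaller error of top-to-top and bottom-to-top.
    first = min(top_to_top, bottom_to_top, key=lambda p: p[1])
    # Second multiplexor 25: smaller error of top-to-bottom and bottom-to-bottom.
    second = min(top_to_bottom, bottom_to_bottom, key=lambda p: p[1])
    # Adder 27: total error when the motion is predicted in a field unit.
    field_error = first[1] + second[1]
    # Field/frame determining circuit 29 (assumption: a tie selects the frame path).
    if frame[1] <= field_error:
        return "frame", [frame[0]], frame[1]
    return "field", [first[0], second[0]], field_error
```

For instance, with a frame error of 10 and selected field errors of 3 and 4, the summed field error of 7 is smaller, so the two field motion vectors are output.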
FIGS. 4A and 4B depict a motion prediction method in a single pixel unit employing a block matching algorithm. FIG. 4A represents an input image I_t, and FIG. 4B represents the previous image I_{t−1}. In the input image I_t, N_B denotes the size of a specified block B_t. First, a local area for finding a block analogous to the specified block B_t at the reference position (x,y) in the input image I_t is determined in the previous image I_{t−1}. In this case, it is assumed that the local area determined in the previous image I_{t−1} extends from x−S to x+S+N_B−2 in the horizontal direction and from y−S to y+S+N_B−2 in the vertical direction, on a basis of the reference position (x,y). Herein, S is a parameter defining a size of the retrieval area. Next, the mean absolute difference (MAD) is used as a criterion for finding the most analogous block to the specified block B_t of the input image I_t in the local area of the previous image I_{t−1}. In other words, a MAD between a certain block B_{t−1} and the specified block B_t, each having a size of N_B×N_B, is calculated at every position (u,v) in the local area of the previous image I_{t−1}.
This MAD can be given from the following formula:

$$\mathrm{MAD}(u,v) = \frac{1}{N_B \times N_B} \sum_{i=0}^{N_B-1} \sum_{j=0}^{N_B-1} \left| B_t(x-i,\, y-j) - B_{t-1}(x-i-u,\, y-j-v) \right| \qquad (1)$$
wherein B_t(x−i,y−j) represents a pixel of the specified block B_t with respect to a reference position (x,y) in the input image I_t. Subsequently, a position ((u,v)*) of a block B_{t−1} having the smallest MAD in the previous image I_{t−1} is detected. Herein, a displacement from the reference position (x,y) of the input image I_t to the position ((u,v)*) of the previous image I_{t−1} is referred to as "a motion vector MV in a single pixel unit". Further, in order to obtain a motion vector MV in a single pixel unit from the formula (1) for calculating the MAD, it is necessary to provide an amount of calculation that increases with the square of S with respect to each field/frame path, as given by the following formula:
Frame: $N_B \times N_B \times 2S \times 2S \times M$

Top-to-top, bottom-to-top, top-to-bottom and bottom-to-bottom fields:

$$4 \times N_B \times \frac{N_B}{2} \times 2S \times \frac{2S}{2} \times M \qquad (2)$$
wherein M represents a calculation amount required in a calculation of the MAD per unit pixel. Thus, it should be understood that, when S is large, that is, when the retrieval area is large, a motion prediction in the single pixel unit is not easy, because a calculation of 2×N_B×N_B×2S×2S×M is needed as a whole so as to obtain the motion vector MV in a single pixel unit.
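The single-pixel block matching with the MAD criterion can be sketched in Python as a full search over the local area. For simplicity, this sketch takes (x, y) as the top-left corner of the block and adds the displacement (u, v) to it, which differs from the sign convention of the formula (1) but covers the same local area from x−S to x+S+N_B−2. Images are assumed to be lists of rows, and candidate positions falling outside the previous image are skipped, which is an assumption not stated in the text.

```python
def mad(cur, prev, x, y, u, v, nb):
    """Mean absolute difference between the nb x nb block of cur at (x, y)
    and the block of prev displaced by (u, v)."""
    total = 0
    for j in range(nb):
        for i in range(nb):
            total += abs(cur[y + j][x + i] - prev[y + j + v][x + i + u])
    return total / float(nb * nb)


def full_search(cur, prev, x, y, nb, s):
    """Return ((u, v), error) with the smallest MAD, or None if no candidate
    lies inside prev. Displacements run over -s .. s-1 on each axis, i.e.
    2S x 2S candidates, matching the count used in the formula (2)."""
    h, w = len(prev), len(prev[0])
    best = None
    for v in range(-s, s):
        for u in range(-s, s):
            inside = (0 <= x + u and x + u + nb <= w and
                      0 <= y + v and y + v + nb <= h)
            if not inside:
                continue  # assumption: skip candidates outside the image
            err = mad(cur, prev, x, y, u, v, nb)
            if best is None or err < best[1]:
                best = ((u, v), err)
    return best
```

Each call to mad touches N_B×N_B pixels and full_search evaluates up to 2S×2S candidates, which is where the N_B×N_B×2S×2S×M term for the frame path comes from.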
FIG. 5 depicts the conventional method of predicting a motion in a half-pixel unit. Herein, the motion prediction refers to detecting the position of a block having the smallest error with respect to 9 half-pixel positions located within ±0.5 pixel of the motion vector MV in a single pixel unit detected at the first motion estimator 22. The position of the block having the smallest error can be detected by making use of the block matching algorithm in a similar manner to the above-mentioned motion prediction method in a single pixel unit. Each of the blocks corresponding to the 9 half-pixel positions based on the motion vector in a single pixel unit can be calculated by the following formula:
Retrieval positions 4, 5: $I(u \pm 0.5,\, v) = \{I(u,v) + I(u \pm 1,\, v)\}/2$

Retrieval positions 2, 7: $I(u,\, v \pm 0.5) = \{I(u,v) + I(u,\, v \pm 1)\}/2$

Retrieval positions 1, 3, 6, 8: $I(u \pm 0.5,\, v \pm 0.5) = \{I(u,v) + I(u,\, v \pm 1) + I(u \pm 1,\, v) + I(u \pm 1,\, v \pm 1)\}/4 \qquad (3)$
wherein (u,v) represent the co-ordinates for the motion vector in a single pixel unit.
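The interpolation of the formula (3) can be illustrated with the following Python sketch. Positions are passed in half-pixel units (u2 = 2u, 2u±1), which is a convention chosen here for illustration so that integer arithmetic can be used. The image is assumed to be a list of rows indexed as img[v][u], and no rounding to integers is applied because the formula (3) does not specify any.

```python
def half_pixel_value(img, u2, v2):
    """Sample img at (u2 / 2, v2 / 2), with coordinates in half-pixel units."""
    u0, v0 = u2 // 2, v2 // 2  # integer pixel at or before the position
    du, dv = u2 % 2, v2 % 2    # 1 when the position lies between two pixels
    if du == 0 and dv == 0:
        return float(img[v0][u0])
    if dv == 0:                # horizontal half-pixel (retrieval positions 4, 5)
        return (img[v0][u0] + img[v0][u0 + 1]) / 2.0
    if du == 0:                # vertical half-pixel (retrieval positions 2, 7)
        return (img[v0][u0] + img[v0 + 1][u0]) / 2.0
    # diagonal half-pixel (retrieval positions 1, 3, 6, 8)
    return (img[v0][u0] + img[v0][u0 + 1] +
            img[v0 + 1][u0] + img[v0 + 1][u0 + 1]) / 4.0


def half_pixel_candidates(u, v):
    """The 9 candidate positions around the single-pixel vector (u, v),
    returned in half-pixel units."""
    return [(2 * u + du, 2 * v + dv) for dv in (-1, 0, 1) for du in (-1, 0, 1)]
```

Because floor division is used, a position such as u − 0.5 is interpolated from the pixels at u − 1 and u, which agrees with the "−" branch of the formula (3).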
Further, the calculation amount required when a motion in a half-pixel unit is predicted for each of the five paths by applying the formula (3) can be seen from the following formula:

Frame: $N_B \times N_B \times 8 \times (M + L)$

Top-to-top, bottom-to-top, top-to-bottom and bottom-to-bottom fields:

$$N_B \times \frac{N_B}{2} \times 8 \times (M + L) \qquad (4)$$
wherein L represents a calculation amount required in making one pixel in a half-pixel position. The entire calculation amount required for a motion prediction in a half-pixel unit, as obtained from the formula (4), can be disregarded in comparison to the calculation needed in a motion prediction process in a single pixel unit.
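As a worked illustration of this comparison for the frame path only, assume a block size of N_B = 16 and a retrieval parameter of S = 16 (both values are chosen here for illustration and are not taken from the text):

$$\text{single pixel: } N_B \times N_B \times 2S \times 2S \times M = 16 \times 16 \times 32 \times 32 \times M = 262{,}144\,M$$

$$\text{half pixel: } N_B \times N_B \times 8 \times (M + L) = 16 \times 16 \times 8 \times (M + L) = 2{,}048\,(M + L)$$

$$\frac{262{,}144\,M}{2{,}048\,(M + L)} = \frac{128\,M}{M + L}$$

Under the further assumption that L is comparable to M, the single-pixel prediction therefore needs on the order of 64 times the calculation of the half-pixel prediction for the same path.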
As described above, the conventional motion estimators 22 and 24 carry out a redundant motion prediction for five paths (i.e., frame, and top-to-top, bottom-to-top, top-to-bottom and bottom-to-bottom fields), so that it becomes possible to predict a more analogous block from the reconstructed image. However, the conventional motion prediction scheme has drawbacks in that it not only raises the complexity of the coder due to the redundancy of calculation for a motion prediction, but also requires more hardware when it is implemented with a very large scale integrated circuit (VLSI). Particularly, in the case of the first motion estimator 22, a motion prediction for even a single path needs a large calculation amount as well as complex hardware, so that the complexity is raised further by the redundant motion prediction operation for the five paths.
Accordingly, it is an object of the present invention to provide a motion prediction method and apparatus that is capable of reducing a calculation amount by eliminating the redundant calculation in a motion prediction process in a single pixel unit.
A further object of the present invention is to provide a motion prediction apparatus and method that is capable of reducing the complexity of hardware by eliminating the redundant calculation in a motion prediction process in a single pixel unit.
A still further object of the present invention is to provide a motion prediction apparatus and method that can prevent a deterioration in a motion prediction performance due to a reduction in a calculation amount.
In order to achieve these and other objects of the invention, a motion prediction apparatus according to one aspect of the present invention includes first motion estimating means for retrieving a previous image to predict a single-pixel motion in a top-to-top field path and in a bottom-to-bottom field path with respect to the input image; frame vector determining means for determining a frame motion vector on a basis of a motion vector outputted from the first motion estimating means; scaling means for scaling each motion vector outputted from the first motion estimating means to determine a field motion vector in a different path; and second motion estimating means for retrieving a decoded image on a basis of each single-pixel motion vector outputted from the first motion estimating means, the frame vector determining means and the scaling means.
A motion prediction method according to another aspect of the present invention includes the steps of (A) detecting motion vectors by retrieving a previous image to predict motions in a single pixel unit in a top-to-top field path and in a bottom-to-bottom field path with respect to the input image; (B) determining field motion vectors for two different paths by determining a frame motion vector on a basis of the field motion vectors and scaling each of the field motion vectors; and (C) detecting a motion vector in a half pixel unit by retrieving a decoded image on a basis of the motion vectors in the steps (A) and (B).
A motion prediction method according to still another aspect of the present invention includes the steps of (A) detecting a single-pixel motion vector by repeatedly predicting a motion in a single pixel unit hierarchically with respect to a top-to-top field path and a bottom-to-bottom field path of an input image and a previous image consisting of n (n ≥ 2) layers having a different size of retrieval areas, said n being an integer; (B) determining field motion vectors for two different paths by determining a frame motion vector on a basis of the field motion vectors and scaling each of the field motion vectors; (C) detecting a final single-pixel motion vector at the lowermost layer by repeatedly predicting a motion in a single pixel unit hierarchically at m (1 ≤ m < n) layers next to the certain layer on a basis of the motion vectors in the steps (A) and (B); and (D) predicting a motion in a half pixel unit by retrieving a decoded image on a basis of the single-pixel motion vector detected in the step (C).
A motion prediction method according to still another aspect of the present invention includes the steps of (A) detecting a single-pixel motion vector by repeatedly predicting a motion in a single pixel unit hierarchically with respect to a top-to-top field path and a bottom-to-bottom field path of an input image and a previous image consisting of n (n ≥ 2) layers having a different size of retrieval areas, said n being an integer; (B) determining field motion vectors for two different paths by determining a frame motion vector on a basis of the field motion vectors and scaling each of the field motion vectors; and (C) predicting a motion in a half pixel unit by retrieving the decoded image on a basis of the single-pixel motion vector detected in the steps (A) and (B).