1. Field of the Invention
This invention relates to a coding technique of a digital image, and more particularly to a motion prediction apparatus and method which is capable of reducing a calculation amount required in a single-pixel and half-pixel motion prediction process as well as an input and output band width when a motion is predicted by employing a hierarchical block matching algorithm.
2. Description of the Related Art
An information compression method has been required to process the large quantity of information resulting from the trend toward multimedia in recent communication media. Accordingly, various information compression techniques have been developed. A typical information compression method is MPEG (Moving Picture Experts Group)-2, which is an international standard for moving picture compression.
Generally, the macro block is a basic unit for performing a signal compression in a coder of an MPEG-2 system. One macro block consists of a brightness signal (Y) block having 16×16 pixels and a color-difference signal (Cr and Cb) block having 8×8 pixels.
The first step of the image compression is extracting the macro blocks from an input image. This requires three operations: the color space conversion, the chrominance component decimation and the block partitioning. The color space conversion transforms the input image into the Y, Cr and Cb space so as to reduce the redundancy among the red (R), green (G) and blue (B) signals that are input from a camera and converted into digital form. The chrominance component decimation decimates the color-difference signals Cr and Cb in the horizontal and vertical directions, because the brightness signal Y representing the contrast of the image has such a wide frequency band that it is well recognized visually, whereas the recognition factor of the color-difference signals Cr and Cb representing colors is lower than that of the brightness signal Y. For example, in the case of a 4:2:0 format image, the decimation factor is 2:1 in each direction. The block partitioning divides the Y, Cb and Cr images obtained through the color space conversion and the chrominance component decimation into sizes suitable for coding. For example, the brightness signal Y is divided into 16×16 pixel units, and each of the color-difference signals Cr and Cb is divided into 8×8 pixel units.
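By way of illustration only, the chrominance decimation and block partitioning described above may be sketched as follows. The function names, the use of a 2×2 average as the decimation filter, and the assumption that the picture dimensions are multiples of 16 are hypothetical choices made for this sketch; the description fixes only the 2:1 decimation factor and the 16×16 and 8×8 block sizes.

```python
def decimate_2to1(plane):
    """Halve a chroma plane in both directions.

    Assumption: each output pixel is the integer mean of a 2x2 neighbourhood.
    """
    h, w = len(plane), len(plane[0])
    return [[(plane[r][c] + plane[r][c + 1]
              + plane[r + 1][c] + plane[r + 1][c + 1]) // 4
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]


def partition(plane, size):
    """Split a plane into non-overlapping size x size blocks.

    Blocks are keyed by (block_row, block_col).
    """
    h, w = len(plane), len(plane[0])
    return {(br, bc): [row[bc * size:(bc + 1) * size]
                       for row in plane[br * size:(br + 1) * size]]
            for br in range(h // size)
            for bc in range(w // size)}


def extract_macroblocks(y, cb, cr):
    """Pair each 16x16 Y block with its 8x8 Cb and Cr blocks (4:2:0)."""
    y_blocks = partition(y, 16)
    cb_blocks = partition(decimate_2to1(cb), 8)
    cr_blocks = partition(decimate_2to1(cr), 8)
    return {key: {"Y": y_blocks[key], "Cb": cb_blocks[key], "Cr": cr_blocks[key]}
            for key in y_blocks}
```

Because the chroma planes are halved in each direction before partitioning, an 8×8 chroma block covers the same picture area as the 16×16 brightness block that shares its key.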
The second step for the image compression is to provide a motion prediction and a compensation for the macro blocks extracted from the entire image regularly. Such motion prediction and compensation are intended to compress an image effectively by omitting a redundant coding process for the adjacent video image in the time base. The conventional motion prediction and compensation process will be explained with reference to a coder of MPEG-2 system shown in FIG. 1 below.
FIG. 1 is a block diagram showing a typical coder of MPEG-2. In FIG. 1, the MPEG-2 system coder includes a frame memory 2 connected to an input line 1, a frame delay 18 for storing a decoded image, and a motion estimator 20 connected commonly to the input line 1, the frame memory 2 and the frame delay 18 to perform an operation for predicting and compensating for a motion of an input image.
In the coder shown in FIG. 1, the frame memory 2 serves to store an image received over the input line 1 in the frame unit. The motion estimator 20 predicts and compensates a motion of the input image. To this end, the motion estimator 20 is comprised of a first motion estimator 22 connected to the input line 1 and the frame memory 2 commonly, a second motion estimator 24 connected to the input line, the first motion estimator 22 and the frame delay 18, and a motion compensator 26 connected to the second motion estimator 24 and the frame delay 18. The first motion estimator 22 detects a position of the most analogous block to the previous image stored in the frame memory 2 with respect to the brightness signal(Y) block in a certain macro block from the image signal received over the input line 1. The detected block position is employed as a reference position for the second motion estimator 24. The second motion estimator 24 receives the input image inputted over the input line 1 and a reconstructed image stored in the frame delay 18 to detect the most analogous block to the brightness signal(Y) block in the macro block with respect to a reference position inputted from the first motion estimator 22 from the reconstructed image. Then, the MPEG-2 system coder transfers the detected position to a decoder, so that the decoder can obtain an image identical to the reconstructed image referred in the coder on a basis of the received position information. The motion compensator 26 extracts the most analogous block to the macro block from the reconstructed image stored in the frame delay 18 on a basis of the final position information generated at the second motion estimator 24.
The MPEG-2 system coder further includes a subtractor 4 connected commonly to the frame memory 2 and the motion compensator 26 to generate a difference image between the previous image and the estimated reconstructed image, a coder 34 connected to the subtractor 4 to code the difference image, a decoder 36 connected to the coder 34 to reconstruct the coded difference image, and an adder 16 connected to the decoder 36 and the motion compensator 26 to add the reconstructed difference image and the estimated image and output the added image to the frame delay 18. Moreover, the MPEG-2 system coder includes a variable length coder (VCL) 30 and a buffer 32 that are connected, in series, to the coder 34, and a bit rate controller 10 for controlling a bit generation rate by adjusting quantizing step sizes Qp of a quantizer 8 and a dequantizer 12 with reference to the characteristic of the input image stored in the frame memory 2 and the data quantities of the buffer 32.
In such a configuration, the subtractor 4 generates a difference image between a macro block of the previous image stored in the frame memory 2 and a macro block of the estimated reconstructed image from the motion compensator 26 and outputs the difference image to the coder 34. In other words, the subtractor 4 outputs a difference image in which a redundancy between images adjacent to each other in the time base is eliminated. The coder 34 carries out the discrete cosine transform (DCT) processing for the difference image inputted from the subtractor 4 to code the difference image, thereby eliminating the spatial correlation existing in the difference image. To this end, the coder 34 further includes a DCT circuit 6 for carrying out a DCT operation of the difference image in an 8×8 pixel unit, and a quantizer 8 for quantizing the DCT transformed signal. The VCL 30 is connected to the quantizer 8 to compress and output the coded difference image again in accordance with a value of code generation probability. The buffer 32 is connected to the VCL 30 to output a bit stream of the difference image in the first-in first-out system. The decoder 36 connected to the quantizer 8 reconstructs the coded difference image by carrying out an operation similar to the image reconstruction process performed at the coder. To this end, the decoder 36 includes an inverse quantizer 12 connected, in series, to the quantizer 8 to inverse-quantize the coded difference image, and an inverse discrete cosine transform (IDCT) circuit 14 for reconstructing the difference image by carrying out the IDCT operation. The adder 16 adds the difference image reconstructed at the IDCT circuit 14 to the estimated image from the motion compensator 26 and outputs the added image to the frame delay 18.
Accordingly, the frame delay 18 stores a new reconstructed image for estimating an image to be inputted in the next order and allows it to be utilized to provide the motion prediction and compensation at the motion estimator 20.
FIG. 2 is a detailed block diagram showing the configuration of the first and second motion estimators 22 and 24 in the motion estimator 20 of FIG. 1. Each of the first and second motion estimators 22 and 24 simultaneously carries out a motion prediction operation with respect to five paths, i.e., frame, top-to-top, bottom-to-top, top-to-bottom and bottom-to-bottom paths. The first motion estimator 22 makes use of the input image and the previous image to perform a motion prediction in a single pixel unit with respect to the five paths. In this case, an image corresponding to a retrieval area is the previous image stored in the frame memory 2. The first motion estimator 22 makes use of a block matching algorithm for each of the five paths to predict a motion in the single pixel unit, thereby detecting a motion vector MV. The block matching algorithm refers to a process in which the block most analogous to a specified block of the input image is found from the previous image. The second motion estimator 24 predicts a motion in a half pixel unit on a basis of the single pixel unit motion vector MV inputted from the first motion estimator 22. To this end, the second motion estimator 24 includes a half-pixel motion estimator 21, first and second multiplexors 23 and 25, a second adder 27 and a field/frame determining circuit 29. In such a second motion estimator 24, the half-pixel motion estimator 21 detects a final motion vector by predicting a motion vector in a half pixel unit on a basis of each motion vector MV in a single pixel unit for the five paths inputted from the first motion estimator 22. In this case, the used retrieval area is a reconstructed image stored in the frame delay 18 in FIG. 1.
The first multiplexor 23 selectively outputs a motion vector and a motion prediction error in the top-to-top path and a motion vector and a motion prediction error in the bottom-to-top path, which are detected at the half-pixel motion estimator 21, to the field/frame determining circuit 29 and the adder 27. The second multiplexor 25 selectively outputs a motion vector and a motion prediction error in the top-to-bottom path and a motion vector and a motion prediction error in the bottom-to-bottom path, which are detected at the half-pixel motion estimator 21, to the field/frame determining circuit 29 and the adder 27. Then, the adder 27 adds the motion detection errors between the fields outputted from the first and second multiplexors 23 and 25 and outputs the added motion detection error to the field/frame determining circuit 29. The field/frame determining circuit 29 compares a half-pixel motion detection error value in the frame path outputted from the half-pixel motion estimator 21 with a motion detection error value in the field path outputted from the adder 27 to thereby select a vector having the smaller motion detection error value, and outputs the selected vector value to the motion compensator 26 shown in FIG. 1.
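The field/frame decision described above may be sketched, purely as an illustration, as follows. It is assumed here that each multiplexor passes the candidate with the smaller prediction error and that a tie between the frame error and the summed field error is resolved in favour of the frame path; neither rule is stated in the description, and the function name and the (motion vector, error) tuple layout are hypothetical.

```python
def select_field_or_frame(frame, t2t, b2t, t2b, b2b):
    """Choose between frame prediction and field prediction.

    Each argument is a (motion_vector, error) pair for one of the five paths.
    Returns (mode, selected_vectors, error).
    """
    # First multiplexor: candidates that predict the top field (assumed rule:
    # keep the one with the smaller error).
    top = min(t2t, b2t, key=lambda cand: cand[1])
    # Second multiplexor: candidates that predict the bottom field.
    bottom = min(t2b, b2b, key=lambda cand: cand[1])
    # Adder: total error of the field path.
    field_error = top[1] + bottom[1]
    # Field/frame determining circuit: keep the path with the smaller error
    # (assumed tie-break: frame path).
    if frame[1] <= field_error:
        return ("frame", [frame[0]], frame[1])
    return ("field", [top[0], bottom[0]], field_error)
```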
FIGS. 3A and 3B depict a motion prediction method in a single pixel unit employing a block matching algorithm. FIG. 3A shows an input image I_t, and FIG. 3B shows the previous image I_{t-1}. In the input image I_t, the size N_B of a specified block B_t is 16. First, a local area for finding a block analogous to the specified block B_t at the reference position (x,y) in the input image I_t is determined from the previous image I_{t-1}. In this case, it is assumed that the local area determined from the previous image I_{t-1} extends from x-S to x+S+N_B-2 in the horizontal direction and from y-S to y+S+N_B-2 in the vertical direction, on a basis of the reference position (x,y). Herein, S represents a value for determining a size of the retrieval area. Next, the mean absolute difference (MAD) is used as a criterion for finding the block most analogous to the specified block B_t of the input image I_t in the local area of the previous image I_{t-1}. In other words, a MAD between a certain block B_{t-1} and the specified block B_t having a size of N_B×N_B is calculated at every position (u,v) in the local area of the previous image I_{t-1}. This MAD can be given by the following formula:

$$\mathrm{MAD}(u,v) = \frac{1}{N_B \times N_B} \sum_{i=0}^{N_B-1} \sum_{j=0}^{N_B-1} \left| B_t(x+i,\, y+j) - B_{t-1}(x-u+i,\, y-v+j) \right| \qquad (1)$$
wherein B_t(x+i,y+j) represents the (i,j)th pixel of the specified block B_t, the reference position of which is (x,y), in the input image I_t; and B_{t-1}(x-u+i,y-v+j) represents the (i,j)th pixel of the block, the reference position of which is a position moved by (u,v) from (x,y), in the previous image I_{t-1}. Subsequently, the position ((u,v)*) of the block B_{t-1} having the smallest MAD in the previous image I_{t-1} is detected. Herein, the displacement from the reference position (x,y) of the input image I_t to the position ((u,v)*) of the previous image I_{t-1} is referred to as "a motion vector MV in a single pixel unit". Further, in order to obtain a motion vector MV in a single pixel unit from the formula (1) for calculating the MAD, each field/frame path requires a calculation amount that grows with the square of S, as in the following formula:
Frame: N_B×N_B×2S×2S×M
Top-to-top, Bottom-to-top, Top-to-bottom and bottom-to-bottom fields: ##EQU2##
wherein M represents a calculation amount required in a calculation of the MAD per unit pixel. Also, if it is assumed that the picture size is W×H and the frame rate is 30 frame/second, then a calculation amount OP_FSBMA required every second for obtaining a motion vector in a single pixel unit can be expressed as the following formula: ##EQU3##
Further, a ratio of the calculation amount OP_FSBMA required every second for obtaining a motion vector in a single pixel unit to a calculation amount OP_HPSBMA for obtaining a motion vector in a half pixel unit is given as follows: ##EQU6##
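As a non-limiting illustration, the single-pixel block matching based on the formula (1) may be sketched as follows. Images are assumed to be lists of rows indexed as image[row][column]; the function names and the rule of skipping candidate blocks that fall outside the previous image are assumptions of this sketch. The loop bounds are chosen so that the candidate block positions run from x-S to x+S-1, which gives the 2S×2S positions counted in the frame formula.

```python
def mad(cur, prev, x, y, u, v, nb=16):
    """Mean absolute difference of formula (1) for the displacement (u, v)."""
    total = 0
    for j in range(nb):
        for i in range(nb):
            total += abs(cur[y + j][x + i] - prev[y - v + j][x - u + i])
    return total / (nb * nb)


def full_search(cur, prev, x, y, s, nb=16):
    """Return ((u, v), mad) with the smallest MAD over 2S x 2S positions.

    Returns None if no candidate block lies inside the previous image.
    """
    h, w = len(prev), len(prev[0])
    best = None
    for v in range(-s + 1, s + 1):
        for u in range(-s + 1, s + 1):
            px, py = x - u, y - v  # top-left corner of the candidate block
            if px < 0 or py < 0 or px + nb > w or py + nb > h:
                continue  # assumed rule: ignore blocks outside the image
            err = mad(cur, prev, x, y, u, v, nb)
            if best is None or err < best[1]:
                best = ((u, v), err)
    return best
```

Each candidate costs N_B×N_B absolute differences, so the nested loops make the N_B×N_B×2S×2S×M cost of the frame path directly visible.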
FIG. 4 depicts the conventional method of predicting a motion in a half-pixel unit. Herein, the motion prediction in a half pixel unit refers to detecting the position of a block having the smallest error with respect to 9 half-pixel positions located within ±0.5 pixel of the motion vector MV in a single pixel unit detected at the first motion estimator 22. The position of the block having the smallest error can be detected by making use of the block matching algorithm, in a manner similar to the above-mentioned motion prediction method in a single pixel unit. Each block corresponding to the 9 half-pixel positions based on the motion vector in a single pixel unit can be calculated by the following formula:
Retrieval position 4, 5: I(u±0.5, v) = {I(u,v) + I(u±1,v)}/2
Retrieval position 2, 7: I(u, v±0.5) = {I(u,v) + I(u,v±1)}/2
Retrieval position 1, 3, 6, 8: I(u±0.5, v±0.5) = {I(u,v) + I(u,v±1) + I(u±1,v) + I(u±1,v±1)}/4 (4)
wherein (u,v) represent the co-ordinates for the motion vector in a single pixel unit.
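For illustration, the interpolation of the formula (4) may be sketched as follows. The sketch uses a single four-term average that reduces to the two-term averages when one offset is zero and to I(u,v) itself when both are zero. Exact division is used as written in the formula; any integer rounding applied by a practical coder is not modelled. The function names and the image[row][column] indexing are assumptions of the sketch.

```python
def half_pel_sample(img, u, v, du, dv):
    """Value at (u + du/2, v + dv/2), with du and dv each in {-1, 0, 1}."""
    a = img[v][u]
    b = img[v][u + du]        # horizontal neighbour (equals a when du == 0)
    c = img[v + dv][u]        # vertical neighbour (equals a when dv == 0)
    d = img[v + dv][u + du]   # diagonal neighbour
    return (a + b + c + d) / 4


def half_pel_candidates(img, u, v):
    """The 9 samples around (u, v), keyed by the half-pixel offset."""
    return {(du / 2, dv / 2): half_pel_sample(img, u, v, du, dv)
            for dv in (-1, 0, 1)
            for du in (-1, 0, 1)}
```

For example, with img = [[0, 10, 20], [30, 40, 50], [60, 70, 80]] and (u,v) = (1,1), the sample at offset (0.5, 0) is (40+50)/2 = 45 and the sample at offset (0.5, 0.5) is (40+50+70+80)/4 = 60.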
Further, the calculation amount required when a motion in a half-pixel unit is predicted for each of the five paths by applying the formula (4) can be seen from the following formula:
Frame: N_B×N_B×8×(M+L)
Top-to-top, Bottom-to-top, Top-to-bottom and bottom-to-bottom fields: ##EQU4##
wherein L represents a calculation amount required for making one pixel at a half-pixel position. It is to be noted that the entire calculation amount required for a motion prediction in a half pixel unit is 3×N_B×N_B×8×(M+L) as seen from the formula (5). In this case, if it is assumed that the picture size is W×H and the frame rate is 30 frame/second, then a calculation amount OP_HPSBMA required every second for obtaining a motion vector in a half pixel unit is given by the following formula: ##EQU5##
It is to be understood from the equation that, as S increases, that is, as the retrieval area increases, a retrieval in a single pixel unit requires an increasingly larger calculation amount than a retrieval in a half pixel unit.
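The growth of the single-pixel calculation amount relative to the half-pixel calculation amount can be illustrated with the per-macro-block frame-path expressions given in the text, N_B×N_B×2S×2S×M and N_B×N_B×8×(M+L), together with the stated half-pixel total of 3×N_B×N_B×8×(M+L). The sketch below is limited to these expressions; the function names and the example values M = L = 1 are hypothetical, and the per-second formulas are not reproduced.

```python
def single_pel_frame_ops(nb, s, m):
    """Frame-path cost of one single-pixel search: N_B x N_B x 2S x 2S x M."""
    return nb * nb * (2 * s) * (2 * s) * m


def half_pel_frame_ops(nb, m, l):
    """Frame-path cost of one half-pixel search: N_B x N_B x 8 x (M + L)."""
    return nb * nb * 8 * (m + l)


def half_pel_total_ops(nb, m, l):
    """Total over the five paths as stated in the text: 3 x frame-path cost."""
    return 3 * half_pel_frame_ops(nb, m, l)


def frame_path_ratio(nb, s, m, l):
    """Single-pixel cost divided by half-pixel cost for the frame path."""
    return single_pel_frame_ops(nb, s, m) / half_pel_frame_ops(nb, m, l)
```

With N_B = 16 and the example values M = L = 1, the frame-path ratio is 16 for S = 8 and 64 for S = 16, i.e. doubling S quadruples the relative cost of the single-pixel retrieval.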
As a result, when all positions within a motion prediction area are retrieved so as to provide a motion prediction in a single pixel unit, a tremendous calculation amount is required for the motion prediction in a single pixel unit as the size of the retrieval area increases. Accordingly, various high speed retrieval algorithms have been developed to reduce the calculation amount for the motion prediction in a single pixel unit. A typical example of the high speed retrieval algorithms is a hierarchical block matching algorithm.
FIGS. 5A to 5C illustrate an example of a hierarchical block matching algorithm consisting of three layers. A unit image is reconstructed into an image having a hierarchical structure for the hierarchical block matching algorithm. In FIGS. 5A to 5C, an image in a layer l+1 is an image obtained by filtering and sub-sampling an image in a layer l. The pixel number in the horizontal and vertical direction of an image in the layer l+1 is reduced to 1/2 compared with that of an image in the layer l. A motion prediction process in a single pixel unit employing such a hierarchical structure image will be explained below.
First, as shown in FIG. 5A, a motion prediction for an image in the layer 2 (l=2), which has the smallest size, is performed. Herein, it is to be noted that the size of an image in the layer 2 is reduced to 1/4 in the horizontal and vertical direction compared with that of the original image. The motion prediction method includes calculating and comparing block matching errors in an entire retrieval area MSA2 reduced to 1/4 by utilizing a specified block B_t reduced in size as described above.
Next, as shown in FIG. 5B, a motion prediction for an image in the layer 1 (l=1) is performed. In this case, in order to improve an accuracy of a motion vector detected from an image in the layer 2, the block matching method is applied to only a local area MSA1 having a size added with ±2 pixels around a specified block B_{t-1} based on the motion vector detected from the layer 2.
Subsequently, as shown in FIG. 5C, a motion prediction for an image in the layer 0(l=0) is performed. The motion prediction for an image in the layer 0 is carried out only for a local area MSA0 based on the motion vector detected from an image in the layer 1 in a similar manner to the motion prediction for an image in the layer 1.
Accordingly, a final motion vector detected by applying such a hierarchical block matching algorithm becomes a sum of motion vectors obtained from images in each layer.
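The three-layer procedure described above may be sketched, as an illustration only, in the following manner. Several details are assumptions of this sketch rather than statements of the description: the filter is a 2×2 mean, the vector found in a layer is doubled before it is used as the search centre in the next layer (so that the final vector is the sum of the scaled per-layer contributions), the ±2 pixel refinement is applied in both layer 1 and layer 0, candidate blocks outside the previous image are skipped, and all function names are hypothetical.

```python
def downsample(img):
    """Layer l+1 from layer l: 2x2 mean followed by 2:1 sub-sampling."""
    h = len(img) - len(img) % 2
    w = len(img[0]) - len(img[0]) % 2
    return [[(img[r][c] + img[r][c + 1]
              + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]


def mad(cur, prev, x, y, u, v, nb):
    """Mean absolute difference for the displacement (u, v)."""
    total = 0
    for j in range(nb):
        for i in range(nb):
            total += abs(cur[y + j][x + i] - prev[y - v + j][x - u + i])
    return total / (nb * nb)


def local_search(cur, prev, x, y, nb, center, radius):
    """Best (u, v) within +/-radius of center, ignoring out-of-image blocks."""
    h, w = len(prev), len(prev[0])
    best_mv, best_err = center, float("inf")
    for v in range(center[1] - radius, center[1] + radius + 1):
        for u in range(center[0] - radius, center[0] + radius + 1):
            if 0 <= x - u <= w - nb and 0 <= y - v <= h - nb:
                err = mad(cur, prev, x, y, u, v, nb)
                if err < best_err:
                    best_mv, best_err = (u, v), err
    return best_mv


def hierarchical_search(cur, prev, x, y, s, nb=16, layers=3, refine=2):
    """Coarse-to-fine single-pixel motion vector for the block at (x, y)."""
    cur_pyr, prev_pyr = [cur], [prev]
    for _ in range(layers - 1):
        cur_pyr.append(downsample(cur_pyr[-1]))
        prev_pyr.append(downsample(prev_pyr[-1]))
    top = layers - 1
    scale = 2 ** top
    # Uppermost layer: search the whole (reduced) retrieval area.
    mv = local_search(cur_pyr[top], prev_pyr[top], x // scale, y // scale,
                      nb // scale, (0, 0), s // scale)
    # Lower layers: refine around the doubled vector from the layer above.
    for layer in range(top - 1, -1, -1):
        scale = 2 ** layer
        mv = (2 * mv[0], 2 * mv[1])
        mv = local_search(cur_pyr[layer], prev_pyr[layer], x // scale,
                          y // scale, nb // scale, mv, refine)
    return mv
```

With three layers and S = 8, the uppermost layer evaluates 5×5 positions on a 4×4 block and each lower layer evaluates 5×5 positions, in place of the 2S×2S full-resolution positions of the exhaustive search.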
FIG. 6 shows the configuration of a conventional motion prediction apparatus employing the above-mentioned hierarchical block matching algorithm. The motion prediction apparatus includes a first motion estimator 22 for predicting a motion in a single pixel unit by utilizing the hierarchical block matching algorithm, and a second motion estimator 24 for predicting a motion in a half pixel unit on a basis of a single-pixel motion vector inputted from the first motion estimator 22.
In the motion prediction apparatus shown in FIG. 6, the first motion estimator 22 carries out the motion prediction in a single pixel unit for three layers repeatedly by utilizing the above-mentioned hierarchical block matching algorithm, thereby detecting a final motion vector in a single pixel unit for five field/frame paths in an image of the lowermost layer 0. The second motion estimator 24 detects a motion vector in a half pixel unit on a basis of each final single-pixel motion vector for the five paths inputted from the first motion estimator 22.
FIG. 7 shows a detailed configuration of the single-pixel motion estimator and the half-pixel motion estimator for the layer 0 shown in FIG. 6. In FIG. 7, the single-pixel motion estimator 70 detects a final single-pixel motion vector MV_0 by retrieving a local area of the layer 0 on a basis of a motion vector MV_1 detected at the layer 1. To this end, the single-pixel motion estimator 70 for the layer 0 includes a first address generator 44 for receiving the motion vector MV_1 detected at the layer 1 to generate a reference position information A_0 for the layer 0, a first buffer 46 connected to a data bus 54 to receive the previous image, a first internal memory 42 for storing an input image for the layer 0, a first arithmetic unit 40 connected commonly to the first buffer 46 and the first internal memory 42, and a first comparator 48 connected to the output terminal of the first arithmetic unit 40. The first address generator 44 receives the motion vector MV_1 detected at the layer 1 to generate the reference position information A_0 for the layer 0, and supplies it to an address bus 52. The first buffer 46 receives the previous image S_0 via the data bus 54 and stores it temporarily. The first arithmetic unit 40 retrieves the previous image S_0 inputted from the first buffer 46 on a basis of a specified block of the input image inputted from the first internal memory 42 to calculate a mean absolute difference (MAD) between the specified block of the input image and a certain block of the previous image S_0. The first comparator 48 compares MADs inputted from the first arithmetic unit 40 to detect and output a motion vector MV_0 for a position having the smallest MAD.
Meanwhile, the half-pixel motion estimator 80 retrieves a reconstructed image on a basis of the motion vector MV_0 inputted from the single-pixel motion estimator 70 in the layer 0 to detect a motion vector in a half pixel unit. To this end, the half-pixel motion estimator 80 includes a second address generator 50 connected to the first comparator 48 and the address bus 52, a second buffer 52 connected to the data bus to receive the reconstructed image, an interpolator 54 connected to the output terminal of the second buffer 52, a second internal memory 58 for storing the input image, a second arithmetic unit 56 connected commonly to the interpolator 54 and the second internal memory 58, and a second comparator 60 connected to the output terminal of the second arithmetic unit 56. The second address generator 50 generates a position information A_h corresponding to a value of the motion vector MV_0 in a single pixel unit supplied from the first comparator 48. The second buffer 52 temporarily stores a reconstructed image S_h supplied from the data bus 54. The interpolator 54 interpolates the reconstructed image supplied from the second buffer 52 and outputs the interpolated image to the second arithmetic unit 56. The second arithmetic unit 56 calculates a MAD in a half pixel unit by utilizing the reconstructed image S_h supplied from the second buffer 52 by way of the interpolator 54 and the input image stored in the second internal memory 58. The second comparator 60 compares MADs inputted from the second arithmetic unit 56 to detect and output the motion vector MV_h in a half pixel unit for a position having the smallest MAD.
When the motion prediction for the five field/frame paths is performed by employing such a hierarchical retrieval method, as in the example of FIG. 6 and FIG. 7, the required calculation amount can be reduced, but there is a disadvantage in that the accuracy of the motion prediction deteriorates. This is because, when the entire retrieval area is retrieved at the uppermost layer having the lowest resolution so as to reduce the calculation amount, there is a high probability that an inaccurate initial motion vector is detected, and hence an accurate motion vector cannot be detected in the successive retrieval process that relies on the inaccurate initial motion vector. Accordingly, it is necessary to provide a novel motion prediction method which is capable of reducing the calculation amount during the motion vector detection in a single pixel unit as well as overcoming the problems in the existing method as mentioned above.
FIG. 8 is a view for explaining an input/output band width required at each step of the motion prediction method to which the hierarchical block matching algorithm is applied. In FIG. 8, the motion estimator 82 is commonly connected to four external memories EM2, EM1, EM0 and EMh over the data bus 54. In this case, the three external memories EM2, EM1 and EM0 store input images in layer 2, layer 1 and layer 0, respectively, for the hierarchical retrieval. The remaining fourth external memory EMh stores a reconstructed image for the retrieval in a half pixel unit. Herein, if a requirement amount for an input/output band width of each step is calculated with reference to FIG. 5 and FIG. 6 assuming that the size of image is W×H and the frame rate is 30 frame/sec, then it can be expressed as the following formulas:
Input/output band width requirement amount for providing a retrieval area in the layer 2, IO_layer2: ##EQU7##
Input/output band width requirement amount for providing a retrieval area in the layer 1, IO_layer1: ##EQU8##
Input/output band width requirement amount for providing a retrieval area in the layer 0, IO_layer0: ##EQU9##
Input/output band width requirement amount for providing a retrieval area in a half pixel unit, IO_half: ##EQU10##
wherein a great part of the retrieval areas in the layer 2 overlap due to the characteristic of the hierarchical block matching algorithm, so that the input/output band width requirement amount can be reduced dramatically by reusing retrieval area data that has already been loaded. On the other hand, since the retrieval areas in the remaining layers do not overlap, their input/output band width requirement amount cannot be reduced. For example, by applying values corresponding to the main profile at main level of MPEG-2 (i.e., N_B=16, W=720, and H=480) to the formula (12), the input/output band width requirement amount for the remaining layers except for the layer 2 is given as follows: ##EQU11##
It is to be noted from the above equations that an input/output band width for a retrieval in the layer 0 and in a half pixel unit is relatively large. Particularly, a B picture process requiring a bi-directional motion prediction needs twice the input/output band width. Accordingly, a strategy for decreasing an excessive input/output band width required in the motion prediction process has been demanded. Also, a scheme for reducing a tremendous calculation amount required for the single-pixel motion prediction has been needed.