The present invention relates to detection of a motion vector between a block in an image and a corresponding block in another image, and, more particularly, is directed to reducing the number of operations required to detect a motion vector while maintaining the accuracy of the detected motion vector.
Motion vectors are useful in predictive coding of a series of digital images, which reduces the amount of information needed to represent the series of images. For example, the Moving Picture Coding Experts Group (MPEG) international standard for highly efficient coding of moving pictures employs orthogonal transformation, specifically a discrete cosine transformation (DCT), and predictive encoding with motion compensation.
FIG. 1 shows an example of a predictive encoding circuit using motion compensation. Digital video data for a present frame of video is supplied to input terminal 61, which supplies the digital video data to a motion vector detecting circuit 62 and a subtracting circuit 63.
The motion vector detecting circuit 62 detects a motion vector for a block of the present frame relative to a reference frame, which may be a frame that temporally precedes the present frame, and supplies the motion vector to a motion compensating circuit 64.
Frame memory 65 is adapted to store an image such as the preceding frame which, when motion compensated, forms the prediction for the present image, and to supply this image to the motion compensating circuit 64.
The motion compensating circuit 64 is operative to perform motion compensation of the image supplied thereto from frame memory 65 using the motion vector supplied thereto from the motion vector detecting circuit 62, and to supply the motion compensated image to the subtracting circuit 63 and an adding circuit 66. Specifically, the circuit 64 moves each block of the image to the position indicated by the corresponding motion vector.
The subtracting circuit 63 subtracts the motion compensated preceding frame received from the motion compensating circuit 64 from the video data of the present frame, on a pixel by pixel basis, to produce differential data and supplies the differential data to a DCT circuit 67.
The DCT circuit 67 functions to orthogonally transform the differential data to produce coefficient data, and applies the coefficient data to a quantizing circuit 68 which is adapted to quantize the coefficient data and to supply the quantized coefficient data to an output terminal 69 and to an inverse quantizing circuit 70.
The inverse quantizing circuit 70 recovers the coefficient data from the quantized coefficient data, and applies the recovered coefficient data to an inverse DCT circuit 71 which converts the coefficient data to decoded differential image data and supplies the decoded differential image data to the adding circuit 66.
The adding circuit 66 adds the decoded differential image data to the motion compensated image data from the circuit 64 to produce decoded image data and applies the decoded image data to the frame memory 65 for storage therein.
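The signal path of FIG. 1 from the subtracting circuit 63 to the adding circuit 66 can be sketched as follows. This is only an illustrative sketch under stated assumptions: it operates on a single row of pixels with a one-dimensional orthonormal DCT, it uses a uniform quantizer with a hypothetical step size `step`, and the function names are not taken from the specification.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a 1-D sequence (stand-in for DCT circuit 67)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

def idct(c):
    """Inverse of dct() (stand-in for inverse DCT circuit 71)."""
    n = len(c)
    out = []
    for i in range(n):
        out.append(sum(math.sqrt((1 if k == 0 else 2) / n) * c[k]
                       * math.cos(math.pi * (i + 0.5) * k / n)
                       for k in range(n)))
    return out

def encode_row(present, prediction, step):
    """One pass through the loop for one row of pixels.

    `prediction` is the motion compensated row from circuit 64.
    `step` is an assumed uniform quantizer step size.
    """
    differential = [p - m for p, m in zip(present, prediction)]  # circuit 63
    coefficients = dct(differential)                             # circuit 67
    quantized = [round(c / step) for c in coefficients]          # circuit 68
    recovered = [q * step for q in quantized]                    # circuit 70
    decoded_diff = idct(recovered)                               # circuit 71
    decoded = [m + d for m, d in zip(prediction, decoded_diff)]  # circuit 66
    return quantized, decoded  # output terminal 69, frame memory 65
```

The row returned as `decoded` is what would be written back to the frame memory 65, so the prediction for the next frame is formed from the same data a decoder would reconstruct rather than from the original input.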
The operation of motion vector detection performed by the motion vector detecting circuit 62 will now be described with reference to FIGS. 2-4.
The motion vector detecting circuit 62 uses a block matching method to detect motion vectors. In the block matching method, an inspection block of a reference frame is moved in a predetermined searching range to identify the block in the predetermined searching range that best matches a base block of the present frame. The motion vector is the difference between the co-ordinates of the base block and the co-ordinates of the best matching block in the reference frame.
FIG. 2A shows an image of one frame comprising H horizontal pixels × V vertical lines, which are divided into blocks of size P pixels × Q lines. FIG. 2B shows a block in which P=5, Q=5, and "c" represents the center pixel of the block.
FIG. 3A shows a base block of a present frame having a center pixel c and an inspection block of a reference frame having a center pixel c'. The inspection block is positioned at the block of the reference frame which best matches the base block of the present frame. As can be seen from FIG. 3A, when the center pixel c of the base block is moved by +1 pixel in the horizontal direction and +1 line in the vertical direction, the center pixel c is co-located with the center pixel c'. Thus, a motion vector (+1, +1) is obtained. Similarly, for the positions of the best matching block relative to the base block shown in FIGS. 3B and 3C, respective motion vectors of (+3, +3) and (+2, -1) are obtained. A motion vector is obtained for each base block of the present frame.
The predetermined search range through which the inspection block is moved in the reference frame may be ±S pixels in the horizontal direction and ±T lines in the vertical direction; that is, the base block is compared with inspection blocks whose center pixel c' is displaced from the center pixel c of the base block by up to ±S pixels in the horizontal direction and up to ±T lines in the vertical direction. FIG. 4 shows that a base block R with a center pixel c of a present frame should be compared with {(2S+1)×(2T+1)} inspection blocks of a reference frame. In FIG. 4, S=4 and T=3. The searching range of FIG. 4 is the region consisting of the centers of the inspection blocks. The size of the region that contains the entirety of the inspection blocks is (2S+P)×(2T+Q), i.e., ((P-1)/2+(2S+1)+(P-1)/2)×((Q-1)/2+(2T+1)+(Q-1)/2).
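The two quantities given for the searching range can be checked numerically. The sketch below uses a hypothetical helper name, and it assumes the 5×5 block of FIG. 2B together with the S=4, T=3 range of FIG. 4, a pairing the text does not itself state.

```python
def search_geometry(P, Q, S, T):
    """Return the number of inspection block positions and the size
    (pixels, lines) of the region containing every inspection block."""
    positions = (2 * S + 1) * (2 * T + 1)
    region = (2 * S + P, 2 * T + Q)
    return positions, region

# Assumed pairing: P = Q = 5 (FIG. 2B) with S = 4, T = 3 (FIG. 4).
positions, region = search_geometry(5, 5, 4, 3)
print(positions, region)  # 63 (13, 11)
```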
The comparison of a base block with an inspection block at a particular position in the predetermined search range comprises obtaining an evaluating value, such as the sum of absolute values of the differential values between the frames, the sum of squares of the differential values, or the sum of n-th powers of the absolute values of the differential values. The minimum of the evaluating values obtained over the search range is then detected to identify the best matching block, and a motion vector between the base block and the best matching block is produced.
FIG. 5 shows an example of the motion vector detecting circuit 62.
Image data for a present frame is applied to an input terminal 81, which supplies the image data to a present frame memory 83 for storage. Image data for a reference frame is applied to an input terminal 82, which supplies the image data to a reference frame memory 84 for storage.
Controller 85 controls reading and writing of the present frame memory 83 and the reference frame memory 84 which respectively supply pixel data of a base block of the present frame and pixel data of an inspection block of the reference frame to differential value detecting circuit 87. An address moving circuit 86 is associated with the reference frame memory 84. The controller 85 controls the address moving circuit 86 to apply read addresses to the reference frame memory 84 which move, pixel by pixel, the position of the inspection block in the predetermined searching range.
The differential value detecting circuit 87 obtains the differential value between the output signals of the present frame memory 83 and the reference frame memory 84 on a pixel by pixel basis and supplies the differential values to an absolute value calculating circuit 88 which obtains the absolute value of the differential values and supplies the absolute value to an accumulating circuit 89. The accumulating circuit 89 sums the absolute values of the differential values for each block to produce an evaluating value for the base block relative to the inspection block at a particular position in the predetermined search range and supplies the evaluating value to a determining circuit 90.
The determining circuit 90 identifies the minimum evaluating value in the predetermined search range. The best matching block in the predetermined search range of the reference frame corresponds to the minimum evaluating value. The circuit 90 also produces a motion vector between the base block of the present frame and the best matching block in the predetermined search range of the reference frame.
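The processing performed by the circuit of FIG. 5 can be sketched in software as a full search using the sum of absolute differences. This is a minimal sketch under assumptions: frames are lists of rows, the block is addressed by its top-left corner (`top`, `left`) rather than by its center pixel, candidate positions falling outside the reference frame are skipped, and ties are resolved in favor of the first minimum found. All function and parameter names are hypothetical.

```python
def sad(present, reference, top, left, dx, dy, P, Q):
    """Evaluating value: sum of absolute differences between the base
    block and the inspection block displaced by (dx, dy)
    (circuits 87, 88 and 89)."""
    total = 0
    for y in range(Q):
        for x in range(P):
            total += abs(present[top + y][left + x]
                         - reference[top + dy + y][left + dx + x])
    return total

def full_search(present, reference, top, left, P, Q, S, T):
    """Try every displacement within +/-S pixels and +/-T lines and
    return the motion vector with the minimum evaluating value
    (determining circuit 90)."""
    height, width = len(reference), len(reference[0])
    best_vector, best_value = None, None
    for dy in range(-T, T + 1):
        for dx in range(-S, S + 1):
            # Skip inspection blocks that would leave the reference frame.
            if not (0 <= top + dy and top + dy + Q <= height
                    and 0 <= left + dx and left + dx + P <= width):
                continue
            value = sad(present, reference, top, left, dx, dy, P, Q)
            if best_value is None or value < best_value:
                best_vector, best_value = (dx, dy), value
    return best_vector, best_value
```

The returned vector follows the sign convention of FIGS. 3A-3C: it is the displacement from the base block to the best matching block.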
The conventional block matching method requires a large amount of hardware and a large number of arithmetic operations. For the situation shown in FIG. 4, (P×Q) absolute values of differential values should be summed {(2S+1)×(2T+1)} times. Thus, the number of arithmetic operations for this process is expressed as {(P×Q)×(2S+1)×(2T+1)}.
To overcome these disadvantages of the conventional block matching method, various methods have been proposed.
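As a rough illustration of how quickly this count grows, the expression can be evaluated for two parameter sets. The helper name is hypothetical, the first set assumes the 5×5 block of FIG. 2B with the range of FIG. 4, and the second set (16×16 blocks searched over ±16) is an assumed example that does not appear in the text.

```python
def full_search_operations(P, Q, S, T):
    """Absolute-difference terms accumulated for one base block."""
    return (P * Q) * (2 * S + 1) * (2 * T + 1)

print(full_search_operations(5, 5, 4, 3))      # 1575
print(full_search_operations(16, 16, 16, 16))  # 278784
```

These figures are per base block; the total for a frame scales with the number of blocks in the frame.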
The first of these proposed methods decreases the number of elements of a block by decomposing a base block and an inspection block into small blocks in the horizontal and vertical directions and extracting a feature value for each small block. The feature value may be, for example, the sum of the magnitudes of the pixels in the small block. The feature values of each of the small blocks in the horizontal direction of each of the base block and the inspection block are compared, and the feature values of each of the small blocks in the vertical direction of each of the base block and the inspection block are compared. Absolute values of the compared results are summed. The weighted mean value of the summed results is used as the evaluating value for the base and inspection blocks. This method, described in detail in U.S. application serial no. 08/283,830, filed Aug. 1, 1994, reduces the number of arithmetic operations to the number of small blocks in the horizontal and vertical directions.
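One possible reading of this feature-value comparison is sketched below. The decomposition into one-line strips and one-pixel-wide strips, the equal default weights, and the function names are all assumptions made for illustration; the referenced application may define the small blocks and the weighting differently.

```python
def strip_sums(block):
    """Feature values: the sum of each line and the sum of each column."""
    line_sums = [sum(row) for row in block]
    column_sums = [sum(col) for col in zip(*block)]
    return line_sums, column_sums

def feature_evaluating_value(base, inspection, w_lines=0.5, w_columns=0.5):
    """Weighted mean of the summed absolute feature differences."""
    base_lines, base_columns = strip_sums(base)
    insp_lines, insp_columns = strip_sums(inspection)
    line_diff = sum(abs(a - b) for a, b in zip(base_lines, insp_lines))
    column_diff = sum(abs(a - b) for a, b in zip(base_columns, insp_columns))
    return (w_lines * line_diff + w_columns * column_diff) / (w_lines + w_columns)

base = [[1, 2], [3, 4]]
print(feature_evaluating_value(base, [[1, 2], [3, 6]]))  # 2.0
print(feature_evaluating_value(base, [[2, 1], [2, 5]]))  # 0.0
```

With this decomposition only P+Q feature differences are accumulated per position instead of P×Q pixel differences. The second call shows the cost of the reduction: two blocks with different pixels can have identical line and column sums and therefore an evaluating value of zero.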
In the second of these proposed methods, to simplify the searching process, in a first stage, the inspection block is moved every several pixels to coarsely detect a motion vector. In a second stage, the inspection block is moved near the position indicated by the coarse motion vector every pixel to finely detect a motion vector. This method is referred to as a two-step method. In addition, a three-step method where a motion vector is obtained in three steps is also known. In the three-step method, the number of arithmetic operations corresponding to all the pixels in the searching range can be reduced to the number of arithmetic operations corresponding to the pixels near the motion vector detected in each step.
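The two-step method can be sketched as a coarse search on a grid followed by a fine search around the coarse result. In this sketch the grid spacing `step`, the top-left block addressing, the skipping of out-of-frame candidates, and all names are assumptions; the returned count of evaluated positions is included only to make the reduction visible.

```python
def sad(present, reference, top, left, dx, dy, P, Q):
    """Sum of absolute differences for displacement (dx, dy)."""
    return sum(abs(present[top + y][left + x]
                   - reference[top + dy + y][left + dx + x])
               for y in range(Q) for x in range(P))

def best_of(present, reference, top, left, P, Q, candidates):
    """Evaluate the in-frame candidates; return (vector, value, count)."""
    height, width = len(reference), len(reference[0])
    best_vector, best_value, count = None, None, 0
    for dx, dy in candidates:
        if not (0 <= top + dy and top + dy + Q <= height
                and 0 <= left + dx and left + dx + P <= width):
            continue
        value = sad(present, reference, top, left, dx, dy, P, Q)
        count += 1
        if best_value is None or value < best_value:
            best_vector, best_value = (dx, dy), value
    return best_vector, best_value, count

def two_step_search(present, reference, top, left, P, Q, S, T, step=2):
    # First stage: displacements that are multiples of `step`
    # (zero displacement is always on this grid).
    coarse = [(dx, dy)
              for dy in range(-T, T + 1) if dy % step == 0
              for dx in range(-S, S + 1) if dx % step == 0]
    (cx, cy), _, n1 = best_of(present, reference, top, left, P, Q, coarse)
    # Second stage: every displacement within step-1 of the coarse
    # result, clamped to the original searching range.
    fine = [(dx, dy)
            for dy in range(cy - step + 1, cy + step) if abs(dy) <= T
            for dx in range(cx - step + 1, cx + step) if abs(dx) <= S]
    vector, value, n2 = best_of(present, reference, top, left, P, Q, fine)
    return vector, value, n1 + n2
```

For S=T=4 and `step=2` this evaluates at most 25 coarse and 9 fine positions instead of the 81 positions of a full search. The fine stage can only recover the correct vector if the coarse stage lands within `step-1` of it.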
Yet another proposed method, which both decreases the number of elements of a block and simplifies the searching process, is known as the thin-out method and employs a hierarchical construction. The pixels in a block are sampled and thinned out (for example, four pixels are thinned out to one pixel, or two pixels are thinned out to one pixel). Blocks constructed of the thinned-out pixels are compared. Thereafter, the origin of the block matching process is moved to the position of the minimum detected value. A motion vector is then detected by the block matching process, pixel by pixel. As the result of the thin-out process, both the number of elements in a block and the number of arithmetic operations in the searching range decrease.
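The comparison of thinned-out blocks can be illustrated with a sum of absolute differences that visits only every `step`-th pixel in each direction. The function name and the choice of keeping the pixel at each even position are assumptions; with `step=2`, four pixels are thinned out to one.

```python
def thinned_sad(base, inspection, step=2):
    """Sum of absolute differences over the retained pixels only."""
    return sum(abs(base[y][x] - inspection[y][x])
               for y in range(0, len(base), step)
               for x in range(0, len(base[0]), step))

base = [[4 * y + x for x in range(4)] for y in range(4)]

changed_kept = [row[:] for row in base]
changed_kept[0][0] += 10      # a retained pixel differs
print(thinned_sad(base, changed_kept))     # 10

changed_dropped = [row[:] for row in base]
changed_dropped[1][1] += 10   # a discarded pixel differs
print(thinned_sad(base, changed_dropped))  # 0
```

For a 4×4 block only 4 of the 16 pixel differences are accumulated. The second call shows that a difference confined to discarded pixels is not seen by the comparison.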
A further proposed method, which both decreases the number of elements of a block and simplifies the searching process, employs a low-pass filter. In this method, a hierarchical construction is defined having a first hierarchical stage where an original image is supplied, a second hierarchical stage where the number of pixels of the original image signal in the first hierarchical stage is thinned out by 1/2 in the horizontal and vertical directions with a low-pass filter and a sub-sampling circuit, and a third hierarchical stage where the number of pixels of the image signal in the second hierarchical stage is thinned out by 1/2 in the horizontal and vertical directions with a low-pass filter and a sub-sampling circuit. The block matching process is first performed for the image signal in the third hierarchical stage. The origin of the block matching process is moved to the position corresponding to the detected minimum evaluating value. The block matching process is then performed for the image signal in the second hierarchical stage. The origin of the block matching process is again moved to the position corresponding to the detected minimum evaluating value. Finally, the block matching process is performed for the image signal in the first hierarchical stage.
A problem with each of the above-mentioned proposed methods for reducing the number of arithmetic operations in the block matching method for detecting motion vectors is that each of these methods may produce an erroneous result, since information in the original image is lost due to the simplification of the block elements or of the searching process.
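The construction of the three hierarchical stages can be sketched as follows. The text does not specify the low-pass filter, so a 2×2 mean followed by keeping one sample per 2×2 cell is assumed here; the function names are hypothetical, and the image dimensions are assumed to be divisible by four.

```python
def downsample(image):
    """Assumed low-pass filter (2x2 mean) followed by sub-sampling by
    1/2 in the horizontal and vertical directions."""
    lines, pixels = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * y][2 * x] + image[2 * y][2 * x + 1]
              + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4
             for x in range(pixels)]
            for y in range(lines)]

def build_hierarchy(image):
    """Return the first, second and third hierarchical stages."""
    second = downsample(image)
    third = downsample(second)
    return [image, second, third]

first = [[4 * y + x for x in range(4)] for y in range(4)]
stages = build_hierarchy(first)
print(stages[1])  # [[2.5, 4.5], [10.5, 12.5]]
print(stages[2])  # [[7.5]]
```

Under this assumed construction, a displacement of one pixel in the third stage corresponds to two pixels in the second stage and four pixels in the first, so a vector found at one stage would be doubled before being used as the origin of the search at the next finer stage.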
Specifically, when the number of elements of a block is decreased, a feature value of a small block that has been passed through a low-pass filter is used. When the searching process is simplified, since a motion vector is coarsely detected, the accuracy is low. Thus, an error may take place. When the number of elements is decreased and the searching process is simplified, since a motion vector is detected corresponding to a thinned-out image or an image that has passed through a low-pass filter, an error may take place.