1. Field of the Invention
The present invention relates to a motion image data coding technique and, in particular, to a method of efficiently detecting motion vectors included in motion image data.
2. Description of the Related Art
When transferring motion image data through a communication medium for a video conference, or recording them on a DVD (Digital Versatile Disk) or the like, the data are normally coded in order to reduce the size.
Normally, motion image data are composed of a plurality of image frames, and it is well known that efficient coding is possible by utilizing temporal correlation. For example, if there are two image frames #1 and #2 obtained at times t1 and t2, respectively, efficient compression can be performed by compressing, instead of the image frame #2, an image frame #12 representing difference data between the frames #1 and #2. This is because the image frame #12 is produced by utilizing the temporal correlation between the image frames #1 and #2 and therefore has small values for normal image movements.
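The difference coding described above can be illustrated with a minimal Python sketch. The function names `frame_difference` and `reconstruct` and the tiny 2×3 frames are illustrative assumptions, not part of the described technique; frames are modelled as nested lists of integer pixel values.

```python
def frame_difference(frame1, frame2):
    """Return frame #12: the per-pixel difference frame2 - frame1."""
    return [[p2 - p1 for p1, p2 in zip(row1, row2)]
            for row1, row2 in zip(frame1, frame2)]


def reconstruct(frame1, diff):
    """Recover frame #2 from frame #1 and the difference frame #12."""
    return [[p1 + d for p1, d in zip(row1, row_d)]
            for row1, row_d in zip(frame1, diff)]


# Hypothetical frames captured at t1 and t2; only two pixels change.
frame1 = [[10, 10, 12], [11, 10, 12]]
frame2 = [[10, 11, 12], [11, 10, 13]]

frame12 = frame_difference(frame1, frame2)
print(frame12)                                 # [[0, 1, 0], [0, 0, 1]]
print(reconstruct(frame1, frame12) == frame2)  # True
```

Most entries of the difference frame are zero here, which is the property that makes it cheaper to code than the frame itself.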
Nowadays, there have been proposed various methods to get higher compression ratios by using efficient coding techniques.
One of the known techniques for raising the compression ratio is a technique that detects motion vectors between contiguous image frames and codes the vectors. A sample implementation of this technique divides the image frame #2 into small blocks, e.g. 16×16 square blocks, and, for each of the blocks, searches for a small area in the image frame #1 that resembles the block and calculates difference data between the block and that small area. Then, the mean absolute value of the difference data is calculated. Because this mean absolute value is smaller than that of the image frame #12, more efficient coding is possible. Although an offset from the detected small area must be coded as a “motion vector” together with the difference data in this case, the data size after coding will almost always be smaller than when no motion vector is used.
The detection of an offset between a block in the image frame #2 and a small area in the image frame #1 that resembles the block in the image frame #2 is called “motion vector detection”. Specifically, a motion vector is detected such that the sum of the values calculated by an evaluation function applied to each pixel in a square block G2 in the image frame #2 and the corresponding pixel in a square block G1 in the image frame #1 is minimum. Normally, the absolute value or the square of a difference is used as the evaluation function.
Now, the primary problem in motion vector detection is how to search for a block in one image frame that corresponds to the subject block in another image frame. One of the well-known methods is the so-called “full search method”.
The full search method uses an error evaluation function to express the similarity between two pixels; that is, it regards the block whose sum of values of the error evaluation function is minimal as the block that most resembles the subject block. The error evaluation function is expressed by ferror(pixel 1, pixel 2), namely, the absolute value of the difference between pixel 1 and pixel 2 (|pixel 1 − pixel 2|) or the square of the difference ((pixel 1 − pixel 2) × (pixel 1 − pixel 2)). Now let (x, y) be a position in an image frame and A be a square block containing the position (x, y), and let A(bx, by) be a pixel in the block A. On the other hand, let B be a block in the image frame in which motion vectors are to be searched, let (x+dx, y+dy) be the point in B that corresponds to the point (x, y) in A, and let B(bx, by) be a pixel in B. Further, let E(dx, dy) be a sum of values, each value being calculated by the error evaluation function whose parameters are a point in A and the corresponding point in B. Then the sum E(dx, dy) can be expressed as the following equation:
$$E(dx, dy) = \sum_{bx} \sum_{by} f_{\mathrm{error}}\bigl(A(bx, by),\, B(bx, by)\bigr)$$
If the range of dx is from dx1 through dx2, and the range of dy is from dy1 through dy2, the full search is done by searching the square defined by (dx1, dy1) and (dx2, dy2) in order to find a motion vector.
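The full search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `abs_error`, `block_error`, `full_search` and `make_frame` are hypothetical, frames are nested lists indexed as `frame[row][column]`, the absolute difference is used as ferror, and offsets that would place block B outside the reference frame are simply skipped.

```python
def abs_error(p1, p2):
    """ferror as the absolute difference (one of the two forms in the text)."""
    return abs(p1 - p2)


def block_error(cur, ref, x, y, dx, dy, size, ferror=abs_error):
    """E(dx, dy): sum of ferror over block A at (x, y) in `cur`
    and block B at (x + dx, y + dy) in `ref`."""
    return sum(ferror(cur[y + by][x + bx], ref[y + dy + by][x + dx + bx])
               for by in range(size) for bx in range(size))


def full_search(cur, ref, x, y, size, dx_range, dy_range):
    """Evaluate every in-bounds offset in the window; return (vector, error)."""
    height, width = len(ref), len(ref[0])
    best_vec, best_err = None, None
    for dy in range(dy_range[0], dy_range[1] + 1):
        for dx in range(dx_range[0], dx_range[1] + 1):
            inside = (0 <= x + dx <= width - size and
                      0 <= y + dy <= height - size)
            if not inside:
                continue  # block B would fall outside the reference frame
            err = block_error(cur, ref, x, y, dx, dy, size)
            if best_err is None or err < best_err:
                best_vec, best_err = (dx, dy), err
    return best_vec, best_err


def make_frame(width, height, patch_x, patch_y):
    """Black frame with one 2x2 patch of distinct values (test data only)."""
    frame = [[0] * width for _ in range(height)]
    for j, row in enumerate([[10, 20], [30, 40]]):
        for i, value in enumerate(row):
            frame[patch_y + j][patch_x + i] = value
    return frame


ref = make_frame(8, 8, 3, 2)   # image frame #1
cur = make_frame(8, 8, 4, 3)   # image frame #2: patch moved by (+1, +1)
vector, error = full_search(cur, ref, 3, 2, 4, (-2, 2), (-2, 2))
print(vector, error)           # (-1, -1) 0
```

The detected vector points from the block in frame #2 back to the matching area in frame #1, so a patch that moved by (+1, +1) yields the vector (-1, -1).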
Although the full search is supposed to achieve precise motion vector detection, it has the following problems.
First, calculation will be extensive. To keep up with faster movements, the searching area should be expanded. That is to say, the search area should be wide enough to contain the actual motion vector for the precise detection, which requires a huge amount of calculation.
Second, the search result depends on the search order. That is, since the sums of values of the error evaluation function may have the same value at different points, the point to be chosen will depend on the search order. As a result, it is possible that a motion vector that does not reflect the actual movement is found. In particular, if the image texture is flat or a repeating pattern exists, the sums of values of the error evaluation function are more likely to have similar values at different points. In this case, a square block that resembles the reference block may be found at a position far from the actual movement, and the motion vector thus detected does not reflect the actual movement.
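The dependence on search order can be reproduced with a small Python sketch. It assumes a completely flat synthetic frame, so every offset gives the same error sum, and a search that keeps the first minimum it encounters; the names `sad` and `first_minimum` and both scan orders are illustrative choices.

```python
def sad(cur, ref, x, y, dx, dy, size):
    """Sum of absolute differences between the two blocks."""
    return sum(abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
               for j in range(size) for i in range(size))


def first_minimum(cur, ref, x, y, size, offsets):
    """Visit offsets in the given order and keep the first minimal error."""
    best_vec, best_err = None, None
    for dx, dy in offsets:
        err = sad(cur, ref, x, y, dx, dy, size)
        if best_err is None or err < best_err:
            best_vec, best_err = (dx, dy), err
    return best_vec, best_err


flat = [[50] * 8 for _ in range(8)]   # flat texture: every offset ties

# Raster order starts in the top-left corner of the search window.
raster = [(dx, dy) for dy in range(-2, 3) for dx in range(-2, 3)]
# Alternative order that visits short vectors first.
centre_first = sorted(raster, key=lambda v: v[0] ** 2 + v[1] ** 2)

print(first_minimum(flat, flat, 2, 2, 4, raster))        # ((-2, -2), 0)
print(first_minimum(flat, flat, 2, 2, 4, centre_first))  # ((0, 0), 0)
```

Both results have the same error, yet only the second corresponds to the actual (zero) movement, and the first would also cost more bits when the vector is coded.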
Furthermore, the full search is carried out assuming simple parallel movement of the image and no noise. Actual moving images, however, are constantly deforming, and noise is sometimes introduced. In such cases, the motion vector that gives the minimum value of the mean error evaluation function does not necessarily reflect the actual movement. In particular, if the search area is expanded in order to capture fast moving objects, the chance of erroneous detection increases, and the erroneous vector may be a considerably long vector. As a result, the difference may be small but the total code length may be long, because the vector must also be coded at the time of data coding. Thus, the coding efficiency is decreased.
Since it is difficult for the full search method to search a wide area and detect the correct motion vector, other methods that can reduce calculation have been considered, such as a subsampling search method, a sparse position search method, and a hierarchical search method using reduced resolution.
When calculating a value with a mean error evaluation function, the subsampling search method picks some representative points instead of every point for the calculation. For example, assuming that the size of a square block is 16×16 pixels, it does not use every point (256 pixels) for the calculation. Instead, it uses only a half of them (128 pixels) or a quarter of them (64 pixels). In this method, the target pixels are subsampled, so that the precision of motion vector detection will be less than that of the full search. In particular, if the value of the error evaluation function at a skipped pixel is considerably different from the average, misdetection will occur.
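A minimal Python sketch of the subsampled error sum follows. The `step` parameter and the helper `sampled_pixel_count` are assumptions made for illustration; `step=2` takes every second pixel in both directions, which corresponds to the quarter (64 pixel) case for a 16×16 block. The synthetic frames differ in exactly one pixel that the subsampling happens to skip.

```python
def block_error(cur, ref, x, y, dx, dy, size, step=1):
    """Sum of absolute differences over every `step`-th pixel of the block."""
    return sum(abs(cur[y + by][x + bx] - ref[y + dy + by][x + dx + bx])
               for by in range(0, size, step)
               for bx in range(0, size, step))


def sampled_pixel_count(size, step):
    """Number of pixels that enter the sum for a size x size block."""
    return len(range(0, size, step)) ** 2


ref = [[0] * 16 for _ in range(16)]
cur = [[0] * 16 for _ in range(16)]
cur[5][7] = 100   # odd row and odd column, so step=2 never samples it

print(sampled_pixel_count(16, 1), sampled_pixel_count(16, 2))  # 256 64
print(block_error(cur, ref, 0, 0, 0, 0, 16))                   # 100
print(block_error(cur, ref, 0, 0, 0, 0, 16, step=2))           # 0
```

The subsampled sum reports a perfect match even though the blocks differ, which is the kind of misdetection the paragraph above warns about.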
The sparse position search method does not perform a point-by-point search in a search area. Instead, it first detects a motion vector candidate from a coarse search and then gradually raises the precision. For example, it first searches every four pixels to get a motion vector candidate A, then, starting from the motion vector candidate A, it searches every two pixels to get a motion vector candidate B, and finally it searches point by point around the candidate B to get a final motion vector C.
Since this method is likely to overlook changes of the space around edges, the precision of motion vector detection is less than the full search. Moreover, since this method restricts motion vectors at the point of coarse search, it cannot detect the correct motion vector once the wrong motion vector candidates are chosen.
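The three-stage search (every four pixels, then every two, then every pixel) might be sketched as below. Several details are assumptions rather than part of the description: the coarse stage covers offsets from -4 to +4, each refinement stage examines a 3×3 neighbourhood around the previous candidate, the frame is large enough that no bounds check is needed, and the test frames are a synthetic intensity ramp chosen so that the result can be verified by hand.

```python
def block_error(cur, ref, x, y, dx, dy, size):
    """Sum of absolute differences between block A and block B."""
    return sum(abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
               for j in range(size) for i in range(size))


def sparse_position_search(cur, ref, x, y, size, radius=4):
    """Return (vector, error, number of evaluated positions)."""
    def error(vec):
        return block_error(cur, ref, x, y, vec[0], vec[1], size)

    # Stage 1: coarse grid, every 4th position in the search area.
    coarse = [(dx, dy) for dy in range(-radius, radius + 1, 4)
              for dx in range(-radius, radius + 1, 4)]
    best = min(coarse, key=error)            # candidate A
    evaluations = len(coarse)

    # Stages 2 and 3: 3x3 neighbourhood with spacing 2, then spacing 1.
    for step in (2, 1):
        around = [(best[0] + i * step, best[1] + j * step)
                  for j in (-1, 0, 1) for i in (-1, 0, 1)]
        best = min(around, key=error)        # candidate B, then vector C
        evaluations += len(around)
    return best, error(best), evaluations


# Synthetic ramp: the block at (x, y) in `cur` equals `ref` at (x + 3, y).
ref = [[100 * yy + xx for xx in range(20)] for yy in range(20)]
cur = [[100 * yy + xx + 3 for xx in range(20)] for yy in range(20)]

print(sparse_position_search(cur, ref, 8, 8, 4))   # ((3, 0), 0, 27)
```

Here 27 positions are evaluated, whereas a full search over the same reach of ±7 pixels would evaluate 15 × 15 = 225 positions. On less regular images the coarse stage can settle on a wrong candidate A, after which the later stages cannot recover.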
It is possible to use a combination of the sparse position search method and the subsampling search method. In this case, calculation is further reduced but the possibility of misdetection will be raised.
The hierarchical search method using reduced resolution detects a motion vector candidate using an image with reduced resolution, then gradually lowers the reduction ratio to get more precision. For example, a motion vector A is first obtained from the image reduced to a quarter of the original image. Then, a motion vector B is obtained from the image reduced to a half of the original image by searching from the point that corresponds to the motion vector A. Finally, a motion vector C is obtained from the original image by searching from the point that corresponds to the motion vector B.
Considering the amount of calculation, this method is similar to the combination of the sparse position search method and the subsampling search method, because using the reduced resolution has the same effect as sampling both search points and pixels. In particular, if a simple sampling method is applied to generate the image with reduced resolution, the calculation amount is exactly the same as that of the combination of the sparse position search method and the subsampling search method.
With this method, a minute texture of the original image will be lost by the image reduction, so that the precision of motion vector detection is less than that of the full search on an image that includes such minute texture. Also, the precision of motion vector detection depends on the way the reduced image is generated. If the reduced image is generated not by the simple sampling method but by a method in which, for example, a half size image is generated by calculating a weighted average of 2×2 or more pixels to get each pixel in the reduced image, the motion vector detected by this method is more precise than that detected by the combination of the sparse position search method and the subsampling search method. This is because every pixel is referred to upon image reduction, while the subsampling search method simply ignores a half or more of the pixels. A similar argument applies to the sparse position search method, i.e. while the sparse position search method may overlook edge information, this method may not. This is because the hierarchical search method using reduced resolution uses the full search on the reduced resolution, while the sparse position search method simply ignores intervening search points.
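The quarter, half and full resolution stages can be sketched in Python as follows. This is only one possible arrangement under stated assumptions: `halve` uses the plain (unweighted) mean of each 2×2 group, the coarsest level is searched over ±2 positions and each finer level over ±1 position around the doubled vector, block coordinates and size are multiples of 4, and the frames are a synthetic ramp large enough that no bounds check is needed. All function names are hypothetical.

```python
def halve(image):
    """Halve the resolution using the mean of each 2x2 pixel group."""
    return [[(image[2 * j][2 * i] + image[2 * j][2 * i + 1] +
              image[2 * j + 1][2 * i] + image[2 * j + 1][2 * i + 1]) / 4
             for i in range(len(image[0]) // 2)]
            for j in range(len(image) // 2)]


def block_error(cur, ref, x, y, dx, dy, size):
    """Sum of absolute differences between block A and block B."""
    return sum(abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
               for j in range(size) for i in range(size))


def search_around(cur, ref, x, y, size, centre, reach):
    """Full search over the (2 * reach + 1)^2 offsets around `centre`."""
    candidates = [(centre[0] + i, centre[1] + j)
                  for j in range(-reach, reach + 1)
                  for i in range(-reach, reach + 1)]
    return min(candidates,
               key=lambda v: block_error(cur, ref, x, y, v[0], v[1], size))


def hierarchical_search(cur, ref, x, y, size, coarse_reach=2):
    """Search at quarter, half and full resolution; return the final vector."""
    cur_levels = [halve(halve(cur)), halve(cur), cur]
    ref_levels = [halve(halve(ref)), halve(ref), ref]
    vector = (0, 0)
    for level, (c, r) in enumerate(zip(cur_levels, ref_levels)):
        scale = 4 >> level                       # 4, 2, 1
        reach = coarse_reach if level == 0 else 1
        vector = search_around(c, r, x // scale, y // scale,
                               size // scale, vector, reach)
        if scale > 1:
            # Carry the candidate to the next, finer level.
            vector = (vector[0] * 2, vector[1] * 2)
    return vector


# Synthetic ramp: the block at (x, y) in `cur` equals `ref` at (x + 3, y).
ref = [[100 * yy + xx for xx in range(32)] for yy in range(32)]
cur = [[100 * yy + xx + 3 for xx in range(32)] for yy in range(32)]

print(hierarchical_search(cur, ref, 8, 8, 8))    # (3, 0)
```

Because `halve` averages all four pixels, every original pixel contributes to the reduced image, in contrast to simple sampling.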
Nevertheless, like the sparse position search method, the detected motion vector does not always give the minimum value of the mean error evaluation function, because this method first restricts motion vector candidates at coarse searching points. Also, if a wrong motion vector is chosen before raising precision, the correct motion vector cannot be detected afterwards.
It is possible to use the hierarchical search using reduced resolution together with the sparse position search method and/or the subsampling search method. In this case, the amount of calculation will be further reduced, but the merit of the precision of motion vector detection will also be lost. That is, the amount of calculation and the detecting precision are in a trade-off relation.
In any case, a fairly large amount of calculation is required to obtain a certain detection precision. For this reason, a custom LSI capable of parallel processing has been used up to now to detect motion vectors, rather than software that is read by and runs on a CPU or MPU. Nevertheless, since the processing speed of CPUs is becoming higher every year, implementation by software, which enables a more flexible processing form, is emerging. Even if the processing is not done entirely by software, some layers, for example, all but the final stage, can be done by software. In this case, the final stage, at which searching is done pixel by pixel, will be implemented in hardware.
The current obstacle to motion vector detection by software is that real-time processing cannot be assured because of the huge amount of calculation needed to detect corresponding sets of pixels between image frames. Conversely, if too much weight is put on real-time processing, the detecting precision will be considerably decreased. In particular, if the full search is used while assuring real-time processing, only a very limited area can be searched.
Therefore, it is an object of the present invention to provide an improved motion vector searching technique which makes use of the merits of the foregoing searching methods so as to keep a wide searching area while reducing the chance of detecting a wrong motion vector at the same time, and further provide an improved method that enables the implementation by software.
According to one aspect of the present invention, there is provided a motion vector detecting method comprising the steps of, when detecting motion vectors between a first and a second image frame which are contiguous:
(1-1) dividing the second image frame into blocks;
(1-2) classifying the blocks into a first and a second group such that the mutually adjacent blocks belong to the different groups;
(1-3) specifying a motion vector candidate for each block in the first group based on a mean error evaluation function between the first and second image frames;
(1-4) assigning an average of the motion vector candidates of the adjacent blocks in the first group to a motion vector candidate for each of the blocks in the second group;
(1-5) updating the motion vector candidate of each of the blocks in the first group by assigning thereto an average of the motion vector candidates of the adjacent blocks in the second group.
By using the foregoing method, the amount of calculation can be reduced as compared to a method that calculates a mean error evaluation function for each of the blocks in the second group.
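Steps (1-2), (1-4) and (1-5) above can be sketched as follows, under the assumptions that the grouping is a checkerboard, that "adjacent" means sharing an edge, and that vectors are averaged component by component. The candidates of step (1-3) are supplied directly as hypothetical input values rather than computed from image data, and the names `classify`, `neighbour_average` and `propagate` are illustrative.

```python
def classify(bx, by):
    """Checkerboard split: blocks sharing an edge land in different groups."""
    return 1 if (bx + by) % 2 == 0 else 2


def neighbour_average(vectors, bx, by, cols, rows):
    """Average the candidates of the edge-adjacent blocks (other group)."""
    adjacent = [(bx + i, by + j)
                for i, j in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= bx + i < cols and 0 <= by + j < rows]
    n = len(adjacent)
    return (sum(vectors[p][0] for p in adjacent) / n,
            sum(vectors[p][1] for p in adjacent) / n)


def propagate(first_group_vectors, cols, rows):
    """Apply steps (1-4) and (1-5) to a grid of cols x rows blocks."""
    vectors = dict(first_group_vectors)
    blocks = [(bx, by) for by in range(rows) for bx in range(cols)]
    # Step (1-4): second-group candidates from adjacent first-group blocks.
    second = {b: neighbour_average(vectors, b[0], b[1], cols, rows)
              for b in blocks if classify(*b) == 2}
    vectors.update(second)
    # Step (1-5): first-group candidates from adjacent second-group blocks.
    first = {b: neighbour_average(vectors, b[0], b[1], cols, rows)
             for b in blocks if classify(*b) == 1}
    vectors.update(first)
    return vectors


# Hypothetical step (1-3) result for the first group of a 3x3 block grid.
searched = {(0, 0): (2, 0), (2, 0): (2, 0), (1, 1): (5, 3),
            (0, 2): (2, 0), (2, 2): (2, 0)}

result = propagate(searched, 3, 3)
print(result[(1, 0)], result[(1, 1)])   # (3.0, 1.0) (3.0, 1.0)
```

Only the five first-group blocks need an error evaluation in this example; the four second-group blocks obtain their candidates by averaging alone.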
According to another aspect of the present invention, there is provided a motion vector detecting method comprising:
(2-1) a pre-process step of capturing a first and a second image frame which are contiguous, holding the first image frame, dividing the second image frame into blocks, and classifying the blocks into a first and a second group such that the mutually adjacent blocks belong to the different groups;
(2-2) an initial search step of specifying a motion vector candidate for each of the blocks in the first group that points to a search point in the first image frame by choosing a vector from the corresponding block that gives a minimal value of a mean error evaluation function, and specifying a motion vector candidate for each of the blocks in the second group by assigning thereto an average of the motion vector candidates of the adjacent blocks in the first group;
(2-3) a first detailed search step of updating the motion vector candidate of each of the blocks in the first group by studying an average vector represented by an average of the motion vector candidates of the adjacent blocks in the second group;
(2-4) a second detailed search step of updating the motion vector candidate of each of the blocks in the second group by studying an average vector represented by an average of the motion vector candidates of the adjacent blocks in the first group;
(2-5) a step of setting the updated motion vector candidate as a motion vector for each of the blocks.
It may be arranged that the study of the average vector is performed by moving a search point, to which the average vector points, to a specified direction and updating a current motion vector candidate if the calculated mean error evaluation function value is less than that of the current motion vector candidate.
It may be arranged that if a difference between the current motion vector candidate and the average vector is greater than a predetermined threshold value, the current motion vector candidate is replaced with the average vector and a point to which the average vector points is set as a new starting point of the search point.
It may be arranged that if the value of the mean error evaluation function is not greater than a current convergence error level, the study of the average vector of the next block in the same group is immediately carried out. Preferably, the first detailed search step and the second detailed search step are performed in turn, which makes it possible to lower the convergence error level progressively. This can gradually increase the vector detection accuracy.
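One possible reading of the "study" of an average vector for a single block is sketched below. The description leaves several details open, so the following are assumptions made only for illustration: the average vector is rounded to whole pixels, the difference between vectors is measured as the sum of absolute component differences, the search point is moved by one pixel in each of four directions, the error is the sum of absolute differences, and the frame is large enough that no bounds check is needed. The name `study_average_vector` and all parameters are hypothetical.

```python
def block_error(cur, ref, x, y, dx, dy, size):
    """Sum of absolute differences between block A and block B."""
    return sum(abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
               for j in range(size) for i in range(size))


def study_average_vector(cur, ref, x, y, size, current, average,
                         threshold, convergence_level):
    """Return the (possibly updated) candidate and its error for one block."""
    def error(vec):
        return block_error(cur, ref, x, y, vec[0], vec[1], size)

    current_err = error(current)
    if current_err <= convergence_level:
        return current, current_err      # good enough: go on to next block

    start = (round(average[0]), round(average[1]))
    distance = abs(current[0] - start[0]) + abs(current[1] - start[1])
    if distance > threshold:
        # Candidate differs too much from its neighbours: adopt the average
        # vector, whose search point is also the starting point below.
        current, current_err = start, error(start)

    # Move the search point around the point the average vector points to,
    # keeping any position that lowers the error.
    for sx, sy in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
        probe = (start[0] + sx, start[1] + sy)
        probe_err = error(probe)
        if probe_err < current_err:
            current, current_err = probe, probe_err
    return current, current_err


# Synthetic ramp: the block at (x, y) in `cur` equals `ref` at (x + 3, y).
ref = [[100 * yy + xx for xx in range(16)] for yy in range(16)]
cur = [[100 * yy + xx + 3 for xx in range(16)] for yy in range(16)]

print(study_average_vector(cur, ref, 4, 4, 4, (0, 0), (2.4, 0.2), 1, 0))
# ((3, 0), 0)
```

Calling the function again with the returned candidate exits at the convergence check without any further probing, which is how a block that has already converged is skipped. A caller would apply this to the first group and the second group in turn while lowering `convergence_level`.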
According to another aspect of the present invention, there is provided a motion vector detecting device comprising:
(3-1) an image frame capturing section which captures a first and a second image frame that are contiguous;
(3-2) a pre-processing section that divides the second image frame into blocks and classifies the blocks into a first and a second group such that the mutually adjacent blocks belong to the different groups;
(3-3) a motion vector study section that approximately specifies a motion vector candidate for each of the blocks in the first group and a motion vector candidate for each of the blocks in the second group, and studies each of the motion vector candidates based on an average vector represented by an average of the motion vector candidates of the adjacent blocks so that a value of a mean error evaluation function becomes smaller.
It may be arranged that the motion vector study section approximately specifies the motion vector candidates for all the blocks, and then studies the average vector for each of the blocks in the first group and the average vector for each of the blocks in the second group alternately.
It may be arranged that the motion vector study section studies the average vector of the next block in the same group immediately if the value of the mean error evaluation function is not greater than a predetermined convergence error level, and that the convergence error level is reduced upon repeating the study of the average vector.
According to another aspect of the present invention, there is provided a computer-readable recording medium storing a program which causes a computer to execute the steps of:
(4-1) capturing a first and a second image frame which are contiguous, holding the first image frame, dividing the second image frame into blocks, and classifying the blocks into a first and a second group such that the mutually adjacent blocks belong to the different groups;
(4-2) specifying a motion vector candidate for each of the blocks in the first group that points to a search point in the first image frame by choosing a vector from the corresponding block that gives a minimal value of a mean error evaluation function;
(4-3) specifying a motion vector candidate for each of the blocks in the second group by assigning thereto an average of the motion vector candidates of the adjacent blocks in the first group;
(4-4) updating the motion vector candidate of each of the blocks in the first group by studying an average vector represented by an average of the motion vector candidates of the adjacent blocks in the second group;
(4-5) updating the motion vector candidate of each of the blocks in the second group by studying an average vector represented by an average of the motion vector candidates of the adjacent blocks in the first group;
(4-6) setting the updated motion vector candidate as a motion vector for each of the blocks.