1. Field of the Invention
The present invention relates to a technology used for inter-frame video encoding. In particular, it relates to a motion vector search method and apparatus for determining a motion vector that indicates the movement of a pixel block in one image to a location in another image, and to a computer program product for executing the motion vector search.
This application is based on Patent Application Nos. Hei 10-136287 and Hei 10-202628 filed in Japan, the contents of which are incorporated herein by reference.
2. Description of the Related Art
Various conventional motion vector search methods will be described in the following with reference to schematic diagrams.
FIG. 11 is a schematic diagram of a template 1 selected for obtaining a motion vector and a search area 2 in which the motion vector is searched for. Template 1 is a pixel block in the target image to be matched; search area 2 is a pixel block, larger than template 1, in another image against which the template is compared. White triangles 3 represent pixels in template 1 and white circles represent pixels in search area 2.
In FIG. 11, template 1 is superimposed on search area 2, so the white-circle pixels 4 of the search area that it overlaps are not shown. In the following, the horizontal dimension is counted in pixels and the vertical dimension in lines, so template 1 in FIG. 11 measures 4 pixels × 4 lines and search area 2 measures 11 pixels × 11 lines.
A first conventional method of searching for a motion vector with template 1 is the exhaustive search method. In this method, template 1 of 4 pixels × 4 lines is moved successively over the larger search area 2. At each sampled position, the value of each template pixel is compared with the value of the corresponding pixel in the reference image using a matching criterion such as the absolute difference or the squared difference, and these per-pixel values are summed over the sampled area.
Specifically, template 1 scans search area 2 from top left to bottom right, shifting by one pixel or one line for each sample. Because search area 2 is 11 pixels × 11 lines and the template is 4 pixels × 4 lines, a total of 8 × 8 = 64 sample areas are evaluated in terms of the sum of absolute differences or squared differences between template 1 and the sample area. The motion vector is then given by the position of the sample area with the minimum value of the chosen criterion.
In FIG. 11, if the motion vector of the pixel block at the position of template 1 shown in the figure is taken as the zero vector (0, 0), the motion vector can take values in the range [−4 ~ +3] in both the horizontal and vertical directions. This is because, as FIG. 11 shows, template 1 can move four pixels (or four lines) in one direction (left and up) and three pixels (or three lines) in the opposite direction (right and down).
For example, if the minimum value (sum of absolute differences or sum of squared differences) is obtained at pixel block 5, the motion vector V is (−2, −2). This numerical example illustrates the principle of obtaining a motion vector; the general case is explained with reference to FIG. 12.
Let the size of template 6 in the target image be a pixels × b lines and the size of search area 7 in the comparison (reference) image be c pixels × d lines, where c ≧ a and d ≧ b. If the motion vector of the pixel block at the center of search area 7 corresponding to template 6 is taken as (0, 0), the motion vector for template 6 can range over [−(c−a+1)/2 ~ (c−a)/2] horizontally and [−(d−b+1)/2 ~ (d−b)/2] vertically, where each division discards the fractional part, and (c−a+1) × (d−b+1) motion vectors must be evaluated.
In FIG. 12, the motion vector of the upper-left pixel block 8 is [−(c−a+1)/2, −(d−b+1)/2], and that of the lower-right pixel block 9 is [(c−a)/2, (d−b)/2]. In the following, the motion vector search criterion is always the sum of absolute differences; the sum of squared differences may, of course, be used instead.
When the exhaustive search method is applied to search area 2 of FIG. 11, the absolute difference must be calculated 64 × 16 = 1024 times (64 sample areas of 16 pixels each), which requires a vast amount of computation. A second conventional technique reduces this computation: movement is first evaluated on a reduced image obtained by sub-sampling, and the resulting motion vector is then used as an initial value for a further search over a small area only.
The second conventional technique will be explained with reference to FIG. 13, which shows the pixels 10, 11 matched when sub-sampling by 2 pixels × 2 lines: black triangles 10 represent the sampled pixels of template 1 and black circles 11 represent the corresponding sampled pixels of search area 2.
First, in the reduced image obtained by sub-sampling, the absolute difference is calculated 16 × 4 = 64 times (16 sample positions, each with 4 pixels), which reduces the processing to 1/16 of that required by the first conventional technique. Then, to search a range of [−1 ~ +1] pixels about the initial value in the reference (comparison) image, the absolute difference must be computed 9 × 16 = 144 times. The total number of computations is therefore 208 (= 64 + 144), considerably fewer than the 1024 of the exhaustive search.
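The exhaustive search on the FIG. 11 geometry can be sketched as follows. This is a minimal illustration, not the apparatus described here; it assumes images are plain lists of rows of integer pixel values, uses the sum of absolute differences (SAD) as the criterion, and takes the center position of the search area as the zero vector. The function names `sad` and `exhaustive_search` and the test image are hypothetical.

```python
def sad(template, image, top, left):
    """Sum of absolute differences between `template` and the equally sized
    block of `image` whose upper-left pixel is at (top, left)."""
    return sum(
        abs(template[y][x] - image[top + y][left + x])
        for y in range(len(template))
        for x in range(len(template[0]))
    )


def exhaustive_search(template, area):
    """Shift the template one pixel (or one line) at a time over the whole
    search area and keep the position with the smallest SAD.

    Returns ((mv_x, mv_y), best_sad, positions_evaluated), where the motion
    vector is measured from the center position, taken as (0, 0)."""
    b, a = len(template), len(template[0])        # lines, pixels of template
    d, c = len(area), len(area[0])                # lines, pixels of area
    cy, cx = (d - b + 1) // 2, (c - a + 1) // 2   # offset treated as (0, 0)
    best_pos, best_sad, count = None, None, 0
    for top in range(d - b + 1):
        for left in range(c - a + 1):
            s = sad(template, area, top, left)
            count += 1
            if best_sad is None or s < best_sad:
                best_pos, best_sad = (top, left), s
    top, left = best_pos
    return (left - cx, top - cy), best_sad, count


# FIG. 11 sizes: 11 x 11 search area and a 4 x 4 template cut out two pixels
# and two lines up-left of the center position (like pixel block 5).
area = [[y * 11 + x for x in range(11)] for y in range(11)]
template = [row[2:6] for row in area[2:6]]
print(exhaustive_search(template, area))  # ((-2, -2), 0, 64)
```

The 64 evaluated positions, each costing 16 absolute differences, give the 1024 operations discussed for this method.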
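The range formula can be checked with a small helper, here given the hypothetical name `search_range`. It assumes Python's floor division `//` applied to the non-negative quantities (c−a+1), (c−a), (d−b+1) and (d−b), which discards the fractional part as the formula requires; the second call uses a 19 × 19 search area for comparison.

```python
def search_range(a, b, c, d):
    """Motion-vector range and candidate count for an a x b template
    (pixels x lines) in a c x d search area, with (0, 0) at the center.
    Each half-width uses integer division, discarding the fraction."""
    horizontal = (-((c - a + 1) // 2), (c - a) // 2)
    vertical = (-((d - b + 1) // 2), (d - b) // 2)
    count = (c - a + 1) * (d - b + 1)
    return horizontal, vertical, count


print(search_range(4, 4, 11, 11))  # ((-4, 3), (-4, 3), 64)
print(search_range(4, 4, 19, 19))  # ((-8, 7), (-8, 7), 256)
```

In each case the number of integer vectors in the horizontal range times the number in the vertical range equals the candidate count.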
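The two-stage search can be sketched as follows, counting each absolute difference so that the 64 + 144 = 208 figure can be reproduced. This is an illustrative sketch under assumptions: sub-sampling is done by simply keeping every second pixel and line (no filtering), the coarse stage evaluates every second template offset, and the refinement covers [−1 ~ +1] around the coarse result, clipped to the search area. The names are hypothetical.

```python
def sad_sub(template, area, top, left, step):
    """SAD over every `step`-th pixel and line of the template; also
    returns how many absolute differences were computed."""
    total, ops = 0, 0
    for y in range(0, len(template), step):
        for x in range(0, len(template[0]), step):
            total += abs(template[y][x] - area[top + y][left + x])
            ops += 1
    return total, ops


def two_stage_search(template, area, step=2, radius=1):
    """Coarse search on a step:1 sub-sampled grid, then a full-resolution
    search within +/-radius of the coarse result.
    Returns ((mv_x, mv_y), best_sad, absolute_differences_computed)."""
    b, a = len(template), len(template[0])
    d, c = len(area), len(area[0])
    max_top, max_left = d - b, c - a
    ops = 0

    def best_of(tops, lefts, s):
        nonlocal ops
        best = None
        for top in tops:
            for left in lefts:
                total, n = sad_sub(template, area, top, left, s)
                ops += n
                if best is None or total < best[0]:
                    best = (total, top, left)
        return best

    # Stage 1: every `step`-th offset, using every `step`-th pixel and line.
    _, t0, l0 = best_of(range(0, max_top + 1, step),
                        range(0, max_left + 1, step), step)
    # Stage 2: refine around the coarse result at full resolution.
    tops = range(max(0, t0 - radius), min(max_top, t0 + radius) + 1)
    lefts = range(max(0, l0 - radius), min(max_left, l0 + radius) + 1)
    best_sad, top, left = best_of(tops, lefts, 1)
    cy, cx = (d - b + 1) // 2, (c - a + 1) // 2
    return (left - cx, top - cy), best_sad, ops


area = [[y * 11 + x for x in range(11)] for y in range(11)]
template = [row[2:6] for row in area[2:6]]
print(two_stage_search(template, area))  # ((-2, -2), 0, 208)
```

The coarse stage costs 16 positions × 4 pixels and the refinement 9 positions × 16 pixels. On less regular images the coarse stage can lock onto the wrong neighbourhood, which is the accuracy trade-off of this technique.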
Details of motion vector search techniques are described, for example, in “A bidirectional motion compensation LSI with a compact motion estimator”, by N. Hayashi et al., IEICE Trans. Electron., vol. E78-C, no. 12, pp. 1682-1690, Dec. 1995.
By combining the second conventional technique with the most advanced LSI technology, a search range of [−128 ~ +128 horizontal; −64 ~ +64 vertical] pixels can be implemented in a single integrated device at low manufacturing cost, giving good image quality for video with normal levels of motion.
However, fast-moving images such as sports broadcasts require searches over a range of [−200 ~ +200 horizontal; −100 ~ +100 vertical] pixels. A motion vector search apparatus capable of searching such a wide area is described, for example, in E. Ogura et al., “A 1.2 W Single-Chip MPEG2 MP@ML Video Encoder LSI including Wide Search Range Motion Estimation and 81 MOPS Controller”, ISSCC98 Digest of Technical Papers, Feb. 1998. The search range in this reference is [−288 ~ +287.5 horizontal; −96 ~ +95.5 vertical] pixels.
To realize such a vast search area, one might extract one pixel from every 4 pixels × 2 lines or 4 pixels × 4 lines to obtain a reduced image of lower resolution. However, the lower the resolution, the larger the motion vector error, so for images with small movements the image quality is inferior to that obtained by searching a small area with the first or second technique.
For this reason, a third technique of searching in a wide area has been suggested, in which the wide search area is divided into several smaller areas and a separate motion vector search apparatus is allocated for each divided individual area.
This technique will be explained with reference to FIG. 14, a schematic diagram showing a template 13 used for obtaining motion vectors and a search area 12 in which motion vectors are searched. Motion vectors can range over [−8 ~ +7] pixels in both the horizontal and vertical directions, and search area 12 is 19 pixels × 19 lines, so template 13 must be evaluated at 16 × 16 = 256 positions, four times as many sums of absolute differences as in FIG. 11.
This technique therefore requires four times the computational power of the case in which motion vectors range over [−4 ~ +3] pixels horizontally and vertically. If the total area is divided into four areas 14, 15, 16, 17 with overlapping edges and each is searched by a separate apparatus, then four apparatuses, each capable of computing the sums of absolute differences for 64 positions, together cover all 256 positions.
Details of this technique are described, for example, in “A family of VLSI Designs for the Motion Compensation Block-Matching Algorithm”, by K. Yang et al., IEEE Trans. CAS, pp. 1317-1325, Oct. 1989.
However, in the third technique the computational power must grow with the search area, which increases the operational cost. A fourth conventional technique has therefore been proposed that enlarges the apparent search area without increasing the operational cost.
The fourth conventional technique relies on the fact that, because moving images on the screen are continuous, the movement of the overall image can be estimated with some probability from the distribution of motion vectors encoded in the past. The search center can then be shifted toward the estimated direction of movement. This method will be explained with reference to FIG. 15.
FIG. 15 shows a search that anticipates an overall image movement of (−2, −3): search area 18 is centered on pixel block 19, which corresponds to the motion vector (−2, −3). In other words, motion vectors are searched over a range of [−6 ~ +1] horizontally and [−7 ~ 0] vertically relative to reference pixel block 13. Relative to the shifted pixel block 19, search area 18 covers the same range as a search over [−4 ~ +3] in the horizontal and vertical directions. This technique allows an apparently wide area to be searched without increasing the computational requirement when successive images move consistently in one direction, such as to the right or upward.
Details of this method are described, for example, in “Video Quality Improvement by Search Window Shifting”, by S. Zhu et al., Proceedings of the 1997 IEICE General Conference, p. 306, 1997.
Although the discussion so far has dealt only with the volume of computation, expanding the search area raises another problem: the number of pixels that must be handled increases correspondingly. A typical configuration of a motion vector search apparatus is shown in FIG. 16.
Template and search area data are stored in an image memory 24 external to the motion vector search apparatus 20. When needed for motion detection, they are transferred to a template memory 22 and a search area memory 23 inside apparatus 20, and then forwarded to a processing device 21, which computes the sums of absolute differences and detects the minimum value.
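The division of the search area can be sketched as follows: each overlapping 11 × 11 sub-area is searched independently, as a separate apparatus would search it, and the overall minimum is kept. This sketch assumes square sub-areas whose origins step by (sub-area size − template size + 1), so that a 19 × 19 area splits into four sub-areas starting at offsets 0 and 8. The names are hypothetical, and the comparison of the four partial results is shown as a simple `min`.

```python
def sad(template, image, top, left):
    """Sum of absolute differences at upper-left offset (top, left)."""
    return sum(
        abs(template[y][x] - image[top + y][left + x])
        for y in range(len(template))
        for x in range(len(template[0]))
    )


def search_subarea(template, area, top0, left0, size):
    """One search unit: full search of the size x size sub-area whose
    upper-left pixel is at (top0, left0). Returns (sad, top, left, count)."""
    b, a = len(template), len(template[0])
    best, count = None, 0
    for top in range(top0, top0 + size - b + 1):
        for left in range(left0, left0 + size - a + 1):
            s = sad(template, area, top, left)
            count += 1
            if best is None or s < best[0]:
                best = (s, top, left)
    return best + (count,)


def divided_search(template, area, size=11):
    """Split the search area into overlapping size x size sub-areas, search
    each independently, and keep the overall best.
    Assumes the area divides evenly, e.g. 19 = 11 + (11 - 4 + 1)."""
    b, a = len(template), len(template[0])
    d, c = len(area), len(area[0])
    results = [
        search_subarea(template, area, t0, l0, size)
        for t0 in range(0, d - size + 1, size - b + 1)   # 0, 8 for FIG. 14
        for l0 in range(0, c - size + 1, size - a + 1)
    ]
    best_sad, top, left, _ = min(results)
    total = sum(r[3] for r in results)
    cy, cx = (d - b + 1) // 2, (c - a + 1) // 2
    return (left - cx, top - cy), best_sad, len(results), total


area = [[y * 19 + x for x in range(19)] for y in range(19)]
template = [row[3:7] for row in area[13:17]]   # true offset: top 13, left 3
print(divided_search(template, area))  # ((-5, 5), 0, 4, 256)
```

Four units of 64 positions each cover all 256 positions, and the result matches what a single exhaustive search over the 19 × 19 area would find.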
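Search window shifting can be sketched as follows. The predicted overall movement is simply passed in as `centre`; in the fourth technique it would be derived from previously encoded vectors, which this sketch does not model. The function names, the 32 × 32 reference image and the true motion (−5, −6) are hypothetical and were chosen so that the vector lies outside [−4 ~ +3] but inside the window shifted by (−2, −3).

```python
def sad_at(template, ref, top, left):
    """SAD between `template` and the block of `ref` at (top, left)."""
    return sum(
        abs(template[y][x] - ref[top + y][left + x])
        for y in range(len(template))
        for x in range(len(template[0]))
    )


def shifted_search(template, ref, ty, tx, centre, lo=-4, hi=3):
    """Search the vectors centre + [lo ~ hi] in both directions for the
    template located at (ty, tx) in the current image.
    `centre` = (cx, cy) is the predicted overall movement (given here)."""
    cx, cy = centre
    best = None
    for vy in range(cy + lo, cy + hi + 1):
        for vx in range(cx + lo, cx + hi + 1):
            s = sad_at(template, ref, ty + vy, tx + vx)
            if best is None or s < best[0]:
                best = (s, vx, vy)
    s, vx, vy = best
    return (vx, vy), s


# Template at (16, 16) in the current image whose true motion is (-5, -6).
ref = [[y * 32 + x for x in range(32)] for y in range(32)]
template = [row[11:15] for row in ref[10:14]]
print(shifted_search(template, ref, 16, 16, (-2, -3)))      # ((-5, -6), 0)
print(shifted_search(template, ref, 16, 16, (0, 0))[1] > 0)  # True
```

Both calls evaluate the same 64 candidates. Only the shifted window contains the true vector, which is why the apparent search range grows without extra computation when the prediction is right.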
Normally, the internal pixel transfer rate within motion vector search apparatus 20 (between template memory 22 and processing device 21, and between search area memory 23 and processing device 21) is high, whereas the external transfer rate (between template memory 22 and image memory 24, and between search area memory 23 and image memory 24) is low.
In designing motion vector search apparatus 20, the transfer rate between template memory 22 and image memory 24 poses no particular problem because a template contains few pixels, but the volume of data transferred between image memory 24 and search area memory 23 must be minimized because the search areas contain a vast number of pixels.
For this reason, normally a fifth conventional technique is used to reduce the volume of image data to be transferred, and this video encoding technique will be explained with reference to FIG. 17.
In video encoding, an image is divided in the vertical direction into so-called slices, having the same size vertically as a template to be used. A slice is further divided in the horizontal direction into the template size. Encoding is performed from the top slice to the bottom slice in an image, and within each slice, encoding is performed from a left template to a right template successively. Therefore, there is considerable duplication of search areas between two adjacent templates.
For example, the search areas for templates 27 and 28 in slice 26 of FIG. 17 are areas 29 and 30, respectively. When motion vector detection for template 27 is finished, all the pixels of search area 29 are stored in search area memory 23 of FIG. 16. If the motion vector search for template 28 is then performed and only the pixels of the newly required area 31 are transferred to the memory, search area memory 23 will hold all the pixels of search area 30 for template 28. In other words, if templates are processed successively from left to right, there is no need to transfer every pixel of each template's search area from image memory 24 to search area memory 23; only the pixels not shared with the search area of the left-adjoining template need to be transferred.
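The saving in transfer volume can be estimated with the sketch below, which counts the pixels fetched for each template in a slice when the memory keeps the previous template's search area. It assumes each search area starts a fixed `margin` columns to the left of its template, has the same height for every template, and ignores image borders; `transfer_per_template` is a hypothetical name, and the numbers use the FIG. 11 geometry rather than FIG. 17.

```python
def transfer_per_template(num_templates, template_w, area_w, area_h, margin):
    """Pixels fetched from image memory for each template of one slice when
    the search-area memory keeps the previous template's search area.

    Template i covers columns [i*template_w, (i+1)*template_w); its search
    area is assumed to start `margin` columns to its left. Image borders are
    ignored for simplicity."""
    held = set()     # columns currently held in the search-area memory
    counts = []
    for i in range(num_templates):
        start = i * template_w - margin
        cols = set(range(start, start + area_w))
        counts.append(len(cols - held) * area_h)   # non-duplicated columns only
        held = cols
    return counts


# FIG. 11 geometry: 4-pixel-wide templates, 11 x 11 search areas starting
# 4 pixels to the left of each template.
counts = transfer_per_template(3, 4, 11, 11, 4)
print(counts, sum(counts), 3 * 11 * 11)  # [121, 44, 44] 209 363
```

After the first template, only a strip as wide as the template is fetched. Because every search area in the slice is shifted by the same `margin`, applying one common shift to the whole slice, as in the fourth technique, leaves these counts unchanged.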
In FIG. 17, pixel positions in the vertical direction in search area 29 and search area 30 are actually coincident, but in the diagram, the search areas 29, 30 are shown shifted so as to clearly indicate the duplicated area.
The fifth technique explained above is quite compatible with the fourth technique. This is because, when all the search areas of the templates contained in a slice are shifted by the same amount, the fifth technique can be applied easily.
However, the fourth technique, developed to enlarge the search area at low cost by estimating motion vectors from history, has problems when the movement in the images currently being encoded changes suddenly, because the motion vector may then fall outside the narrow search area and go undetected. If no vector is found in the narrow area, the amount of shift for subsequent searches cannot be determined. For example, if after a scene change the images have motion vectors that exceed the narrow area, then even if they keep moving in the same direction at the same speed, the initial amount of shift cannot be determined, and no subsequent movement can be estimated.
Further, in the fourth conventional technique, the same shift is applied to the overall image so that if there is a different type of movement within the image, the technique is not applicable. For example, in an enlarged view of a pendulum, the support side of the pendulum moves slowly while images closer to the tip of the pendulum will move faster. In another example, in a video sequence showing a moving train, the background is stationary while the train moves horizontally in a given direction. In such images, if it is desired to detect motion vectors in as small a search area as possible, the search area must be shifted by different pixel amounts even within one image.
A further problem of the fourth technique is that, if a search in a local region of an image reveals that the current movement differs from historical movements, this change cannot be reflected in the search areas for the rest of the same image.
It is an object of the present invention to provide a method and an apparatus that enlarge the search area for motion vectors without increasing the cost of the apparatus, that reliably detect any movement within the wide search area, and that determine the amount of search-area shift needed for motion vector detection by searching a wide area of the video image currently being encoded, without relying on previously encoded video images.
The object is achieved by an apparatus designed for a method comprising the steps of: 1) searching a wide area to obtain a reference vector indicating the overall image movement; 2) for every template in the image from which the reference vector was obtained, searching a narrow area centered on the reference vector to obtain a displacement vector relative to the reference vector; and 3) for each template, adding the reference vector and the displacement vector and assigning the sum as the motion vector of that template.
Accordingly, it is possible to enlarge the search area without increasing the cost of the apparatus, and to ensure finding a motion vector if it is within the enlarged search area.
Also, by detecting a reference vector indicating the overall image movement and searching in detail around it, disparity in motion vectors caused by erroneous detection can be suppressed, thereby improving the quality of encoded images.
The object is also achieved by another version of the apparatus designed for another method, comprising the steps of: 1) evaluating the movement of a whole template group containing a specific number of templates, outputting the detected movement when an overall image movement of the group is detected, and outputting a no-detection indicator when it is not; 2) when the overall image movement is detected, shifting the search center of the next template group according to the detected movement; and 3) when the overall image movement is not detected, assigning a specific motion vector to the center block of the search area to be searched by the next template group.
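The three steps can be illustrated with the toy sketch below. It rests on a stated assumption: the text here does not say how the wide-area search of step 1 is carried out, so the sketch takes the reference vector to be the single vector that minimizes the summed SAD of all templates over [−wide ~ +wide]. It does this at full resolution for clarity, whereas a practical implementation would need a much cheaper wide search, for example on a sub-sampled image. All names, the images and the true vectors are hypothetical.

```python
def block_sad(cur, ref, ty, tx, vx, vy, n):
    """SAD between the n x n template at (ty, tx) in `cur` and the block
    displaced by (vx, vy) in `ref`."""
    return sum(
        abs(cur[ty + y][tx + x] - ref[ty + vy + y][tx + vx + x])
        for y in range(n)
        for x in range(n)
    )


def square(r):
    """All vectors in [-r ~ +r] x [-r ~ +r]."""
    return [(vx, vy) for vy in range(-r, r + 1) for vx in range(-r, r + 1)]


def motion_vectors(cur, ref, templates, n, wide, narrow):
    # Step 1 (assumed realization): one reference vector for all templates,
    # minimizing their summed SAD over the wide range.
    rx, ry = min(square(wide), key=lambda v: sum(
        block_sad(cur, ref, ty, tx, v[0], v[1], n) for ty, tx in templates))
    vectors = []
    for ty, tx in templates:
        # Step 2: narrow search for a displacement around the reference vector.
        dx, dy = min(square(narrow), key=lambda d: block_sad(
            cur, ref, ty, tx, rx + d[0], ry + d[1], n))
        # Step 3: motion vector = reference vector + displacement vector.
        vectors.append((rx + dx, ry + dy))
    return (rx, ry), vectors


N = 48
ref = [[y * 100 + x for x in range(N)] for y in range(N)]
cur = [[0] * N for _ in range(N)]
true_mv = {(20, 12): (-6, 2), (20, 28): (-5, 2)}   # template position -> vector
for (ty, tx), (vx, vy) in true_mv.items():
    for y in range(4):
        for x in range(4):
            cur[ty + y][tx + x] = ref[ty + vy + y][tx + vx + x]
print(motion_vectors(cur, ref, list(true_mv), 4, 8, 1))
# ((-6, 2), [(-6, 2), (-5, 2)])
```

The two templates move slightly differently. The shared reference vector captures the common movement, and each template's ±1 narrow search recovers its own offset from it.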
Accordingly, it is possible to enlarge the search area without increasing the cost of the apparatus, and to ensure finding a motion vector if it is within the enlarged search area. It is also possible to adjust the amount of shift of search areas to reflect the search results obtained from a portion of an image.
Also, by evaluating an overall image movement produced by a whole template group and shifting the search area according to the overall image movement, disparity in motion vectors caused by erroneous detection can be suppressed, thereby improving the quality of encoded images.
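The control flow of the template-group method can be sketched as follows. The sketch takes the motion vectors found for each group as given input; in an encoder they would come from searching around that group's current center. Two details are assumptions, not taken from the text: the criterion for "overall image movement detected" (here, more than half of the templates in the group agree on one vector) and the specific fallback vector (here, (0, 0)). The function names are hypothetical.

```python
from collections import Counter


def group_movement(vectors, min_share=0.5):
    """Movement of a whole template group, or None as the no-detection
    indicator. Assumed criterion: more than `min_share` of the templates
    in the (non-empty) group found the same motion vector."""
    vec, n = Counter(vectors).most_common(1)[0]
    return vec if n > min_share * len(vectors) else None


def plan_search_centres(groups, default=(0, 0)):
    """Given the motion vectors found for each template group (in encoding
    order), return the search center used for each group."""
    centre, centres = default, []
    for vectors in groups:
        centres.append(centre)
        moved = group_movement(vectors)
        # Step 2: follow a detected movement; step 3: otherwise fall back
        # to the specific (default) vector for the next group.
        centre = moved if moved is not None else default
    return centres


groups = [
    [(-3, 1), (-3, 1), (2, 0)],    # majority agrees: movement (-3, 1)
    [(-3, 1), (0, 0), (5, 5)],     # no agreement: no-detection
    [(1, 1), (1, 1), (1, 1)],
]
print(plan_search_centres(groups))  # [(0, 0), (-3, 1), (0, 0)]
```

Because the center is updated group by group, a change in movement detected in one part of the image takes effect for the following groups of the same image.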