1. Field of the Invention
The present invention relates to a method and apparatus for motion vector detection, and to a medium storing a program therefor. More particularly, the present invention relates to: a method for detecting a motion vector for a frame to be encoded, which is used in motion-compensated predictive encoding as a compression encoding technique for moving pictures; a motion vector detection apparatus using that method; and a recording medium storing a program for executing the method.
2. Description of the Background Art
Conventionally, methods for providing an improved encoding efficiency in moving picture encoding are known which employ a so-called motion vector to predict and compensate for a motion in a block (hereinafter abbreviated as an “E-block”) having a predetermined size within a frame to be encoded, in an effort to eliminate temporal redundancy. A “motion vector” refers to information concerning a correlation between an E-block and a given spatial position in a given frame or field constituting a reference frame. As used herein, a “frame to be encoded” refers to an image frame to be subjected to an encoding process. A “reference frame” refers to an already-encoded image frame which may lie before or after a given frame to be encoded on the time axis. Note that, instead of a frame, a field may also be used as a reference frame, and a plurality of reference frames may be referred to if necessary; in the present specification, any allusion to a “reference frame” is intended to cover all such meanings. One method for obtaining a motion vector is a block matching method.
Details of techniques concerning motion-compensated predictive encoding are described in, for example, “JTC1/SC29/WG11 MPEG98/N2172 Tokyo, March 1998”, published by ISO/IEC.
FIG. 21 illustrates the concept of motion vector detection based on the block matching method. As shown in FIG. 21, in motion vector detection based on the block matching method, a plurality of regions (hereinafter referred to as “S-blocks”), each having the same size as an E-block, are demarcated in a search area within the reference frame. Pixel-to-pixel differences (e.g., sums of absolute differences) between the E-block to be processed (hereinafter referred to as the “target E-block”) and each S-block are calculated. The S-block which has the smallest difference is detected as the block having the strongest correlation with the target E-block (hereinafter referred to as a “correlated block”), and the temporal/spatial offset between the target E-block and its correlated block is detected as the motion vector.
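The exhaustive block matching computation described above can be sketched as follows. This is a minimal Python illustration, assuming grayscale frames stored as 2-D lists of pixel values; the function names (`sad`, `full_search`) are hypothetical and not taken from the specification.

```python
def sad(frame, block, top, left):
    """Sum of absolute differences between `block` and the same-sized
    region of `frame` whose upper-left corner is at (top, left)."""
    h, w = len(block), len(block[0])
    return sum(abs(frame[top + y][left + x] - block[y][x])
               for y in range(h) for x in range(w))

def full_search(ref, block, area_top, area_left, area_h, area_w):
    """Exhaustive block matching: evaluate every S-block position inside
    the search area of reference frame `ref` and return the position
    with the smallest SAD, as a tuple (sad, top, left)."""
    h, w = len(block), len(block[0])
    best = None
    for top in range(area_top, area_top + area_h - h + 1):
        for left in range(area_left, area_left + area_w - w + 1):
            cost = sad(ref, block, top, left)
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best

# A 2x2 target E-block is found exactly where its pattern appears
# in an 8x8 reference frame:
ref = [[0] * 8 for _ in range(8)]
ref[3][4], ref[3][5], ref[4][4], ref[4][5] = 10, 20, 30, 40
block = [[10, 20], [30, 40]]
print(full_search(ref, block, 0, 0, 8, 8))  # (0, 3, 4)
```

The motion vector is then the offset between the target E-block's own position in the frame to be encoded and the returned (top, left) of the correlated block.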
This motion vector detection technique aims to detect the block which incurs the smallest encoding amount, in order to improve encoding efficiency in motion-compensated predictive encoding. Therefore, in order to detect a block having a strong correlation even for images involving substantial motion, the search area should be as large as possible. However, since motion vector detection requires pixel-by-pixel computation (i.e., difference calculation) for every S-block demarcated within the search area, it accounts for a large proportion of the computation in the encoding process. Thus, merely enlarging the search area would necessitate larger-scale hardware.
As a technique for performing block matching while preventing such an increase in the amount of computation, an OTS (One at a Time Search) technique has been proposed. FIG. 22 is a diagram for illustrating the concept of the conventional OTS technique; to facilitate understanding, search positions are shown as non-overlapping regions (circles) centered around representative pixels, so that each circle represents one block position.
According to the OTS technique, for a central S-block (chosen at a predetermined position) and its four neighboring S-blocks (i.e., the S-blocks denoted as “1” in FIG. 22), sums of absolute differences in pixel data with respect to the target E-block are respectively calculated. A “sum of absolute differences” (hereinafter abbreviated as “SAD”) is obtained, for each block (shown as a circle) and the target E-block, as the cumulative sum of absolute values of differences between corresponding pixels in the two blocks. Next, the SAD of the central S-block is compared against the SADs of the four neighboring S-blocks. If any neighboring S-block has an SAD smaller than that of the central S-block, that neighboring S-block is regarded as the new central S-block, and the SADs of five similarly-chosen S-blocks (i.e., the S-blocks denoted as “2”) are again calculated. This process is repeated until the SAD of the central S-block is the smallest (the S-blocks denoted as “3” and so on). Thus, an S-block having a strong correlation with the target E-block, i.e., a correlated block, can be detected with a relatively small amount of computation.
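The iterative descent described above can be sketched as follows. This is a minimal Python illustration with hypothetical names, where `cost` stands for the SAD of the S-block at a given position; boundary handling of the search area is omitted for brevity.

```python
def ots_search(cost, start):
    """One at a Time Search: starting from the central S-block position
    `start` = (y, x), compare its cost (e.g. SAD) against the four
    neighboring positions; whenever a neighbor is cheaper, make it the
    new center, and stop when the center is a local minimum.
    Returns (position, cost)."""
    center = start
    best = cost(center)
    while True:
        y, x = center
        neighbors = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
        candidate = min(neighbors, key=cost)
        if cost(candidate) < best:
            center, best = candidate, cost(candidate)
        else:
            return center, best

# On a convex cost surface whose minimum lies at (5, 7), the search
# walks there one block at a time:
print(ots_search(lambda p: abs(p[0] - 5) + abs(p[1] - 7), (0, 0)))
# ((5, 7), 0)
```

Note that the walk only ever moves toward a cheaper neighbor, which is precisely why it can miss an isolated minimum on non-convex cost surfaces.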
OTS-related techniques are described in detail in, for example, IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. COM-33, NO. 8, AUGUST 1985, pp. 888–896.
However, the aforementioned OTS technique only performs relative comparisons between the SAD of the central S-block and the SADs of the neighboring S-blocks. Therefore, when the OTS technique is used alone, the correlated block may not be properly obtained, depending on the content of the pictures. In other words, since searches proceed only in the direction of neighboring blocks having stronger correlations, it may not be possible to accurately detect a motion vector for an isolated S-block (i.e., an S-block having a strong correlation but surrounded by S-blocks of weaker correlation) or for pictures containing complicated image patterns.
Accordingly, there has been proposed a method which realizes an enlarged search area without necessitating larger-scale hardware: in the enlarged portion of the search area, pixels are searched coarsely (i.e., the number of pixels searched is reduced, this being generally referred to as “decimation”), and only the pixels which remain after decimation are subjected to matching calculation, so that the amount of computation per block remains unchanged.
However, according to this method, high-frequency components must be removed before the pixels are decimated and searched, and a problem of image quality deterioration may arise with images which are rich in high-frequency components.
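One way to picture such a decimated search area: every position in the original (inner) area is evaluated, while only every other position is kept in the enlarged outer portion. The pattern below is a hypothetical Python illustration of this idea, not the specific decimation scheme of any cited method.

```python
def search_positions(inner_radius, outer_radius):
    """Candidate (dy, dx) offsets for a search area enlarged from
    `inner_radius` to `outer_radius`: the inner area is searched
    densely, while the enlarged portion is searched only at even
    offsets (decimation), limiting the growth in computation."""
    positions = []
    for dy in range(-outer_radius, outer_radius + 1):
        for dx in range(-outer_radius, outer_radius + 1):
            inner = abs(dy) <= inner_radius and abs(dx) <= inner_radius
            if inner or (dy % 2 == 0 and dx % 2 == 0):
                positions.append((dy, dx))
    return positions

# Enlarging a radius-1 area (9 dense positions) to radius 2 adds
# only 8 decimated positions instead of 16:
print(len(search_positions(1, 2)))  # 17
```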
There is another motion vector detection method called the “two-step search method”. The two-step search method first performs a motion estimation (primary search) with a resolution on the order of an integer number of pixels (hereinafter referred to as a “full-pixel resolution”), and then performs a further motion estimation (secondary search) with a resolution on the order of 0.5 pixels (hereinafter referred to as a “half-pixel resolution”), directed to the vicinity of the pixels which have been detected by the primary search (i.e., pixels having a relatively high correlation). Pixels to be searched (hereinafter referred to as “searched pixels”) according to the two-step search method will be described with reference to FIGS. 23A to 23C. In FIGS. 23A to 23C as well, each circle represents the representative pixel position of a block.
FIG. 23A illustrates an exemplary case where the searched pixels (represented as hatched circles) in a primary search are all pixels. In this case, the searched pixels in a secondary search are a total of 9 pixels: a pixel which has been detected through the primary search (represented as a solid black circle) as well as the surrounding 8 sub-pixels (represented as “X”) with a half-pixel resolution. FIG. 23B illustrates an exemplary case where the searched pixels in a primary search are every other pixel. In this case, the searched pixels in a secondary search are a total of 15 pixels: a pixel which has been detected through the primary search as well as the surrounding 14 sub-pixels with a half-pixel resolution (including pixels which have not been searched in the primary search; represented as blank circles). FIG. 23C illustrates an exemplary case where the searched pixels in a primary search are every fourth pixel. In this case, the searched pixels in a secondary search are a total of 27 pixels: a pixel which has been detected through the primary search as well as the surrounding 26 sub-pixels with a half-pixel resolution (including pixels which have not been searched in the primary search; represented as blank circles).
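The searched-pixel counts given for FIGS. 23A to 23C (9, 15 and 27) are consistent with a secondary search that covers ±0.5 pixel vertically and ± half the primary decimation step horizontally, at half-pixel resolution. The sketch below encodes that reading; it is an interpretation of the figures, not a formula stated in the specification.

```python
def secondary_searched_pixels(step):
    """Half-pel positions evaluated in the secondary search when the
    primary search was decimated horizontally by `step` full pixels:
    a 3-row band (+-0.5 pixel vertically) whose width spans the gap
    left by the primary decimation (+-step/2 pixel horizontally)."""
    horizontal = 2 * step + 1  # half-pel positions across +-step/2 pixel
    vertical = 3               # half-pel positions across +-0.5 pixel
    return horizontal * vertical

# Reproduces the counts described for FIGS. 23A-23C:
print([secondary_searched_pixels(s) for s in (1, 2, 4)])  # [9, 15, 27]
```

This makes the trade-off explicit: the secondary-search count grows linearly with the primary decimation step, which is the circuit-scale problem noted for the two-step search method.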
The two-step search method, which can decimate pixels in the primary search, is very effective for downscaling the primary search circuitry. However, according to the aforementioned two-step search method, the pixels which are not targeted in the primary search are targeted in the secondary search, as described above. As a result, as shown in FIG. 23C, if the decimation is made coarser in order to reduce the number of searched pixels in the primary search, the number of searched pixels in the secondary search increases, thus requiring upscaling of the secondary search circuitry.
The secondary search in the two-step search method does not need to be performed with a half-pixel resolution. For example, the secondary search may be performed with a resolution on the order of 0.25 pixels (hereinafter referred to as a “quarter-pixel resolution”) to attain a higher resolution. Furthermore, it would be possible to perform a secondary search with a half-pixel resolution and then perform a tertiary search with a quarter-pixel resolution (thus realizing a three-step search method).