The present application relates to an image processing apparatus and an image processing method for detecting a motion vector between two different screens or screen images. In the present specification, the term “screen” or “screen image” signifies an image formed from image data for one frame or one field and displayed as an image on a display apparatus.
A block matching method of determining a motion vector between two screen images from the image information itself is a technique having a long history. The block matching method has been developed principally in connection with pan-tilt detection or image pickup object tracking of a television camera, dynamic picture coding of the MPEG (Moving Picture Experts Group) method and so forth. Since the 1990s, the block matching method has been directed to various applications such as sensorless camera shake correction and noise reduction upon low illuminance image pickup through superposition of images.
Block matching is a method of calculating a motion vector between two images, namely a reference screen image which is a noticed screen image and a basic screen image (hereinafter referred to as target screen image) based on which a motion in the reference screen image is to be determined. The motion vector is calculated by calculating a correlation between the reference screen image and the target screen image in regard to blocks of a rectangular region of a predetermined size. The reference screen image and the target screen image may have such a temporal relationship that the target screen image temporally precedes the reference screen image as in the case of, for example, motion detection by the MPEG method, or another temporal relationship that the reference screen image temporally precedes the target screen image as in the case of, for example, noise reduction by superposition of image frames hereinafter described.
It is to be noted that, while, in the present specification, the screen or screen image signifies an image formed from image data for one frame or one field as described hereinabove, the following description is given assuming that a screen image is formed from one frame and is hereinafter referred to as frame for the convenience of description. Accordingly, a reference screen image is hereinafter referred to as reference frame, and a target screen image is hereinafter referred to as target frame.
FIGS. 51A to 56 illustrate an outline of related-art block matching. In the block matching method described here, as shown in FIG. 51A, for example, a target frame or basic frame 100 is divided into a plurality of rectangular regions called blocks having a predetermined size including a plurality of pixels in a horizontal direction and a plurality of lines in a vertical direction. Each of the plural blocks 102 in a target frame is hereinafter referred to as target block.
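The division of a target frame into target blocks can be illustrated with a minimal sketch. The function name `split_into_blocks`, its parameters, and the choice to ignore any partial blocks at the right and bottom edges are assumptions made for illustration only and are not taken from the description.

```python
def split_into_blocks(frame_width, frame_height, block_width, block_height):
    """Return the top-left (x, y) corner of every whole block in a frame.

    Assumption: blocks do not overlap, and partial blocks that would
    extend past the right or bottom edge of the frame are ignored.
    """
    corners = []
    for y in range(0, frame_height - block_height + 1, block_height):
        for x in range(0, frame_width - block_width + 1, block_width):
            corners.append((x, y))
    return corners
```

For example, a frame of 8 pixels by 4 lines divided into blocks of 4 pixels by 2 lines yields four target blocks.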
In the block matching, a block having a high correlation with a target block 102 is searched for from within the reference frame 101. A block 103 (refer to FIG. 51B) detected as a block which has the highest correlation in the reference frame 101 as a result of the search is called motion compensation block. Further, a positional displacement amount between the target block 102 and the motion compensation block 103 is called motion vector (refer to reference numeral 104 in FIG. 51B).
The motion vector 104 corresponding to the positional displacement between the target block 102 and the motion compensation block 103 including both of a positional displacement amount and a positional displacement direction corresponds to positional displacement between the position (for example, the position of the center) of a projection image block 109 of each target block 102 of the target frame 100 and the position (for example, the position of the center) of the motion compensation block 103 where the projection image block 109 of the target block 102 is supposed to be at a position same as the position of the target block 102 in the reference frame 101. Further, the positional displacement in this instance has a positional displacement amount and a directional component of the displacement.
An outline of a block matching process is described. Referring to FIG. 52, as indicated by a broken line, a projection image block 109 of a target block is supposed to be at a position in the reference frame 101 same as the position of a target block 102 of the target frame 100, and the coordinates of the center of the projection image block 109 of the target block are defined as the origin 105 for motion detection. Then, assuming that the motion vector 104 exists within a range including the origin 105 for motion detection, a predetermined range centered at the origin 105 is set as a predetermined search range 106 (refer to an alternate long and short dash line in FIG. 52).
Then, a block (reference block) 108 of a size same as that of the target block 102 is set on the reference screen image. Then, the position of the reference block 108 is moved in a unit of one pixel distance or of a distance of a plurality of pixels, for example, in a horizontal direction and a vertical direction in the search range 106. Accordingly, in the search range 106, a plurality of reference blocks 108 are set.
Here, to move the reference block 108 in the search range 106 signifies to move the center position of the reference block 108 in the search range 106 because, in the present example, the origin 105 is the center position of the target block. Therefore, some of the pixels which compose the reference block 108 sometimes protrude from the search range 106.
Then, for each of the reference blocks 108 set in the search range 106, a vector 107 (refer to FIG. 52) hereinafter referred to as reference vector is set which represents a positional displacement amount and a positional displacement direction between the reference block 108 and the target block 102. Then, the correlation between the image contents of the reference block 108 at the position indicated by each reference vector 107 and the image contents of the target block 102 is evaluated.
Referring to FIG. 53, where the positional displacement amount of the reference block 108 in the horizontal direction or X direction is represented by Vx and the positional displacement amount of the reference block 108 in the vertical direction or Y direction is represented by Vy, the reference vector 107 can be represented as vector (Vx, Vy). When the positional coordinates such as, for example, the center position coordinates of the reference block 108 and the positional coordinates such as, for example, the center position coordinates of the target block 102 are same as each other, the reference vector 107 is represented as vector (0, 0).
For example, if the reference block 108 is at a position displaced by a one-pixel distance in the X direction from the position of the target block 102 as seen in FIG. 53, then the reference vector 107 is vector (1, 0). Meanwhile, if the reference block 108 is at a position displaced by a three-pixel distance in the X direction and by a two-pixel distance in the Y direction from the position of the target block 102 as seen in FIG. 54, then the reference vector 107 is vector (3, 2).
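The set of reference vectors produced by moving the reference block through the search range can be sketched as follows. The function name `reference_vectors` and the parameters `range_x`, `range_y` and `step` are hypothetical; the sketch assumes a search range that is symmetric about the origin for motion detection and a movement unit of `step` pixels.

```python
def reference_vectors(range_x, range_y, step=1):
    """List every reference vector (Vx, Vy) inside a symmetric search range.

    Assumptions: |Vx| <= range_x, |Vy| <= range_y, and the reference
    block moves in units of `step` pixels starting from vector (0, 0).
    """
    steps_x = range_x // step
    steps_y = range_y // step
    return [(kx * step, ky * step)
            for ky in range(-steps_y, steps_y + 1)
            for kx in range(-steps_x, steps_x + 1)]
```

With `range_x = 3` and `range_y = 2`, the list contains 35 reference vectors, among them vector (0, 0), vector (1, 0) and vector (3, 2) used in the examples above.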
In particular, where the positions of the target block 102 and the reference block 108 are defined as the center positions of the blocks as in the example of FIG. 54, the reference vector 107 signifies positional displacement between the center position of the reference block 108 and the center position of the target block 102, that is, a vector having a positional displacement amount and the direction of the displacement.
While the reference block 108 moves in the search range 106, the center position of the reference block 108 moves in the search range 106. As described hereinabove, since the reference block 108 is composed of a plurality of pixels in the horizontal direction and the vertical direction, the maximum range of the movement of the reference block 108 which is an object of a block matching process with the target block 102 is a matching processing range 110 which is greater than the search range 106 as seen in FIG. 54.
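The relationship between the search range and the matching processing range can be expressed as simple arithmetic. The following sketch assumes that the reference vector components satisfy |Vx| <= `range_x` and |Vy| <= `range_y`; the function and parameter names are hypothetical.

```python
def matching_range_size(block_w, block_h, range_x, range_y):
    """Return (width, height) in pixels of the matching processing range.

    Assumption: the reference block of block_w x block_h pixels is
    displaced by at most range_x pixels to the left or right and by at
    most range_y pixels up or down, so the pixels it can cover extend
    beyond the block by that amount on each side.
    """
    return (block_w + 2 * range_x, block_h + 2 * range_y)
```

For a block of 16 by 16 pixels and displacements of up to 8 pixels in each direction, the pixels that take part in matching span 32 by 32 pixels.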
Then, the position of the reference block 108 detected as a block having the maximum correlation with the image contents of the target block 102 is determined as the position after the movement of the target block 102 of the target frame 100 on the reference frame 101, and the detected reference block is determined as the motion compensation block 103 described hereinabove. Then, the positional displacement amount between the detected position of the motion compensation block 103 and the position of the target block 102 is detected as the motion vector 104 as an amount including a directional component (refer to FIG. 51B).
Here, the correlation value representative of the magnitude of the correlation between the target block 102 and the reference block 108 which moves in the search range 106 is calculated basically using corresponding pixel values of the target block 102 and the reference block 108. For the calculation, various methods have been proposed, and one of such methods uses the root-mean-square.
As a correlation value which is popularly used in calculation of a motion vector, for example, the sum total of absolute values of differences between luminance values of pixels in the target block 102 and luminance values of corresponding pixels in the reference block 108 in regard to all pixels in the blocks is used (refer to FIG. 55). The sum total of absolute values of differences is called difference absolute value sum and hereinafter referred to as SAD (Sum of Absolute Difference).
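The SAD calculation can be sketched as follows. The function name `sad` and the representation of a block as a list of rows of luminance values are assumptions for illustration.

```python
def sad(target_block, reference_block):
    """Sum of absolute differences between two blocks of equal size.

    Each block is given as a list of rows, and each row as a list of
    luminance values (an assumed representation).
    """
    return sum(abs(t - r)
               for t_row, r_row in zip(target_block, reference_block)
               for t, r in zip(t_row, r_row))
```

Identical blocks give an SAD value of 0, and the value grows as the corresponding luminance values diverge.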
Where an SAD value is used as a correlation value, the lower the SAD value, the higher the correlation. Accordingly, the reference block 108 which moves in the search range 106 becomes a highest correlation reference block when it is positioned at a position at which the SAD value exhibits the lowest value. Thus, the highest correlation reference block is detected as the motion compensation block 103, and the positional displacement amount of the detected motion compensation block 103 with respect to the position of the target block 102 is detected as a motion vector.
As described hereinabove, in the block matching, the positional displacement amount of each of the plural reference blocks 108 set in the search range 106 with respect to the position of the target block 102 is represented as a reference vector 107 as an amount including a directional component. The reference vector 107 of each reference block 108 has a value determined by the position of the reference block 108 relative to the position of the target block 102. As described hereinabove, in the block matching, the reference vector of the reference block 108 which exhibits the minimum value of the SAD value as a correlation value is detected as the motion vector 104.
Therefore, in the block matching, as shown in FIG. 56, generally such a detection method as described in the following is adopted. In particular, SAD values between a plurality of reference blocks 108 set in the search range 106 and a target block 102 (such SAD values are hereinafter referred to as SAD values of the reference blocks 108 for simplified description) are stored in a corresponding relationship to reference vectors 107 corresponding to the positions of the reference blocks 108 (such reference vectors 107 corresponding to the positions of the reference blocks 108 are hereinafter referred to as reference vectors 107 of the reference blocks 108 for simplified description) into a memory. Then, that one of the reference blocks 108 which exhibits a minimum SAD value from among the SAD values of all of the reference blocks 108 stored in the memory is detected to detect the motion vector 104.
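The detection method described above, in which SAD values are stored in correspondence with reference vectors and the minimum is then searched for, can be sketched as a full search. All names here (`block_sad`, `full_search`, `bx`, `by`, `search`) are hypothetical. The sketch assumes frames given as lists of rows of luminance values, block positions given by the top-left corner rather than the center, a square search range, and reference blocks that would fall outside the frame being skipped.

```python
def block_sad(target, reference, bx, by, bw, bh, vx, vy):
    """SAD between the target block at (bx, by) and the reference block
    displaced from that position by the reference vector (vx, vy)."""
    total = 0
    for y in range(bh):
        for x in range(bw):
            total += abs(target[by + y][bx + x]
                         - reference[by + vy + y][bx + vx + x])
    return total


def full_search(target, reference, bx, by, bw, bh, search):
    """Build an SAD table over the search range and return
    (motion_vector, sad_table).

    sad_table maps each reference vector (vx, vy) to its SAD value.
    Assumption: reference blocks not fully inside the frame are skipped.
    """
    height, width = len(reference), len(reference[0])
    sad_table = {}
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            x0, y0 = bx + vx, by + vy
            if x0 < 0 or y0 < 0 or x0 + bw > width or y0 + bh > height:
                continue
            sad_table[(vx, vy)] = block_sad(target, reference,
                                            bx, by, bw, bh, vx, vy)
    # The reference vector with the minimum SAD value is the motion vector.
    # On a tie, the first vector encountered in scan order is returned.
    motion_vector = min(sad_table, key=sad_table.get)
    return motion_vector, sad_table
```

A dictionary keyed by reference vector is used here only for readability; it plays the role of the memory in which the SAD values are held in correspondence with the reference vectors.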
A memory or memory area in which correlation values such as SAD values of the reference blocks 108 are stored in an individually corresponding relationship to the reference vectors 107 corresponding to the positions of the reference blocks 108 set in the search range 106 is called correlation value table. In the present example, since the SAD value which is a difference absolute value sum is used as the correlation value, the correlation value table is formed as a difference absolute value sum table which is hereinafter referred to as SAD table.
The SAD table is represented as an SAD table TBL in FIG. 56. Referring to FIG. 56, in the SAD table TBL shown, the correlation value, in the example shown, the SAD value, of each reference block 108 is called correlation value table element. In the example of FIG. 56, the SAD value denoted by reference numeral 111 is an SAD value where the reference vector is vector (0, 0). Further, in the example of FIG. 56, since the minimum value of the SAD value is “7” where the reference vector is vector (3, 2), the motion vector 104 to be determined naturally is vector (3, 2).
It is to be noted that the position of any of the target block 102 and the reference blocks 108 signifies an arbitrary particular position in the block, for example, the position of the center of the block, and the reference vector 107 indicates a displacement amount including a direction between the position of the projection image block 109 of the target block 102 and the position of the reference block 108 in the reference frame 101.
Then, the reference vector 107 corresponding to each reference block 108 is a positional displacement of the reference block 108 from the position of the projection image block 109 corresponding to the target block 102 on the reference frame 101, and therefore, if the position of the reference block 108 is specified, then also the value of the reference vector is specified in accordance with the position of the reference block 108. Accordingly, if the address of a correlation value table element of a reference block in the memory of the SAD table TBL is specified, then the corresponding reference vector is specified.
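The correspondence between the address of a correlation value table element and its reference vector can be sketched as follows. The row-major memory layout, the symmetric search range, and the names `vector_to_address` and `address_to_vector` are assumptions for illustration; the description does not specify a particular layout.

```python
def vector_to_address(vx, vy, range_x, range_y):
    """Address of the table element for reference vector (vx, vy).

    Assumed layout: row-major, one row per Vy value from -range_y to
    +range_y, one column per Vx value from -range_x to +range_x.
    """
    row_length = 2 * range_x + 1
    return (vy + range_y) * row_length + (vx + range_x)


def address_to_vector(address, range_x, range_y):
    """Recover the reference vector (vx, vy) from a table address."""
    row_length = 2 * range_x + 1
    row, col = divmod(address, row_length)
    return (col - range_x, row - range_y)
```

Under this assumed layout, with `range_x = 3` and `range_y = 2`, vector (0, 0) is stored at address 17 and vector (3, 2) at address 34, the last element of the 35-element table.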
It is to be noted that the SAD value may be calculated simultaneously for two or more target blocks. If the number of target blocks to be processed simultaneously increases, then the speed of processing increases. However, since the scale of hardware for calculating the SAD value increases, the increase of the speed of processing and the increase of the circuit scale have a trade-off relationship to each other.
Incidentally, in the block matching method described above, as the resolution of an image of an object of processing increases, the number of pixels with regard to which a motion is detected between two images or two screen images increases. Therefore, in order to follow up the motion, it is necessary to use a wider search range for a motion vector, that is, to increase the number of pixels to be included in a search range.
However, where a wider search range is used in this manner, the number of times by which pixel information is to be read in from a frame memory per one block of an object of processing increases, resulting in a problem that increased processing time is required.
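The growth in read count can be estimated with a rough sketch. It assumes a naive full search in which the reference block is moved in one-pixel steps and every pixel of every reference block is read from the frame memory again, with no caching or reuse; the function and parameter names are hypothetical.

```python
def full_search_pixel_reads(block_w, block_h, range_x, range_y):
    """Reference-pixel reads per target block for a naive full search.

    Assumptions: one-pixel steps, |Vx| <= range_x, |Vy| <= range_y, and
    no reuse of pixels between neighboring reference blocks.
    """
    positions = (2 * range_x + 1) * (2 * range_y + 1)
    return positions * block_w * block_h
```

Under these assumptions, a 16 by 16 block needs 73,984 reads with displacements of up to 8 pixels and 278,784 reads with displacements of up to 16 pixels, so doubling the range in each direction raises the count almost fourfold.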
On the other hand, where the resolution of an image of an object of processing is low, or where the frame rate is high, since the motion between pixels is small, it is necessary to detect a small motion of a sub pixel accuracy smaller than one pixel. Therefore, it is necessary to use an oversampled image to detect a motion vector thereby to detect a motion vector smaller than one pixel. However, employment of this method gives rise to a problem that the circuit scale is increased and also the processing time is increased by the oversampling.
As described above, the block matching method indicates a tendency that the processing time and the circuit scale increase in response to the requirement for a wide search range and a very small motion less than one pixel. Thus, it is demanded to eliminate such increase of the processing time and the circuit scale.
Meanwhile, in recent years, development of images of high definition dynamic pictures has progressed, and a demand for a higher resolution and higher picture quality of images is increasing. Together with this, a demand for a block matching method which implements both of a wider search range and detection of a very small motion of a sub pixel accuracy less than one pixel is increasing.
In order to solve such problems as described above, various methods have been proposed in the past. For example, a method for achieving efficient processing in detection of a very small motion less than one pixel has been proposed and is disclosed in Japanese Patent Laid-Open No. Hei 7-95585. Another method for reducing a frame memory and the calculation amount by sampling out reference images is disclosed in Japanese Patent Laid-Open No. 2006-160829.
A further method which is considered most practical is disclosed in Japanese Patent Laid-Open No. Hei 5-91492. According to the method, a minimum SAD value in an SAD table and a plurality of SAD values at positions neighboring the position of the minimum SAD value in the SAD table are used to carry out an interpolation process so that a minimum value of the SAD value is calculated with an accuracy finer than the accuracy of the SAD table, that is, finer than the pixel pitch accuracy of a target frame and a reference frame. With the method described, the reference image need not be pre-processed and also the impact on the circuit scale is small.
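One possible form of such an interpolation process is sketched below. The cited publication's actual interpolation is not reproduced here; the sketch assumes a quadratic (parabola) fit through the minimum SAD value and its two neighbors, applied separately in the X direction and the Y direction. The names `parabola_offset` and `subpixel_motion_vector`, and the SAD table represented as a dictionary keyed by reference vector, are hypothetical.

```python
def parabola_offset(s_minus, s_min, s_plus):
    """Offset of the vertex of the parabola through the points
    (-1, s_minus), (0, s_min) and (+1, s_plus).

    Returns 0.0 when the three values do not form a curve with a
    minimum (for example, when they are all equal).
    """
    denominator = 2 * (s_minus - 2 * s_min + s_plus)
    if denominator <= 0:
        return 0.0
    return (s_minus - s_plus) / denominator


def subpixel_motion_vector(sad_table, motion_vector):
    """Refine an integer motion vector to sub-pixel accuracy.

    Assumption: sad_table maps (vx, vy) to an SAD value. If a neighbor
    is missing from the table, no refinement is done in that direction.
    """
    vx, vy = motion_vector
    s_min = sad_table[(vx, vy)]
    dx = dy = 0.0
    if (vx - 1, vy) in sad_table and (vx + 1, vy) in sad_table:
        dx = parabola_offset(sad_table[(vx - 1, vy)], s_min,
                             sad_table[(vx + 1, vy)])
    if (vx, vy - 1) in sad_table and (vx, vy + 1) in sad_table:
        dy = parabola_offset(sad_table[(vx, vy - 1)], s_min,
                             sad_table[(vx, vy + 1)])
    return (vx + dx, vy + dy)
```

Because only values already present in the SAD table are used, a refinement of this kind needs no oversampled reference image, which is consistent with the small circuit impact noted above.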