The present invention relates to image processing apparatuses, image processing methods, recording media, and programs, and more particularly, to an image processing apparatus, an image processing method, a recording medium, and a program which are suitable for a case in which a motion vector is detected among a number of images consecutively captured.
When a video camera is held by a hand or hands and is used for image capturing without being secured to a tripod or the like, the captured motion images may vibrate horizontally and vertically due to shaking of the hand(s). Thus, it is difficult for the viewer to see the images distinctly when they are reproduced. As a countermeasure against this problem, so-called stabilizer processing, which converts vibrating motion images to non-vibrating motion images, is known.
The stabilizer processing will be described below by referring to FIG. 1 through FIG. 6. FIG. 1 to FIG. 3 show images P1 to P3 captured at timing t1 to t3 among motion images vibrating due to shaking of a hand or hands. The motion images show a case in which a person moves from the left-hand side to the right-hand side in a room provided with a desk and a door, both of which are stationary.
In the stabilizer processing, an area (for example, an area which includes an edge of a still object, such as the desk) having a predetermined size and a feature is first specified in the image P1 (hereinafter, called a reference image P1) captured at timing t1, shown in FIG. 1. For example, an area R1 enclosed by a dotted line and located at a position (X1, Y1) measured from the origin (0, 0) of the reference image P1 is specified. The specified area R1 is hereinafter called a reference area R1.
Next, a matching area (area M1 in FIG. 2) corresponding to the reference area R1 in the reference image P1 is detected by a block matching method in the image P2 (hereinafter, called a comparison image P2) captured at timing t2, shown in FIG. 2.
In the block matching method, the entire comparison image P2 is searched for a candidate area of the same size as the reference area R1 that best matches the reference area R1. The best match is the candidate area having the minimum value of the sum of square errors or the sum of absolute errors between the pixels in the reference area R1 and the corresponding pixels in the candidate area, or having the maximum value of the normalized cross correlation between those pixels. The area found in this way is detected as the matching area M1 corresponding to the reference area R1.
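The full search described above can be illustrated by the following minimal sketch, which assumes grayscale images held as lists of rows of integers and uses the sum of absolute errors as the matching measure. The function names sad and full_search are hypothetical and are introduced here only for illustration.

```python
def sad(reference, comparison, top, left):
    """Sum of absolute errors between the reference block and the
    same-sized block of `comparison` whose top-left corner is (left, top)."""
    return sum(
        abs(reference[r][c] - comparison[top + r][left + c])
        for r in range(len(reference))
        for c in range(len(reference[0]))
    )


def full_search(reference, comparison):
    """Try every candidate position in the comparison image and return
    ((x, y), cost) for the position with the smallest sum of absolute errors."""
    block_h, block_w = len(reference), len(reference[0])
    image_h, image_w = len(comparison), len(comparison[0])
    best_cost, best_pos = None, None
    for top in range(image_h - block_h + 1):
        for left in range(image_w - block_w + 1):
            cost = sad(reference, comparison, top, left)
            # Keep the first position that achieves the lowest cost so far.
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (left, top)
    return best_pos, best_cost
```

Replacing the absolute difference with a squared difference gives the sum of square errors. A normalized cross correlation measure would instead be maximized rather than minimized.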
To reduce the amount of calculation in the block matching method, an area smaller than the entire comparison image P2 may be searched. When it is known that the motion between images is about 10 pixels, for example, only an area larger than the reference area R1 by about 10 pixels horizontally and vertically needs to be searched.
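A restricted search of this kind can be sketched as follows, again assuming grayscale images held as lists of rows of integers and the sum of absolute errors as the matching measure. The function name windowed_search and the parameter radius (the expected maximum motion in pixels, for example 10) are hypothetical, and clamping the window to the image boundary is an assumption of this sketch.

```python
def windowed_search(reference, comparison, x1, y1, radius):
    """Search only positions within `radius` pixels of (x1, y1), the position
    of the reference area, and return ((x, y), cost) of the best match."""
    block_h, block_w = len(reference), len(reference[0])
    image_h, image_w = len(comparison), len(comparison[0])

    # Clamp the search window so every candidate block lies inside the image.
    top_min = max(0, y1 - radius)
    top_max = min(image_h - block_h, y1 + radius)
    left_min = max(0, x1 - radius)
    left_max = min(image_w - block_w, x1 + radius)

    best_cost, best_pos = None, None
    for top in range(top_min, top_max + 1):
        for left in range(left_min, left_max + 1):
            # Sum of absolute errors for this candidate position.
            cost = sum(
                abs(reference[r][c] - comparison[top + r][left + c])
                for r in range(block_h)
                for c in range(block_w)
            )
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (left, top)
    return best_pos, best_cost
```

With a radius of 10, at most 21 x 21 = 441 candidate positions are evaluated regardless of the image size. If the actual motion exceeds the radius, the true matching area lies outside the window and cannot be found.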
Details of the block matching method are described, for example, in “Matching,” Chapter 8.3 of “Digital Image Processing” supervised by Makoto Nagao and published by Kindai Kagaku Sha Co., Ltd.
The detected matching area M1 is located at (X2, Y2) measured from the origin (0, 0) of the comparison image P2. In the same way, a matching area (matching area M2 shown in FIG. 3) corresponding to the reference area R1 is detected by the block matching method in the image P3 (hereinafter, called a comparison image P3) captured at timing t3, shown in FIG. 3. The matching area M2 is located at (X3, Y3) measured from the origin (0, 0) of the comparison image P3.
Then, a motion vector V12 (X1-X2, Y1-Y2) between the reference image P1 and the comparison image P2, and a motion vector V13 (X1-X3, Y1-Y3) between the reference image P1 and the comparison image P3 are calculated. The reference image P1 is used, as is, as a compensated image P1′, the comparison image P2 is shifted by the motion vector V12 (X1-X2, Y1-Y2) to form a compensated image P2′, and the comparison image P3 is shifted by the motion vector V13 (X1-X3, Y1-Y3) to form a compensated image P3′.
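The calculation of the motion vector and the shifting of a comparison image can be sketched as follows, assuming images held as lists of rows of integers. The function names motion_vector and shift_image are hypothetical, and filling the pixels that have no source pixel after the shift with a fixed value is an assumption of this sketch, not something specified above.

```python
def motion_vector(ref_pos, match_pos):
    """Return (X1 - X2, Y1 - Y2) for a reference area at ref_pos = (X1, Y1)
    and a matching area at match_pos = (X2, Y2)."""
    (x1, y1), (x2, y2) = ref_pos, match_pos
    return (x1 - x2, y1 - y2)


def shift_image(image, vector, fill=0):
    """Shift `image` by `vector` = (dx, dy): the pixel at (x, y) moves to
    (x + dx, y + dy). Pixels with no source pixel are set to `fill`."""
    dx, dy = vector
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_x, src_y = x - dx, y - dy
            if 0 <= src_x < width and 0 <= src_y < height:
                shifted[y][x] = image[src_y][src_x]
    return shifted
```

After the shift, the content of the matching area at (X2, Y2) in the comparison image lies at (X1, Y1), the position of the reference area in the reference image.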
As described above, by the stabilizer processing, the reference image P1 and the comparison images P2 and P3 are converted to the compensated images P1′ to P3′ shown in FIG. 4 to FIG. 6. As shown in FIG. 4 to FIG. 6, in the compensated images P1′ to P3′, still objects, such as the desk, a chair, and the door, are located at identical locations. In practice, the images of the original motion images other than the images P1 to P3 are also converted in the same way. Therefore, when the converted consecutive images are reproduced, the motion images do not vibrate.
When the area R1 is set as the reference area in the reference image P1, matching areas corresponding to the reference area are detected in the comparison images by the block matching method, and appropriate motion vectors are obtained. When an inappropriate area is set as the reference area in the reference image P1, however, a matching area corresponding to the reference area may not be detected in a comparison image. In such a case, a motion vector between the reference image P1 and the comparison image cannot be obtained, and the vibration of the motion images cannot be compensated for.
It is assumed, for example, that an area R2 is set as the reference area in the reference image P1. The reference area R2 includes a feature portion (also called a foreground), such as an edge of the still desk, and the remaining portion (also called a background). In the comparison image P2, the area which should be detected as the area corresponding to the reference area R2 includes the moving person. Since the pixels of that area therefore differ from those of the reference area R2, that area cannot be detected as a matching area corresponding to the reference area R2.