Conventionally, searching for corresponding points among a plurality of images of an object viewed from different viewpoints has been regarded as an important technique in various fields such as image sensing, image/video signal processing, and computer vision. These fields have generally relied on pixel-accuracy matching. Recently, however, there has been increasing demand for subpixel-accuracy matching.
For example, a subpixel-accuracy matching algorithm is indispensable for achieving sufficient three-dimensional measurement accuracy in a stereo vision system with a short baseline length. A subpixel-accuracy matching algorithm is also important for video resolution enhancement based on super-resolution techniques. To meet such requirements, the stereoscopic image measuring apparatus disclosed in reference 1 (Japanese Patent Laid-Open No. 10-132534), for example, addresses the need for subpixel-accuracy matching by searching for corresponding points in a plurality of images of an object viewed from different viewpoints using a two-dimensional phase-only correlation method.
FIG. 20 is a schematic view of an image input unit in the stereoscopic image measuring apparatus disclosed in reference 1. Referring to FIG. 20, reference numeral 1 denotes a first camera; and 2, a second camera. Reference symbol M denotes an object (a human face). The cameras 1 and 2 are arranged side by side in the horizontal direction, with L denoting the distance between their lenses LN1 and LN2. For ease of understanding, FIG. 20 shows the cameras 1 and 2 viewed from above and the object M viewed from the side.
This stereoscopic image measuring apparatus captures an image of the object M with the camera 1 as an input image I and divides the image data of the input image I into m×n local regions I(i, j). The apparatus then cuts out each local region I(i, j) from the image data of the input image I and obtains Fourier image data (input Fourier image data) by performing a two-dimensional discrete Fourier transform (DFT) on the image data of the cut-out local region I(i, j).
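The region-splitting and transform step can be sketched in Python with NumPy as follows. This is a minimal illustration, not the implementation of reference 1: the function name `local_region_spectra` is hypothetical, and the sketch assumes a grayscale image whose height and width are exact multiples of m and n.

```python
import numpy as np

def local_region_spectra(image: np.ndarray, m: int, n: int) -> np.ndarray:
    """Split `image` into m x n equal local regions I(i, j) and return the
    2-D DFT of each region (hypothetical helper; assumes exact divisibility)."""
    h, w = image.shape
    if h % m or w % n:
        raise ValueError("image size must be divisible by (m, n) in this sketch")
    bh, bw = h // m, w // n
    # Reshape to (m, bh, n, bw), then reorder to (m, n, bh, bw) so that
    # blocks[i, j] is the local region I(i, j).
    blocks = image.reshape(m, bh, n, bw).transpose(0, 2, 1, 3)
    # np.fft.fft2 transforms over the last two axes by default, giving the
    # input Fourier image data for every local region at once.
    return np.fft.fft2(blocks)
```

For example, `local_region_spectra(img, 4, 4)[1, 2]` would hold the Fourier data of the local region I(1, 2). The reference Fourier image data for the image J would be obtained the same way with `np.fft.fft2`.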
The apparatus also captures an image of the object M with the camera 2 as a reference image J and obtains Fourier image data (reference Fourier image data) by performing a two-dimensional discrete Fourier transform on the image data of the reference image J.
The apparatus then combines the input Fourier image data with the reference Fourier image data, normalizes the amplitude component of the resulting composite Fourier image data, and performs a two-dimensional discrete Fourier transform (or a two-dimensional inverse discrete Fourier transform) on it.
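The combine, normalize, and transform-back sequence is the core of phase-only correlation. The sketch below is one common formulation, assumed here rather than taken from reference 1: the two spectra are combined as F·conj(G), divided by their magnitude, and returned to the spatial domain with an inverse DFT. The function name `phase_only_correlation`, the `eps` guard, and the requirement that both inputs have the same size are assumptions of this sketch.

```python
import numpy as np

def phase_only_correlation(f: np.ndarray, g: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Return the phase-only correlation surface of two equal-size images.

    The result is fftshift-ed so that zero displacement sits at the array
    center, which plays the role of the center P0 of the correlation area.
    """
    F = np.fft.fft2(f)  # input Fourier image data
    G = np.fft.fft2(g)  # reference Fourier image data
    cross = F * np.conj(G)  # composite Fourier image data
    # Normalize the amplitude so only phase information remains; eps guards
    # against division by zero at frequencies with no energy.
    cross /= np.maximum(np.abs(cross), eps)
    # Inverse 2-D DFT back to the spatial domain; for real inputs the
    # imaginary part is only round-off and is discarded.
    return np.fft.fftshift(np.real(np.fft.ifft2(cross)))
```

With this sign convention, if `f` equals `g` circularly shifted by (dy, dx), the surface has a single sharp peak of height close to 1 at (center + dy, center + dx).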
From the transformed composite Fourier image data, the apparatus obtains the intensity (amplitude) of the correlation component of each pixel in a predetermined correlation component area and sets the position of the pixel with the highest intensity in that area as the position Pa1 of the correlation peak.
In this case, the distance A from the center P0 of the correlation component area to the position Pa1 of the correlation peak indicates the shift amount between the local region I(i, j) of the input image I and the corresponding region in the image data of the reference image J. Because of parallax, the image in the local region I(i, j) of the input image I is displaced from the image in the corresponding region of the reference image J, and this displacement appears as the shift amount A.
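Locating the peak Pa1 and measuring its distance from the center P0 can be sketched as follows, assuming a correlation surface that has already been centered (for example with `np.fft.fftshift`). The function name `correlation_peak_shift` is hypothetical. Like the procedure described above, it picks the single highest pixel, so it resolves the shift only to whole pixels; subpixel refinement is not shown.

```python
import numpy as np

def correlation_peak_shift(surface: np.ndarray) -> tuple[int, int, float]:
    """Locate the correlation peak Pa1 in a centered correlation surface and
    return (dy, dx, A): the offset from the center P0 and its length A."""
    h, w = surface.shape
    cy, cx = h // 2, w // 2  # center P0, matching np.fft.fftshift's convention
    py, px = np.unravel_index(np.argmax(surface), surface.shape)  # peak Pa1
    dy, dx = int(py) - cy, int(px) - cx
    # A is the distance from P0 to Pa1; with horizontally arranged cameras
    # the parallax shows up mainly in dx.
    return dy, dx, float(np.hypot(dy, dx))
```

Returning the signed components as well as A keeps the direction of the shift available, which a single distance value would discard.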
Based on this shift amount A, the apparatus matches the center point (reference point) of the local region I(i, j) in the input image I with the center point (corresponding point) of the corresponding region in the reference image J, and calculates the distance R from the cameras to the corresponding point (reference point) of the object M according to equation (1), which is based on the principle of triangulation:

$$R = \frac{f \cdot L}{A} \qquad (1)$$

where f is the distance from the center of a lens LN (LN1, LN2) to the image capturing position, and L is the inter-lens distance.
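Equation (1) can be evaluated directly once A is expressed in the same length unit as f and L. The correlation peak gives A in pixels, so the sketch below assumes a sensor pixel pitch to convert it; the function name `distance_from_shift`, the millimetre units, and the pixel-pitch parameter are illustrative assumptions, not part of reference 1.

```python
def distance_from_shift(f_mm: float, baseline_mm: float,
                        shift_px: float, pixel_pitch_mm: float) -> float:
    """Evaluate equation (1), R = f * L / A, with A converted from pixels.

    pixel_pitch_mm (the sensor pixel size) is an assumption of this sketch:
    the shift found by correlation is in pixels, while f and L are lengths.
    """
    if shift_px <= 0:
        raise ValueError("shift must be positive for a finite distance")
    shift_mm = shift_px * pixel_pitch_mm  # A expressed as a physical length
    return f_mm * baseline_mm / shift_mm
```

With illustrative values f = 8 mm, L = 60 mm, a 0.005 mm pixel pitch, and A = 12 pixels, R = 8000 mm; a one-pixel error in A moves R to about 7385 mm (A = 13) or 8727 mm (A = 11). This sensitivity is why subpixel-accuracy matching matters for short-baseline stereo.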