In a stereo camera or multi-camera having a plurality of image pickup units, pieces of image information about a subject in the respective captured images are compared with each other to measure three-dimensional information about the subject. For accurate three-dimensional measurement, it is necessary to remove the influence of the optical characteristics of a zoom lens or focus lens and to accurately grasp the focal distance, image center, and shearing of each camera, which are called camera parameters (hereinafter referred to as internal parameters), as well as geometrical information about the position, azimuth, and the like of each camera in space (hereinafter referred to as external parameters). Thus, a camera information storage unit stores lens distortion correction information or internal parameters that vary depending on the zoom magnification and the focus position of each of a first image pickup unit and a second image pickup unit, and therefore stores a plurality of pieces of lens distortion correction information or internal parameters corresponding to the combinations of zoom magnification and focus position. The camera information can be obtained by performing a camera calibration process in advance. For example, a measurement is performed using a calibration pattern with an existing camera calibration methodology, such as the methodology of Zhang, at the time of shipping from a factory or at the time of initial adjustment after installation, and the resulting camera information is stored in the camera information storage unit. A stereo image processing unit obtains zoom/focus information on the first image pickup unit and the second image pickup unit from a zoom/focus control unit, and obtains the camera information corresponding to that zoom/focus information from the camera information storage unit.
By using this camera information, the stereo image processing unit performs stereo image processing on a stereo image to perform a three-dimensional measurement on the subject, and outputs, as three-dimensional information, parallax information, distance information, three-dimensional position information, an evaluation value indicating reliability of the three-dimensional measurement, and others (Patent Literature 1).
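The storage and retrieval of camera information per combination of zoom magnification and focus position can be illustrated with the following minimal sketch. The names CameraInfo and CameraInfoStore, the integer zoom and focus steps, and the rule of falling back to the nearest stored combination are assumptions made for illustration only; they are not taken from Patent Literature 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraInfo:
    """Hypothetical record of calibration results for one zoom/focus combination."""
    focal_length_px: float
    image_center: tuple      # (cx, cy) in pixels
    distortion: tuple        # lens distortion correction coefficients (illustrative)


class CameraInfoStore:
    """Holds one CameraInfo per (zoom step, focus step) combination."""

    def __init__(self):
        self._table = {}

    def store(self, zoom, focus, info):
        self._table[(zoom, focus)] = info

    def lookup(self, zoom, focus):
        """Return the entry for (zoom, focus), or the nearest stored combination.

        Assumption: nearness is judged on the zoom step first and the
        focus step second.
        """
        if not self._table:
            raise LookupError("no camera information has been stored")
        key = min(self._table,
                  key=lambda k: (abs(k[0] - zoom), abs(k[1] - focus)))
        return self._table[key]
```

In such a sketch, the entries would be filled once by the calibration process performed in advance, and the stereo image processing unit would call lookup with the zoom/focus information reported by the zoom/focus control unit.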
Patent Literature 2 discloses a structure in which, with one picked-up image taken as a standard image and the other picked-up image taken as a reference image, the reference image is moved, a parallax image is created, and the movement amount of the reference image that gives a minimum dispersion in parallax amount is selected as a correction amount for the reference image.
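The selection of the movement amount with a minimum dispersion in parallax amount can be sketched as follows. The function name select_correction_amount and the callables shift_fn (which moves the reference image) and parallax_fn (which creates a parallax image as rows of parallax values) are hypothetical and are supplied by the caller. The use of the population variance as the measure of dispersion is likewise an assumption.

```python
from statistics import pvariance


def select_correction_amount(standard, reference, candidate_shifts,
                             shift_fn, parallax_fn):
    """Return the candidate shift whose parallax image has the least dispersion.

    shift_fn(reference, shift) -> moved reference image (hypothetical helper)
    parallax_fn(standard, moved_reference) -> parallax image as a list of rows
    """
    best_shift = None
    best_dispersion = float("inf")
    for shift in candidate_shifts:
        parallax_image = parallax_fn(standard, shift_fn(reference, shift))
        values = [p for row in parallax_image for p in row]
        dispersion = pvariance(values)   # assumed dispersion measure
        if dispersion < best_dispersion:
            best_shift, best_dispersion = shift, dispersion
    return best_shift
```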
Patent Literature 3 discloses a structure of generating a distance image, creating a histogram indicative of a frequency of appearances of a pixel for each distance in the distance image, and detecting a range of a main subject based on the histogram.
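A histogram-based detection of this kind can be sketched as follows. The bin width, the rule of growing the range outward from the most frequent bin, and the ratio of 0.5 are assumptions chosen for illustration; Patent Literature 3 may use a different criterion.

```python
def distance_histogram(distance_image, bin_width):
    """Count how many pixels fall into each distance bin."""
    histogram = {}
    for row in distance_image:
        for distance in row:
            bin_index = int(distance // bin_width)
            histogram[bin_index] = histogram.get(bin_index, 0) + 1
    return histogram


def main_subject_range(histogram, bin_width, ratio=0.5):
    """Return (near, far) distances around the most frequent bin.

    Assumption: adjacent bins belong to the main subject while their
    count is at least `ratio` times the peak count.
    """
    peak = max(histogram, key=histogram.get)
    threshold = histogram[peak] * ratio
    low = high = peak
    while histogram.get(low - 1, 0) >= threshold:
        low -= 1
    while histogram.get(high + 1, 0) >= threshold:
        high += 1
    return low * bin_width, (high + 1) * bin_width
```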
In a stereo camera system, when an object targeted for obtaining position information in a three-dimensional space (a target object) is shot by a plurality of cameras, position information of the target object in a three-dimensional space can be specified from position information of the target object projected onto a light-receiving surface (hereinafter referred to as a screen as appropriate) of a photoelectric conversion element such as, for example, a CCD, in each camera. Therefore, it is required to find in advance a correspondence (a position information correspondence) between position information of an object present at a position in a three-dimensional space and, when the target object is present at that position, position information about a position where that target object is projected onto the screen of each camera. Finding this position information correspondence is called calibration (Patent Literature 4).
A stereo camera unit that finds three-dimensional information about a subject by using two cameras requires internal parameters formed of information including the focal distance, image center, and pixel size of each camera; external parameters formed of relation information such as the positions and postures of the two cameras; optical distortion parameters based on the difference between an ideal optical system and the actual optical system in each camera; and others. These parameters are collectively referred to as camera parameters. In camera calibration, images of a subject whose three-dimensional position is known in advance are captured by the stereo camera unit configured of a plurality of cameras to find a plurality of projected images, and the camera parameters are found from these projected images and their three-dimensional coordinate positions. Then, from these camera parameters, three-dimensional information about a point corresponding to a predetermined image position in the projected image is found (Patent Literature 5).
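The role of the internal and external parameters can be illustrated with an ideal pinhole projection, sketched below. The function name project_point and its argument names are hypothetical. The sketch assumes no shearing and deliberately omits the optical distortion parameters, so it shows only the ideal optical system.

```python
def project_point(point_world, rotation, translation,
                  focal_mm, pixel_mm, center_px):
    """Project a 3-D point onto the screen of one camera (ideal pinhole model).

    rotation (3x3 nested list) and translation (3 values) are the external
    parameters; focal_mm, pixel_mm and center_px are the internal parameters.
    """
    # External parameters: world coordinates -> camera coordinates.
    xc, yc, zc = (
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if zc <= 0:
        raise ValueError("point is not in front of the camera")
    # Internal parameters: focal distance expressed in pixels, then image center.
    focal_px = focal_mm / pixel_mm
    u = focal_px * xc / zc + center_px[0]
    v = focal_px * yc / zc + center_px[1]
    return u, v
```

Calibration can be viewed as the inverse problem: given many known three-dimensional points and their observed projections, the parameters that make a model of this kind reproduce the observations are estimated.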
Depth image generating means reads a depth image generating method (a mapping table) corresponding to an AF focusing position (a distance from an image pickup lens to a subject currently in focus) and sets a depth value for each pixel based on this mapping table and distance information for each pixel obtained by distance information obtaining means, thereby generating a depth image. With this, a stereoscopic vision image suitable for the subject obtained from the AF focusing position at the time of shooting can be created (Patent Literature 6).
Images of a subject are captured by two or more cameras provided at different positions; a search is made for a corresponding point, which is a corresponding pixel between the plurality of images thus obtained (a standard image obtained by a standard camera and a reference image obtained by a reference camera) (stereo matching); a difference (parallax) is calculated between a pixel on the standard image and the pixel on the reference image that corresponds to it; and the principle of triangulation is applied to the parallax to measure a distance from the standard camera or the reference camera to the point on the subject corresponding to the pixel. With this, a distance image representing a stereoscopic shape of the subject can be generated. In stereo matching, a plurality of points in real space are mapped onto a single pixel on the standard image, and the pixel on the reference image corresponding to that pixel is present on a straight line representing the mapping of those points in real space (an epipolar line); the search for the corresponding point, that is, the pixel on the reference image corresponding to that pixel, is therefore made along this line. Specifically, a correlation window including the pixel for which the corresponding-point search is made is set on the standard image, a correlation window identical to that set on the standard image is moved on the reference image along the epipolar line, a correlation for the pixels in the correlation windows on the two images is calculated at each movement position, and the pixel positioned at the center of a correlation window on the reference image whose correlation is equal to or greater than a predetermined threshold is found as the corresponding point of the pixel (Patent Literature 7).
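The corresponding-point search and the triangulation step can be sketched as follows. The sketch assumes a rectified image pair, so that the epipolar line coincides with the same image row, and assumes that the corresponding point lies at an equal or smaller x coordinate on the reference image. The use of zero-mean normalized cross-correlation as the correlation, the choice of the best-scoring position among those at or above the threshold, and all function names are illustrative assumptions rather than the method of Patent Literature 7.

```python
import math


def window(image, cx, cy, half):
    """Flatten the (2*half+1) x (2*half+1) correlation window centered at (cx, cy)."""
    return [image[y][x]
            for y in range(cy - half, cy + half + 1)
            for x in range(cx - half, cx + half + 1)]


def correlation(a, b):
    """Zero-mean normalized cross-correlation of two equally sized windows."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a)
                    * sum((q - mean_b) ** 2 for q in b))
    return num / den if den > 0 else 0.0


def find_corresponding_x(standard, reference, x, y,
                         half=1, max_parallax=8, threshold=0.9):
    """Move the window along row y of the reference image; return the best x or None."""
    target = window(standard, x, y, half)
    best_x, best_score = None, None
    for parallax in range(max_parallax + 1):
        xr = x - parallax
        if xr - half < 0:
            break  # the window would leave the reference image
        score = correlation(target, window(reference, xr, y, half))
        if score >= threshold and (best_score is None or score > best_score):
            best_x, best_score = xr, score
    return best_x


def distance_from_parallax(parallax_px, focal_px, baseline):
    """Triangulation for a rectified pair: distance = focal length * baseline / parallax."""
    return focal_px * baseline / parallax_px
```

The parallax of the pixel is then the difference between x and the returned coordinate, and repeating the search for every pixel of the standard image yields the distance image.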
Even in a camera not having a distance measurement sensor mounted thereon, as long as the camera has a mechanism of motor-driving a focus lens forward and backward to focus on a subject, the number of pulses of that motor driving can be counted and, from this count value, distance information can be found. In this case, the relation between the pulse count value and the distance may be in the form of a function or table data (Patent Literature 8).
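A relation held as table data can be sketched as follows. The table values, the name PULSE_TO_DISTANCE_M, and the use of linear interpolation between table entries are illustrative assumptions; an actual lens would require measured values.

```python
from bisect import bisect_left

# Hypothetical table data: (pulse count value, subject distance in meters).
PULSE_TO_DISTANCE_M = [(0, 0.5), (100, 1.0), (200, 2.0), (300, 5.0)]


def distance_from_pulses(count, table=PULSE_TO_DISTANCE_M):
    """Look up the distance for a pulse count, interpolating between entries."""
    counts = [c for c, _ in table]
    # Clamp to the ends of the table outside the calibrated range.
    if count <= counts[0]:
        return table[0][1]
    if count >= counts[-1]:
        return table[-1][1]
    i = bisect_left(counts, count)
    (c0, d0), (c1, d1) = table[i - 1], table[i]
    return d0 + (d1 - d0) * (count - c0) / (c1 - c0)
```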
A face detecting unit detects a face from each of the images captured by two image pickup units based on image data stored in a frame memory. As this face detecting method, a known method can be used. For example, a pattern image for face detection is stored in a ROM and, by referring to this pattern image, face detection is performed by pattern recognition (Patent Literature 9).
Other than the above, there are various face detecting methods. For example, there is a methodology in which a region having a skin color and the shape of a person's face (for example, an oblong-based figure) is detected in an image and the detected region is extracted as a region of the face (Patent Literature 10).
Examples of a method of searching for a corresponding point between different viewpoint images include a Sum of Absolute Differences (SAD) method and a Phase-Only Correlation (POC) method (Patent Literature 11).
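The two measures can be sketched as follows for one-dimensional signals. The function names are hypothetical, the discrete Fourier transform is written out naively for self-containment, and the POC sketch assumes that one signal is a circular shift of the other. A practical implementation would work on two-dimensional windows and use a fast Fourier transform.

```python
import cmath


def sad(a, b):
    """Sum of absolute differences; a smaller value means a better match."""
    return sum(abs(p - q) for p, q in zip(a, b))


def dft(x, inverse=False):
    """Naive discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def poc_shift(f, g, eps=1e-12):
    """Estimate the circular shift d such that g[t] == f[(t - d) % n].

    The cross spectrum is normalized to unit magnitude so that only the
    phase remains; its inverse transform peaks at the shift.
    """
    spectrum_f, spectrum_g = dft(f), dft(g)
    phase_only = []
    for a, b in zip(spectrum_f, spectrum_g):
        cross = b * a.conjugate()
        phase_only.append(cross / max(abs(cross), eps))
    poc = [v.real for v in dft(phase_only, inverse=True)]
    n = len(poc)
    peak = max(range(n), key=poc.__getitem__)
    return peak if peak <= n // 2 else peak - n
```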
Patent Literature 12 illustrates an example of a distortion correction equation.
Patent Literature 13 illustrates an example of a table defining a position (focal position) of a focus lens according to a subject distance.