This invention relates to an information processing apparatus and method as well as a medium, and more particularly to an information processing apparatus and method as well as a medium wherein matching between images is performed using a template in measuring a distance according to the stereo method.
Similarly to the principle by which a human being senses the shape of an object or the distance from the object to himself or herself, a stereo method is generally known as a method of measuring a distance to an object. According to the stereo method, the shape of or the distance to an object can be measured in accordance with the principle of triangulation using images observed by a plurality of cameras having visual points different from each other.
FIG. 1 illustrates the principle of the stereo method. Referring to FIG. 1, two cameras of a base camera 1 and a reference camera 2 are disposed at visual points different from each other so that a position of an object point Q to be measured in a three-dimensional space can be determined from the two cameras. In particular, an observation point nb at which the object point Q is observed on an image plane 1A of the base camera 1 and another observation point nr at which the object point Q is observed on an image plane 2A of the reference camera 2 are determined. Then, the position of the object point Q in the three-dimensional space can be determined from the two observation points nb and nr.
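The triangulation relationship described above can be sketched for the simplest case. The following fragment is an illustrative sketch only, not part of the invention, and assumes a rectified, parallel-axis camera pair (the name `depth_from_disparity` and the specific numbers are hypothetical):

```python
# Illustration of the triangulation principle of FIG. 1, under the
# simplifying assumption of a rectified, parallel-axis stereo rig:
# the base camera 1 and the reference camera 2 are separated by a
# baseline B, and the horizontal offset between the observation
# points nb and nr (the disparity) yields the depth of the object
# point Q.

def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth Z of the object point Q from the disparity nb - nr.

    Z = f * B / d for a rectified, parallel-axis camera pair.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return focal_length_px * baseline_m / disparity_px

# Example: f = 700 px, baseline 0.1 m, disparity 14 px -> Z = 5.0 m
print(depth_from_disparity(700.0, 0.1, 14.0))
```

A nearer object point produces a larger disparity, which is why the corresponding-point search of the following paragraphs is the central problem of the stereo method.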
As a technique for detecting the observation point nr corresponding to the observation point nb, a method of searching for a corresponding point on an epipolar line has been proposed. For example, the observation point nr of the reference camera 2 which corresponds to the observation point nb on the image plane 1A observed by the base camera 1 (in the following description, an image on the image plane 1A observed by the base camera 1 is referred to simply as base image 1a as seen from FIG. 2A) is present on a straight line LP along which the plane defined by the optical centers of the base camera 1 and the reference camera 2 and the observation point nb intersects the image plane 2A observed by the reference camera 2 (in the following description, an image on the image plane 2A observed by the reference camera 2 is referred to simply as reference image 2a as seen from FIG. 2B). The straight line LP is called an epipolar line. Accordingly, if the positional relationship between the base camera 1 and the reference camera 2 is known, then a desired corresponding point can be detected for each observation point of the base camera 1 by searching for the corresponding point on the epipolar line (straight line LP) on the reference image 2a.
As a technique for searching for a corresponding point, "pixel-based matching", "feature-based matching" and "area-based matching" are known. They have the following characteristics.
The pixel-based matching searches for a corresponding point using concentration values of individual pixels. Therefore, it is high in speed of arithmetic operation, but is low in matching accuracy.
The feature-based matching extracts a characteristic such as a concentration edge from an image and searches for a corresponding point using only the characteristic between images. Therefore, the distance image obtained provides only coarse information.
The area-based matching involves a kind of correlation arithmetic operation. Therefore, a high arithmetic operation cost is required. However, since a corresponding point to an object can be searched out with a high degree of accuracy and distance values of all pixels can be calculated, the area-based matching is generally used frequently.
FIGS. 2A and 2B illustrate the principle of the area-based matching. Referring to FIGS. 2A and 2B, a local window W (area) is set around a noticed point (noticed pixel) 11 set arbitrarily on an image (base image 1a) observed by the base camera 1, and the window W is set as a template 12. In FIG. 2A, the template 12 is formed from 25 pixels arranged in 5 rows × 5 columns.
Then, as seen in FIG. 2B, the template 12 is disposed as a template 12A on an epipolar line 13 of an image (reference image 2a) observed by the reference camera 2, and matching is performed within the set search range and a coincidence degree R(x, y) is arithmetically operated in accordance with the following expression (1):

R(x, y) = Σ_{(x, y) ∈ W} (Im1(x, y) − Im2(x + Δx, y + Δy))²   (1)
where Im1(x, y) is a pixel of the base image 1a, Im2(x + Δx, y + Δy) is a pixel of the reference image 2a, and Δx and Δy represent an amount of movement of the template 12 on the epipolar line 13. Thereafter, the template 12 is moved along the epipolar line 13 and is disposed as a template 12B. Then, similarly as for the template 12A, a coincidence degree R(x, y) is arithmetically operated in accordance with the expression (1). The template 12 is further moved along the epipolar line 13 and is disposed as a template 12C. Then, similarly as for the templates 12A and 12B, a coincidence degree R(x, y) is arithmetically operated in accordance with the expression (1).
Among the three coincidence degrees R(x, y) determined in accordance with the expression (1) above, the one which exhibits the lowest value represents the highest coincidence degree (similarity degree) between the base image 1a and the reference image 2a. Accordingly, the movement amount Δx, Δy of the template 12 when the coincidence degree R(x, y) exhibits the lowest value is determined as a parallax of the noticed point 11, and a shape or a depth of the noticed point 11 in the three-dimensional space can be calculated in accordance with the principle of triangulation using the parallax of the noticed point 11.
In this manner, in the area-based matching, three-dimensional shape data corresponding to all pixels can be obtained by repeating the matching processing for each pixel. It is to be noted that, while the coincidence degrees R(x, y) of the three templates 12A to 12C in FIG. 2B are arithmetically operated in accordance with the expression (1) above, actually the template 12 is successively moved by a predetermined amount within a preset search range on the epipolar line 13, and the coincidence degree R(x, y) at each of such positions is arithmetically operated.
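The area-based matching of FIGS. 2A and 2B can be sketched as follows. This is a minimal illustrative sketch, not part of the invention, assuming rectified images so that the epipolar line 13 is a horizontal image row; names such as `match_on_epipolar_line` and `search_range` are hypothetical:

```python
import numpy as np

# Sketch of area-based matching: a 5x5 template (half = 2) around the
# noticed pixel of the base image 1a is slid along the epipolar line of
# the reference image 2a, and the coincidence degree R of expression (1)
# (a sum of squared differences) is computed at each position.

def match_on_epipolar_line(base, ref, row, col, half=2, search_range=10):
    """Return the movement amount dx of the template that minimises the
    coincidence degree R(x, y) of expression (1) along a horizontal
    epipolar line (rectified-image assumption)."""
    template = base[row - half:row + half + 1, col - half:col + half + 1]
    best_dx, best_r = 0, np.inf
    for dx in range(0, search_range + 1):
        window = ref[row - half:row + half + 1,
                     col + dx - half:col + dx + half + 1]
        if window.shape != template.shape:
            break  # template moved outside the reference image
        r = np.sum((template.astype(float) - window.astype(float)) ** 2)
        if r < best_r:  # lowest R(x, y) = highest coincidence degree
            best_r, best_dx = r, dx
    return best_dx

# Toy example: the reference image is the base image shifted 3 pixels,
# so the corresponding point is found 3 pixels along the epipolar line.
rng = np.random.default_rng(0)
base = rng.integers(0, 256, size=(20, 40))
ref = np.roll(base, 3, axis=1)
print(match_on_epipolar_line(base, ref, row=10, col=10))  # -> 3
```

The returned movement amount is the parallax of the noticed pixel, from which the depth follows by triangulation as described above.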
However, whichever one of the techniques described above is used, it is difficult to accurately determine all corresponding points on an image because some "ambiguity" is involved in matching between images.
For example, if it is tried to use the area-based matching to perform matching of a texture pattern 22 on a plane 21 disposed obliquely in a three-dimensional space as shown in FIG. 3, then the texture pattern 22 observed by the two cameras of the base camera 1 and the reference camera 2 is such as shown in FIGS. 4B and 4C, respectively. In particular, FIG. 4A shows the plane 21 of FIG. 3 and the texture pattern 22 disposed on the plane 21, and FIG. 4B shows an observed image (base image 1a) when the plane 21 is observed from the base camera 1 while FIG. 4C shows an observed image (reference image 2a) when the plane 21 is observed from the reference camera 2. As can be seen from FIGS. 4A to 4C, although the left and right cameras (base camera 1 and reference camera 2) observe the same object pattern (texture pattern 22), a geometrical distortion appears between the images of the texture pattern 22 and the same object pattern is recognized as different objects. This gives rise to a problem that matching is difficult.
In order to determine a corresponding point with a higher degree of accuracy, such techniques as "local support", "matching which uses a higher-order characteristic" and "multi base line stereo" have been proposed. However, they are not sufficiently high in accuracy as yet.
It is an object of the present invention to provide an image processing apparatus and method as well as a medium by which a corresponding point can be found out with a higher degree of accuracy.
In order to attain the object described above, according to the present invention, a template is deformed to produce deformed templates, and a corresponding point is searched for using the deformed templates.
In order to attain the object described above, according to an aspect of the present invention, there is provided an image processing apparatus, comprising first inputting means for inputting at least one of images picked up by a plurality of image pickup apparatus as a base image, second inputting means for inputting the other one or ones of the images picked up by the image pickup apparatus than the base image as a reference image or images, setting means for setting an object pixel and peripheral pixels around the object pixel from among pixels of the base image as a template, production means for producing a plurality of deformed templates from the template set by the setting means, and calculation means for determining a corresponding point or points of the reference image or images using the deformed templates to calculate a corresponding relationship of the reference image or images to the base image.
According to another aspect of the present invention, there is provided an image processing method, comprising a first inputting step of inputting at least one of images picked up by a plurality of image pickup apparatus as a base image, a second inputting step of inputting the other one or ones of the images picked up by the image pickup apparatus than the base image as a reference image or images, a setting step of setting an object pixel and peripheral pixels around the object pixel from among pixels of the base image as a template, a production step of producing a plurality of deformed templates from the template set by the processing in the setting step, and a calculation step of determining a corresponding point or points of the reference image or images using the deformed templates to calculate a corresponding relationship of the reference image or images to the base image.
According to a further aspect of the present invention, there is provided a medium for causing a computer to execute a program which includes a first inputting step of inputting at least one of images picked up by a plurality of image pickup apparatus as a base image, a second inputting step of inputting the other one or ones of the images picked up by the image pickup apparatus than the base image as a reference image or images, a setting step of setting an object pixel and peripheral pixels around the object pixel from among pixels of the base image as a template, a production step of producing a plurality of deformed templates from the template set by the processing in the setting step, and a calculation step of determining a corresponding point or points of the reference image or images using the deformed templates to calculate a corresponding relationship of the reference image or images to the base image.
With the image processing apparatus, the image processing method and the medium, a plurality of deformed templates are produced, and corresponding relationships of reference images to a base image are calculated based on the deformed templates to calculate a distance to an object point. Consequently, matching between images can be performed with a higher degree of accuracy.
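The production of deformed templates can be sketched as follows. This is a hedged illustration only: the particular deformation shown (horizontal rescaling, chosen to mimic the geometrical distortion of the obliquely disposed plane 21 of FIG. 3) and the name `make_deformed_templates` are assumptions for the sake of the example, not a definition of the invention's deformation set.

```python
import numpy as np

# Illustrative sketch: from the template set around the object pixel of
# the base image, a plurality of deformed templates are produced.  Here
# each deformed template is a horizontally rescaled copy, resampled back
# to the original width by nearest-neighbour column lookup.

def make_deformed_templates(template, scales=(0.8, 1.0, 1.25)):
    """Produce a list of horizontally rescaled copies of the template,
    one per scale factor (scale 1.0 reproduces the template itself)."""
    h, w = template.shape
    deformed = []
    for s in scales:
        cols = np.clip((np.arange(w) * s).round().astype(int), 0, w - 1)
        deformed.append(template[:, cols])
    return deformed
```

In the matching step, the coincidence degree of expression (1) would then be evaluated for each deformed template at each position on the epipolar line, and the combination of deformation and movement amount exhibiting the lowest value would be taken as the parallax of the object pixel, so that a geometrically distorted pattern such as the texture pattern 22 of FIG. 3 can still be matched.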
The above and other objects, features and advantages of the present invention will become apparent from the following description and the appended claims, taken in conjunction with the accompanying drawings in which like parts or elements are denoted by like reference symbols.