1. Field of the Invention
This invention generally relates to a method of matching stereo images and a method of detecting disparity between these images, which is chiefly used in the industrial field of stereo cameras for detecting positional information in the image pickup space based on stereo images, volume compression of overall stereo images (i.e. three-dimensional video images), display control of these stereo images, and for the optical flow extraction of moving images and so on.
2. Prior Art
Conventional methods of matching stereo images and of detecting disparity between them are explained below with reference to so-called stereo image measurement technology, in which position or distance information in the image-pickup space is obtained by matching two images (stereo images) and detecting the disparity between them.
FIG. 1 is a view illustrating the principle of a typical stereo image measurement. In FIG. 1, a three-dimensional coordinate, generally defined by variables x, y and z, represents the real space. A two-dimensional coordinate, generally defined by variables X and Y, represents the plane of image (i.e. an image-pickup plane of a camera). There are provided a pair of two-dimensional coordinates for a pair of cameras 23R and 23L. A position on the image plane of right camera 23R can be expressed by variables XR and YR on one two-dimensional coordinate. A position on the image plane of left camera 23L can be expressed by variables XL and YL on the other two-dimensional coordinate.
Axes XL and XR are parallel to the axis x, while axes YL and YR are parallel to the axis y. Axis z is parallel to the optical axes of two cameras 23R and 23L. The origin of the real space coordinate (x, y, z) coincides with a midpoint between the projective centers of right and left cameras 23R and 23L. The distance between the projective centers is generally referred to as a base length denoted by 2a. A distance, denoted by f, is a focal distance between each projective center and its image plane.
It is now assumed that a real-space point p is projected at a point PR(XR,YR) on the right image plane and at the same time at a point PL(XL,YL) on the left image plane. According to the stereo image measurement, PR and PL are determined on the respective image planes (by performing the matching of stereo images), and then the real-space coordinate (x, y, z) representing the point p is obtained based on the principle of triangulation.
YR and YL have identical values in this case, because the two optical axes of cameras 23R and 23L lie in the same plane and the X axes of cameras 23R and 23L are parallel to axis x. The relationship between the coordinate values XL, YL, XR, YR and the real-space coordinate values x, y, z is expressed by the following equations. ##EQU1## where d represents the disparity (between stereo images).
d = XL - XR (Eq. 3)
As "a" is a positive value (a > 0), the following relation is derived from the above equation 2.
XL > XR and YL = YR (Eq. 4)
It is understood from the above relationship that a specific point on one image plane has its matching point on the other image plane along the same scanning line, which serves as an epipolar line, within the region defined by XL > XR. Accordingly, the matching point corresponding to a specific point on one image plane can be found on the other image plane by checking the similarity of the images in each micro area along the line on which the matching point may lie.
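Equations 1 and 2 survive in this text only as a placeholder. Under the geometry described for FIG. 1, a reconstruction consistent with equations 3 and 4 can be written as below. The assignment of the left camera to x = -a and the right camera to x = +a, the measurement of image coordinates from each optical axis, and the split into "Eq. 1" and "Eq. 2" are assumptions, not statements recovered from the original.

```latex
% Eq. 2 (assumed form): projection of p = (x, y, z) onto the two image planes
X_L = \frac{(x + a)\,f}{z}, \qquad
X_R = \frac{(x - a)\,f}{z}, \qquad
Y_L = Y_R = \frac{y\,f}{z}

% Eq. 1 (assumed form): triangulation, with d = X_L - X_R as in Eq. 3
x = \frac{a\,(X_L + X_R)}{d}, \qquad
y = \frac{2a\,Y_L}{d}, \qquad
z = \frac{2a\,f}{d}
```

Subtracting the two projections gives d = 2af/z, which is positive for a > 0, f > 0 and z > 0, in agreement with equation 4.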
Some similarity evaluation methods are explained below. FIG. 2 shows a conventional method of detecting a mutual correlation value between two images, disclosed for example in "Image Processing Handbook" (Shokodo publishing Co. Ltd.) by Morio ONOUE et al.
First, a pixel 2403 is designated somewhere on the left image 2401. A pixel matching this pixel 2403 is then searched for on the right image 2402; in other words, the matching point is determined. More specifically, a square micro area 2404 (hereinafter referred to as a micro area) is set on the left image 2401 so as to have a size of n×m pixels, with the designated pixel 2403 at its center. It is now assumed that IL(i,j) represents the brightness of each point (pixel) within the micro area 2404.
On the other hand, a square micro area 2405 on the right image 2402 is designated as a micro area having its center on a pixel satisfying the condition of equation 4. The micro area 2405 has a size of n×m pixels. It is assumed that IR(i,j) represents the brightness of each point (pixel) within the micro area 2405.
Furthermore, it is assumed that μL, μR, σL² and σR² represent the averages and variances of the brightness in the micro areas 2404 and 2405. The mutual correlation value of these micro areas is given by the following equation. ##EQU2##
The value "c" defined by equation 5 is calculated along the straight line (epipolar line) on which a matching point may lie. The point where the value "c" is maximized is then identified as the matching point. This method makes it possible to determine the matching point with a resolution of one pixel. Once the matching point is found, the disparity "d" can be obtained immediately from equation 3 using the coordinate values of the matching point.
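The search described above can be sketched in Python as follows. Equation 5 is not reproduced in this text, so the sketch assumes the usual normalised cross-correlation, i.e. the mean of (IL - μL)(IR - μR) over the micro area divided by the square root of σL²·σR². The function names `window`, `correlation` and `find_match`, the row-major list-of-lists image layout, and the restriction to odd n and m are illustrative assumptions, not part of the cited method.

```python
import math


def window(img, cx, cy, n, m):
    """Flatten the n-column by m-row micro area centred on (cx, cy).

    n and m are assumed odd, and the caller must keep the area inside img.
    """
    hx, hy = n // 2, m // 2
    return [img[y][x]
            for y in range(cy - hy, cy + hy + 1)
            for x in range(cx - hx, cx + hx + 1)]


def correlation(a, b):
    """Normalised cross-correlation of two brightness lists (assumed form of Eq. 5)."""
    k = len(a)
    mu_a, mu_b = sum(a) / k, sum(b) / k
    var_a = sum((v - mu_a) ** 2 for v in a) / k
    var_b = sum((v - mu_b) ** 2 for v in b) / k
    if var_a == 0 or var_b == 0:
        return 0.0  # flat micro area: correlation undefined, treated as no match
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / k
    return cov / math.sqrt(var_a * var_b)


def find_match(left, right, xl, y, n, m):
    """Scan row y of the right image over XR < XL (Eq. 4).

    Returns (XR, c) for the candidate with the largest correlation c.
    """
    ref = window(left, xl, y, n, m)
    best_xr, best_c = None, -2.0  # any real correlation is >= -1
    for xr in range(n // 2, xl):
        c = correlation(ref, window(right, xr, y, n, m))
        if c > best_c:
            best_xr, best_c = xr, c
    return best_xr, best_c
```

With the returned XR, the disparity follows from equation 3 as d = XL - XR. Every candidate position costs a full pass over the n×m micro area, which illustrates the computational burden of this approach.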
However, this conventional method requires a great amount of computation to obtain the matching points of all required pixels, since finding even one matching point requires the above-described complicated computations to be repeated over the entire region in which the matching point may lie.
The correlation computations can be sped up by reducing the size of the micro area, but the stability of matching point detection then worsens because image distortion and noise have a greater influence. Conversely, increasing the size of the micro area not only increases the computation time but also degrades the accuracy of matching point detection, because the change in correlation values becomes undesirably gradual. Thus, the size of the micro area must be set appropriately in view of the characteristics of the image to be handled.
Furthermore, as is apparent from equation 3, a characteristic of the above-described conventional method is that the determined disparity directly reflects the result of stereo image matching. Hence, any erroneous matching causes an error in the measurement of disparity "d". In short, an error in the stereo image matching leads to an error in the disparity measurement.
In this manner, determining a matching point for each pixel is disadvantageous in that the volume of computation becomes huge. To solve this problem, one proposed technique divides the image into several blocks each having a predetermined size and determines the matching region block by block. For example, "Driving Aid System based on Three-dimensional Image Recognition Technology", by Jitsuyoshi et al., in the Pre-publishing 924, pp. 169-172 of Automotive Vehicle Technical Institute Scientific Lecture Meeting, October 1992, discloses such a method of searching for the matching region by comparing blocks of the right and left images.
FIG. 3 is a view illustrating the conventional method of performing the matching of stereo images between square micro areas (blocks). The left image 2501, serving as a reference image, is dissected into a plurality of blocks so that each block (2503) has a size equivalent to n×m pixels. To obtain the disparity, the matching region for each block on the left image 2501 is searched for on the right image 2502. The following equation is the similarity evaluation used for determining the matching region.
C = Σ |Li - Ri| (Eq. 6)
where Li represents luminance of i-th pixel in the left block 2503, while Ri represents luminance of i-th pixel in the right block 2504.
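The block search built on equation 6 can be sketched in Python as follows. This is a minimal illustration, not the implementation of the cited system: the function names `sad` and `match_block`, the `max_disparity` search limit, the top-left block anchoring, and the row-major list-of-lists image layout are all assumptions. The search direction XR = XL - d follows from equation 4.

```python
def sad(left, right, top, left_col, right_col, n, m):
    """Eq. 6: sum of absolute luminance differences over an n-column by m-row block.

    Both blocks start at row `top`; they start at column `left_col` in the
    left image and `right_col` in the right image.
    """
    total = 0
    for j in range(m):          # rows of the block
        for i in range(n):      # columns of the block
            total += abs(left[top + j][left_col + i]
                         - right[top + j][right_col + i])
    return total


def match_block(left, right, top, col, n, m, max_disparity):
    """Find the disparity d in 0..max_disparity that minimises C of Eq. 6.

    The candidate block in the right image lies on the same rows, shifted
    to column col - d. Returns (d, C) for the best candidate.
    """
    best_d, best_c = 0, None
    for d in range(max_disparity + 1):
        right_col = col - d
        if right_col < 0:       # candidate block would leave the image
            break
        c = sad(left, right, top, col, right_col, n, m)
        if best_c is None or c < best_c:
            best_d, best_c = d, c
    return best_d, best_c
```

Unlike the correlation of equation 5, the smaller value of C indicates the better match, so the search keeps the minimum rather than the maximum.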
This evaluation is simpler than the calculation of equation 5, which includes subtracting the average values. However, the hardware scale is still large because line memories are needed for the two-dimensional similarity evaluation. Furthermore, the overall processing time is fairly long because of the many memory accesses.
Moreover, using the luminance value for the similarity evaluation increases the hardware cost, because pre-processing is additionally required to adjust the sensitivity difference between the right and left cameras and to perform shading correction before the stereo image matching is executed.
A straight line existing in the image-pickup space may be imaged as straight lines 2603 and 2604 with different gradients in blocks 2605 and 2606 of left and right images 2601 and 2602, as shown in FIG. 4. In such a case, the method may fail to determine the matching regions accurately.
Conversely, two different lines may be imaged as identical lines in blocks 2703 and 2704 on left and right images 2701 and 2702, as shown in FIG. 5. Hence, comparing only the pixels of the two blocks 2703 and 2704 causes a problem in that the stereo image matching may be performed erroneously and the succeeding measurement of disparity will fail.
According to the above-described disparity measuring methods, the minimum unit of disparity measurement is one pixel, because the image data are digital data sampled at a certain frequency. However, it is possible to measure the disparity more accurately.
FIG. 6 is a view illustrating a conventional disparity measuring method capable of detecting a disparity in a sub-pixel level accuracy. FIG. 6 shows a peak position found in the similarity evaluation value C (ordinate) when the equation 6 is calculated along the search region in each block. The sub-pixel level disparity measurement is performed by using similarity evaluations Ci, Ci-1, Ci+1 corresponding to particular disparities di, di-1, di+1 (in the increment of pixel) existing before and after the peak position. More specifically, a first straight line 2801 is obtained as a line crossing both of two points (di-1, Ci-1) and (di, Ci). A second straight line 2802 is obtained as a line crossing a point (di+1, Ci+1) and having a gradient symmetrical with the line 2801 (i.e. identical in absolute value but opposite in sign). Then, a point 2803 is obtained as an intersecting point of two straight lines 2801 and 2802. A disparity ds, corresponding to thus obtained intersecting point 2803, is finally obtained as a sub-pixel level disparity of the concerned block.
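The two-line construction of FIG. 6 can be sketched in Python as follows. The sketch assumes that disparities are spaced one pixel apart and that C is the evaluation of equation 6, so the best match is the smallest C. The function name `subpixel_disparity` is hypothetical. The text describes only the case in which line 2801 passes through (di-1, Ci-1) and (di, Ci); the mirrored case, used here when Ci+1 exceeds Ci-1, is an added assumption so that the result stays within half a pixel of di.

```python
def subpixel_disparity(d_i, c_prev, c_i, c_next):
    """Refine the integer disparity d_i by intersecting two straight lines.

    c_prev, c_i and c_next are the evaluations at d_i - 1, d_i and d_i + 1.
    One line passes through the extremum and its steeper neighbour; the other
    passes through the remaining neighbour with the opposite gradient.
    """
    if c_prev >= c_next:
        # Case described in the text: line 2801 through (d_i - 1, c_prev)
        # and (d_i, c_i), line 2802 through (d_i + 1, c_next).
        slope = c_prev - c_i
    else:
        # Mirrored case (assumption): the steeper side is on the right.
        slope = c_next - c_i
    if slope <= 0:
        return float(d_i)  # flat neighbourhood or c_i not a minimum: no refinement
    return d_i + 0.5 * (c_prev - c_next) / slope
```

For example, with evaluations 4, 1 and 2 around di = 5, line 2801 has gradient -3 and line 2802 has gradient +3; they intersect one third of a pixel to the right of di.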
As is apparent from the foregoing description, the above-described conventional stereo image matching and disparity detecting methods generally suffer from increased hardware costs and longer processing time, because the similarity evaluation in the stereo image matching requires the four basic arithmetic operations of equations 5 and 6.
Furthermore, performing the similarity evaluation based on two-dimensional windows necessarily requires line memories as hardware, and these memories must be accessed frequently, resulting in a further increase in hardware costs and processing time.
Still further, comparing luminance differences between the right and left images increases the hardware costs, because preprocessing components must be added for the sensitivity adjustment and shading correction between the right and left cameras that are performed before the stereo image matching.
Yet further, when the block used as the unit for determining the disparity is identical in size to the two-dimensional window used as the unit for the matching, any error occurring in the matching phase directly and adversely affects the disparity detection of the corresponding block. In short, there is no means capable of absorbing or correcting an error occurring in the matching phase.
Moreover, determining each matching region using only the pixels within a block (i.e. a two-dimensional window) may fail to detect the true matching region.