The present invention relates to measurements of the position, shape and movement of a three-dimensional moving object and, more particularly, to a three-dimensional information reconstruction or recovery method and apparatus which can be used in the fields of three-dimensional information reconstruction, recognition and description (CG) of moving objects.
Conventional three-dimensional information reconstruction techniques can be classified into three categories. A first technique is stereopsis, which establishes correspondence of points or lines between right and left images taken by two cameras and estimates, from the positions of the cameras and the pairs of corresponding points or lines on the two images, the positions of the corresponding points and lines in the scene space. A second technique is a three-dimensional information recovery method using a moving camera, which tracks individual feature points across a number of images picked up by the moving camera and estimates the positions of the points in the scene space corresponding to those feature points. A third technique is a backprojection method, which recovers or reconstructs the three-dimensional structure of an object in the scene space by projecting feature points in the images back into the scene space.
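The position estimate used in stereopsis can be illustrated with a minimal sketch. It assumes that each camera position and the sight-line direction toward a matched feature point are already known; the function name `triangulate_midpoint` and the sample coordinates are hypothetical and are not taken from the cited references. The sketch returns the midpoint of the shortest segment joining the two sight lines, which is one common way to place a point when the lines do not intersect exactly.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a scene point from two sight lines (illustrative only).

    c1, c2: camera positions; d1, d2: sight-line directions toward the
    same feature as seen from each camera. Returns the midpoint of the
    shortest segment connecting the two lines.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("sight lines are parallel; depth is undetermined")
    s = (b * e - c * d) / denom  # parameter along the first sight line
    t = (a * e - b * d) / denom  # parameter along the second sight line
    p1 = [o + s * k for o, k in zip(c1, d1)]
    p2 = [o + t * k for o, k in zip(c2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


# Hypothetical setup: two cameras two units apart, both seeing one feature.
left_cam = (-1.0, 0.0, 0.0)
right_cam = (1.0, 0.0, 0.0)
point = triangulate_midpoint(left_cam, (1.0, 0.0, 5.0),
                             right_cam, (-1.0, 0.0, 5.0))
print(point)  # the two sight lines meet at (0, 0, 5)
```

The estimate is only as good as the correspondence: if the feature is occluded in one image and a wrong match is used, the two sight lines no longer refer to the same scene point.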
With the first technique (see, for example, Kanade T., Okutomi T. and Nakahara M., "A multiple baseline stereo method," Proc. Image Understanding Workshop, pp. 409-426, 1992 or U.S. Pat. No. 4,654,872), many points on the surface of the object tend to be occluded from the sight line of one of the cameras because the object surface is uneven, and hence accurate positions of the corresponding feature points between the right and left images cannot be obtained, making it hard to obtain highly accurate three-dimensional information. The second technique (see, for example, Bolles R. C., Baker H. H. and Marimont D. H.: "Epipolar-plane image analysis: an approach to determining structure from motion," IJCV, Vol. 1, No. 1, pp. 7-55, 1987) cannot be applied to a moving object, because the object needs to stand still while its images are captured by the moving camera. Recently, there has been proposed a technique which permits simultaneous extraction of the three-dimensional shape and motion of an object from many images taken by a single camera (see Tomasi C. and Kanade T.: "Shape and motion from image streams under orthography: a factorization method," IJCV, Vol. 9, No. 2, pp. 137-154, 1992). This technique recovers three-dimensional information basically by tracking feature points between the images; it therefore cannot obtain accurate three-dimensional information, because the surface of the object is partly occluded from the sight line of the camera while the camera or the object is moving. That is, since a feature point of interest repeatedly leaves and re-enters the field of view, its locus on the images frequently breaks, which makes the feature point difficult to track. Hence, this technique is not suitable for use with a moving object.
An example of the third technique is a silhouette projection method (see, for example, Ahuja N. and Veenstra J.: "Generating octrees from object silhouettes in orthographic views," IEEE Trans. PAMI, Vol. 11, No. 2, pp. 137-149, 1989). With this method, however, it is very difficult to acquire accurate three-dimensional information, because the generation of silhouette images is extremely difficult and unstable. Another example of the third technique is a method which recovers or reconstructs edges of a three-dimensional object by extracting edges on images and casting votes into the scene space through use of the extracted edges (see, for example, Hamano T., Yasuno T. and Ishii K.: "Direct estimation of structure from non-linear motion by voting algorithm without tracking and matching," Proc. of ICPR, Vol. 1, pp. 505-508, 1982 and S. Kawato: "3D Shape Recovering by Octree Voting Technique," Proceedings of SPIE - The International Society for Optical Engineering, 15-16 November 1992). With such a method, however, since a plurality of feature points are extracted simultaneously, the processes for the respective feature points interfere with each other, raising the possibility that a false feature point is extracted. A large number of images are needed to solve this problem. For a moving object, however, taking many images with one camera consumes much time, and a simultaneous image pickup system using many cameras is very expensive.
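The interference between feature points in a voting method, and the reason more images reduce it, can be illustrated with a toy sketch. It is not the algorithm of the cited references: the voxel grid, the function `cast_votes`, the sampling step and all coordinates are assumptions chosen for illustration. Each feature point contributes one sight line, and every voxel the line passes through receives one vote.

```python
import math
from collections import Counter


def cast_votes(rays, voxel_size=1.0, max_depth=10.0, step=0.25):
    """Accumulate one vote per sight line in every voxel it crosses."""
    votes = Counter()
    for origin, direction in rays:
        norm = math.sqrt(sum(x * x for x in direction))
        unit = [x / norm for x in direction]
        visited = set()  # a ray votes at most once per voxel
        for k in range(int(max_depth / step) + 1):
            depth = k * step
            point = [o + depth * u for o, u in zip(origin, unit)]
            visited.add(tuple(math.floor(p / voxel_size) for p in point))
        votes.update(visited)
    return votes


# Two hypothetical scene points, A in voxel (0, 0, 5) and B in voxel (2, 0, 3).
view_z = [((0.5, 0.5, 0.5), (0, 0, 1)), ((2.5, 0.5, 0.5), (0, 0, 1))]
view_x = [((-4.5, 0.5, 5.5), (1, 0, 0)), ((-4.5, 0.5, 3.5), (1, 0, 0))]
view_y = [((0.5, -4.5, 5.5), (0, 1, 0)), ((2.5, -4.5, 3.5), (0, 1, 0))]

two_views = cast_votes(view_z + view_x)
peaks_two = {v for v, n in two_views.items() if n == 2}

three_views = cast_votes(view_z + view_x + view_y)
peaks_three = {v for v, n in three_views.items() if n == 3}

print(sorted(peaks_two))    # includes false crossings as well as A and B
print(sorted(peaks_three))  # only the voxels containing A and B remain
```

With two views, the sight lines of A and B also cross each other at two voxels where no point exists, and those voxels receive as many votes as the true ones. The third view removes the ambiguity in this example, which is why such methods call for many images.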