The present invention relates to a method and apparatus for inputting three-dimensional shape information, which extract three-dimensional shape information of an actual object, i.e., depth information or distance information, and to an image input apparatus. More particularly, the present invention relates to simplifying the designation of corresponding points in a case where distance or depth is measured with a range finder or the like.
Conventionally, for instance, in the field of construction or design, it is often necessary to input three-dimensional shape information of an actual object to a computer. By virtue of the recent improvement in drawing capability using three-dimensional computer graphics (CG), a user can be provided with three-dimensional shape information of, for instance, merchandise or a building object. In this case also, three-dimensional shape information of the actual merchandise or building object must be input to a computer.
In view of the above background, recently, the technology of inputting three-dimensional shape information of an object has become popular and is becoming increasingly important.
As a typical conventional method of inputting three-dimensional shape information, a method utilizing a contact-type position sensor is known. According to this method, a probe is brought into contact with each point on the surface of an object subjected to measurement, three-dimensional coordinates of the position of the probe are detected, and the detected coordinates are input as three-dimensional position information of each point.
However, in the method utilizing the contact-type position sensor, the probe needs to be brought into contact with each point of the object. Therefore, the objects which can be measured are limited to an object small enough to be measured on a table, an object having a certain strength, and a stationary object.
As a more flexible conventional method of measuring the shape of an object, which is not restrained by the above limitations, a method utilizing a stereo image is known.
According to this method, an image of an object 1303 is picked up at two viewpoints (or the object is picked up twice) by using a stereo-image pickup device 1300 comprising two digital cameras 1301 and 1302 as shown in FIG. 1. The obtained left and right images 1401 and 1402 shown in FIG. 2 are used as an information source.
The obtained two images 1401 and 1402 have a disparity. For a point of interest on the object, a pair of corresponding points, one in each of the left and right images 1401 and 1402, is designated. Three-dimensional coordinates of the point of interest are obtained by the principle of triangulation using the two-dimensional coordinates of the two corresponding points. In this manner, a number of representative points are designated on the object as points of interest, and three-dimensional coordinates of the representative points are obtained as three-dimensional shape information of the object, i.e., distance information or depth information.
Normally, polygons are constructed by using these representative points as vertices, and an object shape having surfaces is defined. As a method of generating polygons from arbitrary vertices, the Delaunay method is well known.
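The triangulation described above may be illustrated by the following minimal sketch. It assumes, beyond what is stated here, that the two cameras have parallel optical axes, share a focal length, and that image coordinates are measured from each camera's principal point. The function name triangulate and its parameter names are hypothetical.

```python
def triangulate(xl, yl, xr, focal_length, base_length):
    """Return (X, Y, Z) of a point of interest in the left-camera frame.

    Assumptions (not taken from the text): parallel optical axes, equal
    focal lengths, image coordinates measured from the principal point.
    xl, yl: coordinates of the corresponding point in the left image.
    xr:     horizontal coordinate of the corresponding point in the right image.
    """
    disparity = xl - xr
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    # Similar triangles: depth is inversely proportional to disparity.
    z = focal_length * base_length / disparity
    x = xl * z / focal_length
    y = yl * z / focal_length
    return x, y, z
```

For example, with a focal length of 500 pixels, a base length of 0.1 m, and a disparity of 10 pixels, the sketch yields a depth of 5 m.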
However, the above-described conventional examples have the following problems.
More specifically, in the method of inputting three-dimensional shape information by the conventional stereo-image pickup device 1300 (or using images obtained by performing pickup operation twice), a large number of points of interest must be designated in the left and right images 1401 and 1402. In addition, designating a point of interest requires manual operation for designating corresponding points of the point of interest on the left and right images 1401 and 1402 by an operator using a mouse or the like.
In the manual designation operation, as shown in FIGS. 2A and 2B, an operator first looks at the left image 1401 (or right image 1402) and designates an arbitrary representative point 1403, then looks at the right image 1402 (or left image 1401) and designates a corresponding point 1404 which corresponds to the designated representative point 1403 with a mouse.
However, this operation must be performed for a large number of representative points, imposing a great physical and mental burden on the operator.
To reduce such operation of designating corresponding points, a known method is to automatically obtain corresponding points by computing correlation between the left and right images 1401 and 1402. By this method, correlation levels are defined with respect to the two corresponding points of the left and right images 1401 and 1402. For a given point on the left image 1401 (or right image 1402), a point on the right image 1402 (or left image 1401) having the largest correlation level is designated as the corresponding point.
To calculate the correlation level, rectangular areas, each having the same size, are defined. Each of the rectangular areas surrounds the corresponding two points in the left and right images 1401 and 1402. In the rectangular areas, two-dimensional correlation is obtained between the left pixel value data L(x, y) and the right pixel value data R(x, y).
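One possible form of such a correlation search is sketched below. The normalized cross-correlation measure, the restriction of the search to the same image row, and the names window, ncc and find_corresponding_point are assumptions made for illustration; the text above specifies only that a two-dimensional correlation is taken over equally sized rectangular areas.

```python
import math

def window(img, cx, cy, half):
    """Flatten the (2*half+1)-square area of img centered on (cx, cy)."""
    return [img[y][x]
            for y in range(cy - half, cy + half + 1)
            for x in range(cx - half, cx + half + 1)]

def ncc(a, b):
    """Normalized cross-correlation of two equally sized pixel lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a) *
                    sum((q - mean_b) ** 2 for q in b))
    return num / den if den else 0.0  # flat areas carry no correlation

def find_corresponding_point(left, right, x, y, half=1):
    """Return (best_x, score) for the point in `right` matching (x, y) in `left`.

    Assumption: the images are aligned so that the match lies on row y.
    """
    ref = window(left, x, y, half)
    best_x, best_score = None, -2.0
    for cx in range(half, len(right[0]) - half):
        score = ncc(ref, window(right, cx, y, half))
        if score > best_score:
            best_x, best_score = cx, score
    return best_x, best_score
```

Even in this reduced form, every candidate position requires a full pass over the rectangular area, so the cost grows with both the area size and the width of the search range.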
However, the automatic detection of corresponding points achieved by the correlation calculation also has the following problems.
I: To reduce calculation time, the area subjected to correlation calculation must be narrowed down. To narrow down the calculation area, auxiliary input by the operator is necessary, i.e., the operator must designate rough corresponding points. Even if the area is narrowed down, calculation of corresponding points is time consuming, depending on the processing capability of the computer. During the calculation, the operator's work must be suspended. Furthermore, if the area subjected to correlation calculation is not narrowed down, an unrealistically long calculation time is required for some image sizes.
II: For at least one of the left or right images 1401 or 1402, representative points must be selected and the positions of the representative points must be manually inputted with a mouse or the like.
III: Due to limitations in the precision of correlation calculation, wrong corresponding points may sometimes be given. Therefore, corresponding points calculated by the computer must always be confirmed by an operator, and if they are wrong, the corresponding points must be manually designated by the operator.
As described above, according to the method of inputting three-dimensional shape information using two images 1401 and 1402 picked up by the stereo-image pickup device 1300 (or using images obtained by performing pickup operation twice), a time-consuming operation is required for designating corresponding points. Although the cumbersome operation imposed on the operator is somewhat reduced by the introduction of automatic corresponding point detection utilizing correlation coefficients, input operation with a mouse is still necessary, and thus the operational burden remains large.
The present invention is made in consideration of the above situation, and has as its object to provide a method of efficiently identifying three-dimensional coordinates of corresponding points on a plurality of images, for a point of interest of an object.
According to the present invention, the foregoing object is attained by providing a method of identifying three-dimensional coordinates of points of interest on a plurality of images obtained from an object, comprising the steps of: presenting to a viewer a first image and a second image as a stereo image, which are arbitrarily selected from among the plurality of images; detecting left and right lines of sight of the viewer while the first and second images are presented to the viewer; and calculating three-dimensional coordinates of a point of interest at which the viewer is gazing, based on the detected left and right line-of-sight data.
Furthermore, in order to attain the above object, the present invention provides an apparatus for identifying three-dimensional coordinates of points of interest on a plurality of images obtained from an object, comprising: stereo-image presenting means for presenting to a viewer a first image and a second image as a stereo image, which are arbitrarily selected from among the plurality of images; detecting means for detecting left and right lines of sight of the viewer viewing the stereo image presented by the stereo-image presenting means; and calculating means for calculating three-dimensional coordinates of a point of interest at which the viewer is gazing, based on the detected left and right line-of-sight data.
The method and apparatus having the above configuration rely on the principle that the points a viewer (user) gazes at on a plurality of displayed images (stereo image) should be the corresponding points of a single point on the object. Therefore, by detecting the left and right lines of sight of the viewer, it is possible to identify three-dimensional coordinates of the point of interest, i.e., three-dimensional coordinates of the point on the object.
In order for a viewer to efficiently gaze at the point of interest, it is preferable that the presented image gives the viewer stereoscopic feeling. Therefore, according to an aspect of the present invention, the first and second images presented in the presenting step are images of an object picked up in advance or about to be picked up, by stereo cameras spaced from each other by a base length.
According to an aspect of the present invention, the viewer gazes at a vertex on the stereo image as a point of interest.
According to another aspect of the present invention, the following are obtained in the calculating step: two-dimensional coordinates of the point of interest in first and second image coordinate systems provided respectively for the first and second images, obtained based on a distance L from an eyeball of the viewer to a display surface where the first and second images are displayed, and on the left and right lines of sight; and three-dimensional coordinates of the point of interest, obtained based on the obtained coordinate positions of the point of interest in either one of the first and second image coordinate systems, the base length of the viewer, and a difference between the coordinate positions of the point of interest in the first and second image coordinate systems.
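The two stages of this calculation may be sketched as follows. The specific formulas are assumptions chosen for illustration, not formulas given in this passage: each image coordinate system is taken to be centered on the straight-ahead axis of the corresponding eye, the axes of the two eyes are taken to be parallel, and each line of sight is expressed as a horizontal and a vertical rotation angle in radians. The names gaze_to_image_point and point_of_interest_3d are hypothetical.

```python
import math

def gaze_to_image_point(theta_x, theta_y, distance_l):
    """Map eyeball rotation angles to 2D coordinates on the display surface.

    Assumption: the display lies at distance L on the eye's straight-ahead
    axis, so a rotation theta moves the gaze point by L * tan(theta).
    """
    return distance_l * math.tan(theta_x), distance_l * math.tan(theta_y)

def point_of_interest_3d(left_angles, right_angles, distance_l, base_length):
    """Return (X, Y, Z) of the gazed point in the left-eye frame.

    left_angles, right_angles: (horizontal, vertical) rotation angles.
    base_length: separation of the viewer's eyes.
    """
    xl, yl = gaze_to_image_point(left_angles[0], left_angles[1], distance_l)
    xr, _ = gaze_to_image_point(right_angles[0], right_angles[1], distance_l)
    difference = xl - xr  # difference of coordinate positions in the two systems
    if difference <= 0:
        raise ValueError("lines of sight do not converge in front of the viewer")
    z = base_length * distance_l / difference
    return xl * z / distance_l, yl * z / distance_l, z
```

Under these assumptions the calculation has the same form as stereo-camera triangulation, with the distance L playing the role of the focal length.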
According to an aspect of the present invention, a line of sight of the viewer is detected by detecting a rotation of the eyeball with respect to two axes of the eyeball in the detecting step.
There are many points of interest on an image. In order to assure capturing these points of interest, it is necessary to set the timing for detecting the line of sight. For this, the present invention further comprises a step of initiating the detecting step.
According to an aspect of the present invention, the initiating step starts the detecting step on a manual input instruction by the viewer, e.g., operating a keyboard or a mouse.
According to an aspect of the present invention, the timing at which the detecting step should be started is determined in the initiating step based on variations in the line of sight of the viewer.
According to an aspect of the present invention, the timing at which the detecting step should be started is determined in the initiating step by detecting a state where the variations in the line of sight of the viewer are smaller than a predetermined threshold. When the variations in the line of sight of the viewer are small, the view point is recognized as a point of interest by the viewer.
In order to improve precision in determination of a point of interest, according to an aspect of the present invention, the initiating step comprises: a second detecting step of detecting line-of-sight data of the viewer in a sequential order; a step of storing in a predetermined memory, only the line-of-sight data having a smaller variation in the line of sight than a predetermined threshold; and a step of deciding timing to start the detecting step when the stored line-of-sight data reaches a predetermined sample number.
Further, according to an aspect of the present invention, an average value of the predetermined sample number of line-of-sight data is calculated in the calculating step, in response to the timing deciding step; and three-dimensional coordinates of the point of interest are calculated based on the calculated average value of line-of-sight data.
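The timing decision and averaging described above may be sketched as follows. Several details are assumptions made for illustration: the variation is measured as the larger of the horizontal and vertical angle changes from the previous sample, and the stored data are discarded whenever a variation at or above the threshold is observed. The name detect_fixations and its parameters are hypothetical.

```python
def detect_fixations(samples, threshold, sample_count):
    """Return averaged line-of-sight data for each detected point of interest.

    samples:      sequence of (theta_x, theta_y) line-of-sight data, in order.
    threshold:    maximum variation between consecutive samples to be stored.
    sample_count: number of stored samples that triggers a detection.
    """
    fixations = []
    stored = []       # the "predetermined memory"
    previous = None
    for sample in samples:
        if previous is not None:
            variation = max(abs(sample[0] - previous[0]),
                            abs(sample[1] - previous[1]))
            if variation < threshold:
                stored.append(sample)
            else:
                stored.clear()  # assumption: a large movement restarts collection
        previous = sample
        if len(stored) == sample_count:
            mean_x = sum(s[0] for s in stored) / sample_count
            mean_y = sum(s[1] for s in stored) / sample_count
            fixations.append((mean_x, mean_y))
            stored.clear()
    return fixations
```

In this sketch a viewer who keeps gazing at one point produces a new averaged sample every sample_count readings, so the same point may be reported more than once.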
The period of time for which a viewer gazes at a point of interest varies from individual to individual. For a viewer who gazes at the point of interest for a long time, the apparatus may detect the same point as a sample a number of times. Therefore, according to an aspect of the present invention, the calculating step further comprises the steps of: sequentially storing in a predetermined memory, line-of-sight data detected at the timing decided in the timing deciding step; and in a case where variations in a number of line-of-sight data are larger than a predetermined threshold value among a plurality of line-of-sight data stored in the predetermined memory, deleting all but one of that number of line-of-sight data.
According to an aspect of the present invention, in the detecting step, a rotation amount of the eyeball in the vertical direction and a rotation amount of the eyeball in the horizontal direction are detected as a line of sight.
Another object of the present invention is to provide an apparatus for efficiently identifying three-dimensional coordinates of corresponding points on a plurality of images, for a point of interest of an object. In order to attain the object, the present invention provides an apparatus for identifying three-dimensional coordinates of points of interest on a plurality of images obtained from an object, comprising: stereo-image presenting means for presenting to a viewer a first image and a second image as a stereo image, which are arbitrarily selected from among the plurality of images; detecting means for detecting left and right lines of sight of the viewer viewing the stereo image presented by the stereo-image presenting means; and calculating means for calculating three-dimensional coordinates of a point of interest at which the viewer is gazing, based on the detected left and right line-of-sight data.
According to an aspect of the present invention, the detecting means further comprises: irradiation means having a light source which emits an invisible ray to irradiate each eyeball; an optical system for focusing the invisible ray reflected by each eyeball; image pickup means for picking up an image formed by the optical system; and means for obtaining a center position of the pupil in the eye and a position of the light source in a virtual image formed by cornea reflection, based on the picked-up image of the eyeball, and obtaining a rotation angle of the eyeball based on a relative relation between the center position and the position of the virtual image.
According to an aspect of the present invention, the detecting means detects a state in which variations in line-of-sight angles of the eyeball of the viewer remain smaller than a predetermined threshold value for a predetermined period, determines a point of interest of the viewer during the predetermined period based on an arbitrary line-of-sight angle value or a line-of-sight angle average value, and selects the point of interest as a point for defining a shape of the object.
In order to appropriately present a stereo image to a viewer, it is preferable that the display surface and the viewpoint position of the viewer be known. For this purpose, according to an aspect of the present invention, the stereo-image presenting means comprises a head-mount display device which keeps a fixed relative positional relation between a viewer's head and a display surface.
In a case where the device for detecting the angle of the eyeball is worn by a viewer, unknown errors are often generated. Therefore, according to an aspect of the present invention, the stereo-image presenting means comprises: a stereo-image display fixed on a table; and means for correcting the stereo-image presenting means by detecting a relative positional deviation of the viewer's head with respect to the display.
Another object of the present invention is to provide a method of efficiently inputting data indicative of three-dimensional shape. In order to attain the object, the present invention provides a three-dimensional shape information input method of inputting three-dimensional coordinates of points of interest on a plurality of images obtained from an object as three-dimensional shape information, comprising the steps of: presenting to a viewer a first image and a second image as a stereo image, which are arbitrarily selected from among the plurality of images; detecting at a predetermined timing, left and right lines of sight of the viewer while the first and second images are presented to the viewer; calculating three-dimensional coordinates of a point of interest at which the viewer is gazing, based on the left and right line-of-sight data detected in the detecting step; and repeating the detecting step and calculating step with respect to other points of interest, and inputting a group of three-dimensional coordinates of points of interest obtained respectively, in a memory as three-dimensional shape information of the object.
Another object of the present invention is to provide an apparatus for efficiently inputting data indicative of three-dimensional shape. In order to attain the object, the present invention provides a three-dimensional shape information input apparatus for inputting three-dimensional coordinates of points of interest on a plurality of images obtained from an object as three-dimensional shape information, comprising: presenting means for presenting to a viewer a first image and a second image as a stereo image, which are arbitrarily selected from among the plurality of images; detecting means for detecting at a predetermined timing, left and right lines of sight of the viewer while the first and second images are presented to the viewer; calculating means for calculating three-dimensional coordinates of a point of interest at which the viewer is gazing, based on the left and right line-of-sight data detected by the detecting means; and input means for inputting a group of three-dimensional coordinates of points of interest, obtained by the detecting means and the calculating means with respect to a number of points of interest, in a memory as three-dimensional shape information of the object.
Another object of the present invention is to provide a method of inputting a viewer's three-dimensional indication utilizing line-of-sight data. In order to attain the object, the present invention provides a three-dimensional line-of-sight indicating method of inputting a viewer's indication based on points of interest viewed by a viewer on a first image and a second image obtained from an object, comprising the steps of: presenting to the viewer the first and second images as a stereo image; detecting left and right lines of sight of the viewer while the first and second images are presented to the viewer; calculating three-dimensional coordinates of a point of interest at which the viewer is gazing, based on the detected left and right line-of-sight data; and outputting the calculated three-dimensional coordinates of the point of interest as the viewer's indication data.
Another object of the present invention is to provide an apparatus for inputting a three-dimensional indication of a viewer by using line-of-sight data. In order to attain the object, the present invention provides a three-dimensional line-of-sight indicating apparatus for inputting a viewer's indication based on points of interest viewed by a viewer on a first image and a second image obtained from an object, comprising: presenting means for presenting to the viewer the first and second images as a stereo image; detecting means for detecting left and right lines of sight of the viewer while the first and second images are presented to the viewer; calculating means for calculating three-dimensional coordinates of a point of interest at which the viewer is gazing, based on the detected left and right line-of-sight data; and outputting means for outputting the calculated three-dimensional coordinates of the point of interest as the viewer's indication data.