1. Field of the Invention
The invention relates to a method and a device for scanning and digitizing three-dimensional surfaces. The invention is applicable, in particular, to any implementation in which a surface in three-dimensional space may be digitally acquired and processed.
2. Description of the Related Art
Most existing optical 3D sensors require the acquisition of multiple 2D camera images in order to obtain 3D data. The most common technique is the so-called “fringe projection” technique [ref M. Halioua, H. Liu, V. Srinivasan, “Automated phase-measuring profilometry of 3-D diffuse objects,” in Appl. Opt. 23 (1984) 3105-3108], which is widely commercially available, for example in the Face Scan sensor by 3D-Shape GmbH, Erlangen, Germany. A projector projects a fringe pattern onto the object. One or more cameras observe the object surface. In general, at least three fringe patterns have to be projected in sequence, resulting in at least three 2D raw images. For better accuracy, most fringe projection sensors take even more raw images. During the time it takes to acquire the series of raw images, the object and the sensor have to stand still, which makes the sensor poorly suited to situations involving relative motion between object and sensor.
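The need for at least three raw images can be illustrated with the standard three-step phase-shifting evaluation. The following is a minimal sketch, assuming an idealized sinusoidal fringe model I_k = A + B cos(φ + δ_k) with phase shifts δ_k of -2π/3, 0 and +2π/3; the function names and the numerical values are hypothetical and are not taken from the cited reference or from any particular sensor.

```python
import math

# Assumed phase shifts of the three projected fringe patterns (illustrative choice).
SHIFTS = (-2.0 * math.pi / 3.0, 0.0, 2.0 * math.pi / 3.0)

def fringe_intensities(phase, offset, modulation):
    """Simulate the three intensities seen by one camera pixel.

    Idealized model: I_k = offset + modulation * cos(phase + shift_k).
    """
    return tuple(offset + modulation * math.cos(phase + s) for s in SHIFTS)

def three_step_phase(i1, i2, i3):
    """Recover the wrapped phase in (-pi, pi] from three shifted intensities.

    The three unknowns per pixel (offset, modulation, phase) are why at
    least three raw images are required in this model.
    """
    return math.atan2(math.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)

# Example: one pixel with an (unknown to the evaluation) phase of 1.0 rad.
i1, i2, i3 = fringe_intensities(phase=1.0, offset=100.0, modulation=50.0)
recovered = three_step_phase(i1, i2, i3)
```

In this model the recovered phase is only valid if the pixel observes the same surface point in all three images, which is why motion during the image series corrupts the result.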
In many applications, the object has a complicated shape, so the acquisition of the 3D topography cannot be achieved from a single observation direction. The sensor has to take data from different directions, which are then registered. This procedure needs a stop-and-go movement of the sensor, which makes the measurement quite inconvenient, all the more so because the user learns only after the time-consuming registration of the different views whether parts of the object are missing. Nevertheless, the fringe projection principle is widely used, as it supplies up to 1 million high-quality data points within each viewing direction.
Using an additional modality such as color, it is possible in principle to make a sensor that needs only one single raw (color) image to acquire a complete 3D topography [ref G. Häusler and D. Ritter, “Parallel 3D-sensing by color-coded triangulation,” in Appl. Opt. 32, No 35 (1993) 7164-7169]. The achievable quality of the data and the technical costs, however, make the sensor not yet competitive.
There exist other options to achieve a “single shot 3D sensor.” However, those sensors in principle cannot deliver a complete set of 3D data. The simplest single-shot sensor is based on light sectioning triangulation [G. Häusler and W. Heckel, “Light sectioning with large depth and high resolution,” in Appl. Opt. 27 (1988) 5165-5169]. Instead of projecting a full-field fringe pattern, only one single line (or a couple of lines) is projected onto the object surface. So from one single raw image one can acquire one 3D line profile, or, if several lines are projected, several 3D line profiles. Between the line profiles (“3D sections”), no data are available. We call such 3D data “sparse.”
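The way a single projected line yields one height profile can be sketched as follows. This is a simplified illustration, assuming a camera that views along the height axis, a light sheet inclined by a known triangulation angle to the viewing direction, and a constant imaging magnification; the function name, parameters and values are hypothetical.

```python
import math

def height_from_line_shift(shift_px, pixel_size_mm, magnification, angle_rad):
    """Height of a surface point from the lateral shift of the imaged line.

    Assumed geometry: a height change z moves the illuminated point
    laterally by z * tan(angle) in object space, which the camera sees
    as (z * tan(angle) * magnification) on the image sensor.
    """
    shift_on_sensor_mm = shift_px * pixel_size_mm
    return shift_on_sensor_mm / (magnification * math.tan(angle_rad))

# One raw image gives one shift value per image row, i.e. one 3D profile.
line_shifts_px = [0.0, 2.5, 5.0, 10.0]  # hypothetical detected line positions
profile_mm = [
    height_from_line_shift(s, pixel_size_mm=0.01, magnification=0.5,
                           angle_rad=math.radians(45.0))
    for s in line_shifts_px
]
```

Only points lying on the projected line receive a height value, so the surface between adjacent lines remains unmeasured in this scheme.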
To summarize, we have the motion-sensitive fringe projection systems that acquire complete 3D data, and the motion-robust light sectioning sensors that deliver just sparse 3D data. Our goal is a new sensor that will use the single-shot principle but will nevertheless deliver complete and high-quality 3D data of the object surface.
To a certain extent, there are existing solutions, for example the T-Scan 3 sensor from Steinbichler Optotechnik GmbH, 83115 Neubeuern, Germany. That sensor can be hand-guided over the object surface to generate a more or less complete 3D surface reconstruction. However, the sensor needs an additional tracking system, realized by a photogrammetric camera system. The sensor uses only one-line laser triangulation, which makes it difficult to get complete and very accurate data. The necessity to track the sensor makes a completely free motion difficult, because the tracking field of view must not be obscured by the person who moves the sensor.
The concept of acquiring a surface by moving the sensor and subsequently registering the 3D data is realized as well by the so-called “3D from motion” principle, described, for example, by C. Tomasi and T. Kanade: “Shape and Motion from Image Streams under Orthography: a Factorization Method,” in International Journal on Computer Vision, 9(2), 137-154, 1992. A camera is moved and takes different 2D raw images, and from the extracted corresponding points in different views, a 3D reconstruction can be achieved. Shape from motion is commonly a passive method, with no projected markers, so it is difficult to obtain a complete surface reconstruction.
There are increasing demands to use the technology of 3D acquisition, for example, in the field of intraoral sensors. Most existing intraoral sensors require the acquisition of multiple 2D camera images in order to obtain 3D data. A prominent sensor is the “Cerec” sensor by Sirona. It is based on the principle of “fringe projection.” After an acquisition of at least three 2D images, a 3D view can be obtained. Within the acquisition period (longer than 100 ms), the sensor and the object have to stand still. The measurement principle of the sensor, which requires several camera images in order to generate 3D data, is cumbersome and error-prone, because relative motion between sensor and object under test during acquisition is not permitted.
Another state-of-the-art sensor is “directScan” by Hint-Els. It combines fringe projection with phase correlation. In a first step, two series of orthogonal stripe patterns, each series consisting of at least three images, are projected, one after the other, onto the surface to be acquired. From each of the two series of captured camera images, a phase evaluation is performed. In a second step, all resulting pairs of phase values are correlated in order to determine an even more robust and more precise single phase value for each camera pixel. From this information, a set of 3D points is calculated. Hence, it requires an acquisition of multiple 2D images in order to generate a 3D view. Within the acquisition time window (about 200 ms), the sensor and the object are not allowed to move, making a motion-robust measurement impossible.
A variety of other sensors are in use. One such sensor is “iTero” by Cadent, which is based on “parallel confocal imaging.” 100,000 points of laser light at 300 focal depths are employed, yielding a lateral resolution of 50 μm. During the acquisition at these 300 focal depths (the scanning through different z-positions is necessary in order to generate one 3D view, taking about 300 ms), the sensor does not allow motion. The necessity of acquiring multiple images, again, renders the sensor cumbersome to use. It is especially disadvantageous that the sensor must be moved to pre-determined positions, thus rendering free-hand guidance during the acquisition impossible.
The prior art system “Lava” by 3M Espe employs the so-called “active wavefront sampling” principle. An off-axis rotating aperture generates a circular pattern, rotating at the object surface. From the diameter of this rotation, the defocus and the distance of the considered area can be determined.
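The relation between the diameter of the rotation and the object distance can be illustrated with an idealized thin-lens model. The sketch below is an assumption made for illustration only and does not describe the actual implementation of the named system: a small aperture rotates at a radius r in the lens plane, the sensor sits at a fixed distance behind the lens, and the object point is assumed to lie on the optical axis and nearer to the lens than the in-focus plane. All names and values are hypothetical.

```python
def circle_diameter(obj_dist, focal_len, sensor_dist, aperture_radius):
    """Diameter of the circle traced on the sensor by one object point.

    Thin-lens assumption: 1/obj_dist + 1/img_dist = 1/focal_len. A ray
    through the aperture at height r crosses the axis at img_dist, so at
    the sensor plane it has height r * (1 - sensor_dist / img_dist).
    """
    img_dist = 1.0 / (1.0 / focal_len - 1.0 / obj_dist)
    return 2.0 * aperture_radius * abs(1.0 - sensor_dist / img_dist)

def distance_from_diameter(diameter, focal_len, sensor_dist, aperture_radius):
    """Invert circle_diameter for an object nearer than the in-focus plane.

    The sign ambiguity (nearer versus farther than focus) is resolved
    here by assumption, not by measurement.
    """
    inv_img_dist = (1.0 - diameter / (2.0 * aperture_radius)) / sensor_dist
    return 1.0 / (1.0 / focal_len - inv_img_dist)

# Hypothetical setup: 50 mm lens focused at 500 mm, aperture radius 5 mm.
focal_len = 50.0
sensor_dist = 1.0 / (1.0 / focal_len - 1.0 / 500.0)
d = circle_diameter(400.0, focal_len, sensor_dist, aperture_radius=5.0)
z = distance_from_diameter(d, focal_len, sensor_dist, aperture_radius=5.0)
```

In this model a point in the in-focus plane produces a circle of zero diameter, and the diameter grows as the point moves away from that plane.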
One prior art sensor enables a motion-robust measurement of objects. It is the “SureSmile” sensor by OraMetrix. The OraMetrix system projects one type of pattern. It is based on active triangulation and on a single-shot principle: one 2D camera image already delivers 3D data (roughly 60×60 3D points per 3D view). It acquires about 6 images/second. The system is not intended for the complete acquisition of a surface in space, and it cannot provide the best possible measurement uncertainty.