Three-dimensional scanning and digitization of the surface geometry of objects are commonly used in many industries and services, and their applications are numerous. Examples include inspection and measurement of shape conformity in industrial production systems, reverse engineering of existing parts with complex geometry, and biometry.
The shape of an object is scanned and digitized using a range sensor that measures the distance between the sensor and a set of points on the surface. Different principles have been developed for range sensors. Among them, triangulation-based range sensors are generally adequate for close-range measurements, such as distances of less than a few meters. Using this type of apparatus, at least two rays that converge on the same feature point on the object are obtained from two different viewpoints separated by a baseline distance. From the baseline and the two ray directions, the relative position of the observed point can be recovered. The intersection of both rays is determined using the knowledge of one side length and two angles in the triangle, which is the principle of triangulation in stereovision. The challenge in stereovision is to efficiently identify which pixels correspond to each other in the two images of the stereo pair composing a frame. This problem is especially important for portable or hand-held scanners where, in the most general case, all the information necessary for matching must be found within a single frame.
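As an illustration of the triangulation principle described above, the following minimal sketch recovers the position of a point from one side length (the baseline) and two angles, using the law of sines. The planar setup, the coordinate convention and the function name `triangulate` are assumptions made for this example only.

```python
import math


def triangulate(baseline, alpha, beta):
    """Locate a point from a baseline length and two ray angles.

    Assumed convention (illustrative only): viewpoint A is at the origin,
    viewpoint B is at (baseline, 0), and alpha and beta are the interior
    angles, in radians, between the baseline and the rays at A and B.
    Returns the (x, z) coordinates of the observed point.
    """
    gamma = math.pi - alpha - beta  # angle of the triangle at the observed point
    if alpha <= 0 or beta <= 0 or gamma <= 0:
        raise ValueError("the two rays do not converge in front of the baseline")
    # Law of sines: the side from A to the point is opposite the angle beta.
    range_a = baseline * math.sin(beta) / math.sin(gamma)
    return range_a * math.cos(alpha), range_a * math.sin(alpha)


x, z = triangulate(2.0, math.radians(45.0), math.radians(45.0))
```

With a baseline of 2 and both angles equal to 45 degrees, the point lies above the middle of the baseline at a depth of 1.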
To simplify the matching problem, one can replace one of the light detectors with a light projector that outputs a set of rays in known directions. In this case, it is possible to exploit the orientation of the projected rays and of each detected ray reflected from the object surface to find the matching point. It is then possible to calculate the coordinates of each observed feature point relative to the base of the triangle.
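Once a projected ray and a detected ray have been matched, the observed point can be estimated in three dimensions. Because measured rays rarely intersect exactly, one common choice, assumed here and not stated in the text above, is to take the midpoint of the shortest segment joining the two rays. All names in this sketch are hypothetical.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ray_midpoint(o1, d1, o2, d2):
    """Estimate the 3D point nearest to two rays.

    Each ray is given by an origin o and a direction d (3-tuples).
    Returns (point, gap): the midpoint of the shortest segment joining
    the rays, and the length of that segment.
    """
    w = tuple(a - b for a, b in zip(o1, o2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique closest point")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    point = tuple((u + v) / 2.0 for u, v in zip(p1, p2))
    diff = tuple(u - v for u, v in zip(p1, p2))
    return point, _dot(diff, diff) ** 0.5


# Projector at the origin, camera 2 units away along x, both aimed at (1, 0, 1).
point, gap = ray_midpoint((0.0, 0.0, 0.0), (1.0, 0.0, 1.0),
                          (2.0, 0.0, 0.0), (-1.0, 0.0, 1.0))
```

The returned gap gives a rough indication of how well the two rays agree; a large gap may suggest an incorrect match.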
Although specialized light detectors can be used, digital CCD or CMOS cameras are typically used.
For the projector, the light source can be a coherent source (laser) or a non-coherent source (e.g., white light) projecting a spot, a light plane or one of many other possible patterns. Although the use of a light projector facilitates the detection of reflected points everywhere on the object surface, the more complex the pattern, the greater the challenge of efficiently identifying corresponding pixels and rays.
For this reason, properties from the theory of projective geometry are further exploited. It has been well known in the field for at least 30 years that, in the case of two views, epipolar constraints can be exploited to limit the search for corresponding pixels to a single straight line, as opposed to a search over the entire image. This principle is widely exploited in both passive and active (with a projector) stereovision. One example of this usage is described in U.S. Pat. No. 8,032,327, wherein a laser projector projects two perpendicular light planes as a crosshair pattern whose reflection on the surface is captured by two cameras. Projecting thin monochromatic stripes is advantageous for obtaining a good signal-to-noise ratio and for simplifying the image processing needed to obtain 3D points from each single frame. Having a single stripe observable by each camera ensures that each epipolar line intersects the stripe once, thus avoiding matching ambiguities.
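The epipolar constraint mentioned above can be sketched as follows. Given a fundamental matrix F relating two views (with the usual convention that corresponding homogeneous pixels satisfy x2ᵀ F x1 = 0), a pixel in the first image maps to a line in the second image, and candidate matches are kept only if they lie close to that line. The function names and the rectified-pair matrix used in the example are assumptions for illustration.

```python
import math


def epipolar_line(F, pixel):
    """Return the line (a, b, c), with a*u + b*v + c = 0, in the second
    image on which the match of `pixel` from the first image must lie."""
    x = (pixel[0], pixel[1], 1.0)  # homogeneous pixel coordinates
    return tuple(sum(F[r][k] * x[k] for k in range(3)) for r in range(3))


def distance_to_line(line, pixel):
    """Perpendicular distance, in pixels, from `pixel` to `line`."""
    a, b, c = line
    return abs(a * pixel[0] + b * pixel[1] + c) / math.hypot(a, b)


# Fundamental matrix of an ideal rectified pair (assumed for this example):
# epipolar lines are then the image rows, v2 = v1.
F_RECTIFIED = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]

line = epipolar_line(F_RECTIFIED, (120.0, 45.0))
on_line = distance_to_line(line, (300.0, 45.0))   # candidate on the same row
off_line = distance_to_line(line, (300.0, 50.0))  # candidate five rows away
```

In this rectified example the first candidate lies exactly on the epipolar line and the second lies five pixels from it, so a distance threshold of a pixel or two would retain only the first.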
To reduce the time necessary to capture the shape of the surface of an object, one needs to increase the frame rate, increase the number of stripes that are projected simultaneously, or both. One approach that has been proposed consists of projecting a grid of stripes. Projecting a grid is also of interest for surface reconstruction, since the projected pattern produces a network of curves on the object surface where tangent curves from two directions make it possible to measure the surface normal. Surface normal information can be advantageously exploited in real-time surface reconstruction from 3D measurements, as described in U.S. Pat. No. 7,487,063. Increasing the number of stripes is advantageous for scanning speed but, as the number of stripes increases, the complexity of matching image points before applying triangulation grows exponentially and introduces ambiguities that, in some cases, cannot be resolved.
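As a brief illustration of how tangents from two directions yield a surface normal, the sketch below takes the cross product of two tangent vectors measured at the same surface point and normalizes the result. The function name is hypothetical, and the sign of the normal (toward or away from the sensor) is left unresolved here.

```python
import math


def surface_normal(tangent_1, tangent_2):
    """Unit normal of the surface spanned by two non-parallel tangent vectors."""
    ax, ay, az = tangent_1
    bx, by, bz = tangent_2
    # Cross product: perpendicular to both tangents.
    n = (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)
    length = math.sqrt(sum(c * c for c in n))
    if length < 1e-12:
        raise ValueError("tangents are parallel; the normal is undefined")
    return tuple(c / length for c in n)


# Tangents of two curves crossing on a plane parallel to the x-y plane.
normal = surface_normal((1.0, 0.0, 0.0), (0.0, 2.0, 0.0))
```

For these two tangents the computed normal is the unit vector along the z axis.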
One way to resolve ambiguities consists of adding one or more cameras, but the hardware complexity increases and the frame rate limit for a given bandwidth is reduced. Methods exploiting one or two cameras have been proposed to match points from a projected grid. The intersection of the reflected curves makes it possible to segment and identify connected networks of curve sections to set additional matching constraints. However, points that are extracted near the intersection of two curves are less precise. “Near” means within a distance at which image processing operators applied to pixels from the two curves interfere. To maintain precision, these points must be discarded and are thus lost.
It would be helpful to alternately produce two sets of non-crossing curves in order to benefit from the surface normal orientation extracted from the surface tangents while avoiding the projection of a grid in a single frame. However, the matching challenge would remain. One solution would consist of projecting multicolored stripes. However, the color reflectivity of some materials would harm the quality of matching, and the projector would need to be more complex. Another approach requires positioning the object on a planar background that must be visible in each frame. This clearly limits the flexibility of the system, especially when objects must be measured on site without interfering with the environment.
A need remains for a solution that solves the matching problem independently for each single frame, with only two cameras, a projected pattern that may change, and no particular constraint on the observed scene.