In many practical applications, such as reverse engineering, robotic exploration/navigation in cluttered environments, model construction for virtual reality, human body measurements, and advanced product inspection and manipulation by robots, the automatic measurement and reconstruction of 3D shapes with high speed and accuracy is of critical importance. Currently, the devices widely used in industry for obtaining 3D measurements involve the mechanical scanning of a scene, for example in a laser scanning digitizer, which inevitably makes the measurement a slow process. Some advanced active vision systems using structured lighting have been explored and built. However, the existing systems lack the ability to change their settings, to calibrate themselves and to reconstruct the 3D scene automatically.
To reconstruct a complete and accurate 3D model of an unknown object, two fundamental issues must be addressed. The first issue is how to acquire the 3D data for reconstructing the object surface. Currently, a laser range finder/scanner [1] is widely used for 3D surface data acquisition in industry. However, due to the mechanical scanning involved, the acquisition speed is limited. To increase the efficiency of 3D imaging, pattern projections can be employed [2]. Portable 3D imaging systems based on a similar principle have also been designed recently.
The second issue is how to determine the next viewpoint for each view so that all the information about the object surface can be acquired in an optimal way. This is also known as the NBV (Next Best View) problem, which determines the sensor direction (or pose) in the reconstruction process. The problem of viewpoint planning [3] for digitization of 3D objects can be treated in different ways depending on whether or not the object's geometry is known beforehand [4,5]. For an unknown object, since the number of viewpoints and their viewing directions are unknown or cannot be determined prior to data acquisition, conventional 3D reconstruction processes typically involve an incremental, iterative cycle of viewpoint planning, digitizing, registration and view integration, and are conventionally based on the partial model reconstructed thus far. Given this partial model, the NBV algorithm provides a quantitative evaluation of the suitability of each remaining viewpoint. The evaluation for each viewpoint is based on all visible surface elements of the object that can be observed from it. The viewpoint with the highest visibility (evaluation score) is selected as the NBV.
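As an illustration only, the selection step described above can be sketched in Python as follows. The sketch is not taken from any of the cited works: the 2D geometry, the function names `visibility_score`, `next_best_view` and `ring_of_viewpoints`, and the simple facing test (which ignores occlusion and sensor constraints) are all assumptions made for brevity.

```python
import math


def visibility_score(viewpoint, elements):
    """Count the surface elements whose outward normal faces the viewpoint.

    Assumption: an element counts as visible when its normal points towards
    the sensor; occlusion and sensor range are ignored in this sketch.
    """
    score = 0
    for (px, py), (nx, ny) in elements:
        to_sensor = (viewpoint[0] - px, viewpoint[1] - py)
        if nx * to_sensor[0] + ny * to_sensor[1] > 0.0:
            score += 1
    return score


def next_best_view(candidates, elements):
    """Return the candidate viewpoint with the highest evaluation score."""
    return max(candidates, key=lambda v: visibility_score(v, elements))


def ring_of_viewpoints(radius, count):
    """Enumerate candidate viewpoints at equal angular steps on a circle."""
    step = 2.0 * math.pi / count
    return [(radius * math.cos(k * step), radius * math.sin(k * step))
            for k in range(count)]


# Hypothetical not-yet-acquired surface elements: (position, outward normal).
unseen = [
    ((1.0, 0.0), (1.0, 0.0)),
    ((1.0, 0.5), (1.0, 0.0)),
    ((0.0, 1.0), (0.0, 1.0)),
]
candidates = ring_of_viewpoints(5.0, 4)
best = next_best_view(candidates, unseen)
```

In this toy configuration two of the three elements face the first candidate, so that candidate is selected. In an iterative reconstruction cycle the element list would be rebuilt from the partial model after each view is registered and integrated.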
In general, there are two fundamental problems to be solved when determining the Next Best View. The first problem is to determine the areas of the object which need to be sensed next and the second is to determine how to position the sensor to sample those areas. As there is no prior knowledge about the object, it is impossible to obtain a complete description of the object when occlusion occurs. Therefore, it is generally not possible to determine the invisible portions precisely from either the current viewpoint or the acquired partial description of the object, so only an estimate of the Next Best View can be derived.
Various Next Best View algorithms have been proposed to date. For example, Connolly [6] uses an octree to represent the object space: regions that have been scanned are labeled as seen, regions between the sensor and the surface are labeled as empty, and all other regions are labeled as unseen. A set of candidate viewpoints is enumerated at fixed increments around the object. The Next Best View is calculated based on the evaluation of the visibility of each candidate viewpoint. This algorithm is computationally expensive and does not incorporate the sensor geometry.
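A minimal sketch of this style of labeling is given below. It assumes a flat 2D grid in place of an octree and a single orthographic scan whose rays travel along each row; the names `scan_from_left`, `occupied` and `labels` are hypothetical, and the code is not Connolly's implementation.

```python
UNSEEN, EMPTY, SEEN = "unseen", "empty", "seen"


def scan_from_left(occupied, labels):
    """Update the labels for one scan with rays travelling along +x in each row.

    Cells crossed before the first solid cell become EMPTY, the first solid
    cell becomes SEEN, and everything behind it keeps its previous label.
    """
    for y, row in enumerate(occupied):
        for x, is_solid in enumerate(row):
            if is_solid:
                labels[y][x] = SEEN
                break  # the ray stops at the surface
            labels[y][x] = EMPTY


# Hypothetical ground-truth occupancy; True marks a cell inside the object.
occupied = [
    [False, True, True, False],
    [False, False, True, False],
    [False, False, False, False],
]
labels = [[UNSEEN] * len(row) for row in occupied]
scan_from_left(occupied, labels)

# Cells that remain UNSEEN are what a later viewpoint would try to observe.
unseen_count = sum(row.count(UNSEEN) for row in labels)
```

After this single scan three cells remain unseen, all of them behind the sensed surface. A candidate viewpoint could then be scored by how many of those cells it would newly observe.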
Maver and Bajcsy [7] presented a solution to the NBV problem for a specific scanning setup consisting of an active optical range scanner and a turntable. In that work, unseen regions of the object are represented as polygons. Visibility constraints for the sensor to view the unseen regions are computed from the polygon boundaries. However, this solution is limited to a particular sensor configuration.
Pito [8] proposes an approach based on an intermediate position space representation of both sensor visibility constraints and unseen portions of the viewing volume. The NBV is determined as the sensor position that maximizes the unseen portion of the object volume that can be observed. This approach has been demonstrated to achieve automatic viewpoint planning for a range sensor constrained to move on a cylindrical path around the object.
Whaite and Ferrie [9] use the superellipsoid model to represent an object and define a shell of uncertainty. The Next Best View is selected at the sensor position where the uncertainty of the current model fitted to the partial data points is the largest. This algorithm enables uncertainty-driven exploration of an object to build a model. However, the superellipsoid cannot accurately represent objects with a complex surface shape. Furthermore, surface visibility constraints are not incorporated in the viewpoint planning process.
Reed and Allen [10] propose a target-driven viewpoint planning method. A volume model is used to represent the object by extrusion and intersection operations. The constraints, such as sensor imaging constraints, model occlusion constraints and sensor placement constraints, are also represented as solid modeling volumes and are incorporated into the viewpoint planning. The algorithm involves computationally expensive solid modeling and intersection operations.
Scott [11] treats viewpoint planning as an integer programming problem. Given a rough model of an unknown object, a sequential set of viewpoints is calculated to cover all surface patches of the object subject to a registration constraint. However, in this system the object must be scanned before viewpoint planning to obtain prior knowledge about it.
In many applications, a vision sensor often needs to move from one place to another and change its configuration for perception of different object features. A dynamically reconfigurable vision sensor is useful in such applications to provide an active view of the features.
Active robot vision, in which a vision sensor can move from one place to another to perform a multi-view vision task, is an active research area. A traditional vision sensor with a fixed structure is often inadequate for the robot to perceive the object's features in an uncertain environment, as the object distance and size are unknown before the robot sees the object. A dynamically reconfigurable sensor may assist the robot in controlling its configuration and directing its gaze at the object surfaces. For example, with a structured light system, the camera needs to see the object surface illuminated by the projector in order to perform the 3D measurement and reconstruction task.
The system must be calibrated, and traditionally the calibration task is accomplished statically by manual operations. A calibration target/device is conventionally designed with a precision calibration fixture to provide a number of points whose world coordinates are precisely known [12]-[14]. With a planar calibration pattern, the target needs to be placed at several accurately known positions in front of the vision sensor. A dynamically reconfigurable vision system needs to be capable of self-recalibration without requiring external 3D data provided by a precision calibration device.
Self-calibration of vision sensors has been actively researched in the last decade. However, most of the conventionally available methods were developed for calibration of passive vision systems such as stereo vision and depth-from-motion [15]-[22]. Conventionally these systems require dedicated devices for calibrating the intrinsic and extrinsic parameters of the cameras. Due to the special calibration target needed, such a calibration is normally carried out off-line before a task begins. In many practical applications, on-line calibration during the execution of a task is needed. Over the years, efforts have been made in research to achieve efficient on-line calibrations.
Maybank and Faugeras [23] suggested the calibration of a camera using image correspondences in a sequence of images from a moving camera. The kinds of constructions that could be achieved from a binocular stereo rig were further addressed in [24]. It was found that a unique projective representation of the scene up to an arbitrary projective transformation could be constructed if five arbitrary correspondences were chosen and an affine representation of the scene up to an arbitrary affine transformation could be constructed if four arbitrary correspondences were adopted.
Hartley [25] gave a practical algorithm for Euclidean reconstruction from several views with the same camera based on Levenberg-Marquardt minimization. A new approach based on stratification was introduced in [26].
In this context, much work has been conducted in Euclidean reconstruction up to a transformation. Pollefeys et al. [27] proposed a method to obtain a Euclidean reconstruction from images taken with an uncalibrated camera with variable focal lengths. This method is based on an assumption that although the focal length is varied, the principal point of the camera remains unchanged. This assumption limits the range of applications of this method. A similar assumption was also made in the investigations in [28,29]. In practice, when the focal length is changed (e.g. by zooming), the principal point may vary as well. Heyden and Astrom [30] proved that it is possible to obtain Euclidean reconstruction up to a scale using an uncalibrated camera with known aspect ratio and skew parameters. A special case of a camera with a Euclidean image plane was used for their study. A crucial step in the algorithm is the initialization, which affects the convergence. How to obtain a suitable initialization remained an open issue [31]. Kahl [32] presented an approach to self-calibration and Euclidean reconstruction of a scene, assuming an affine model with zero skew for the camera. Other parameters such as the intrinsic parameters could be unknown or varied. The reconstruction, which needed a minimum of three images, was an approximation and was determined only up to a scale. Pollefeys et al. gave the minimum number of images needed for achieving metric reconstruction, i.e. to restrict the projective ambiguity to a metric one according to the set of constraints available from each view [31].
The above-mentioned reconstruction methods are based on passive vision systems. As a result, they suffer from the ambiguity of correspondences between the camera images, which is a difficult problem to solve especially when free-form surfaces [33] are involved in the scene. To avoid this problem, active vision may be adopted. Structured light or pattern projection systems have been used for this purpose. To reconstruct a 3D shape precisely with such a system, the active vision system consisting of a projector and a camera needs to be carefully calibrated [34, 35]. The traditional calibration procedure normally involves two separate stages: camera calibration and projector calibration. These individual calibrations are carried out off-line and they have to be repeated each time the setting is changed. As a result, the applications of active vision systems are limited, since the system configuration and parameters must be kept unchanged during the entire measurement process.
For active vision systems using structured light, the existing calibration methods are mostly based on static and manual operations. The available camera self-calibration methods cannot be applied directly to structured-light systems as they need more than two views for the calibration. Recently, there has been some work on self-calibration [36]-[40] of structured-light systems. Fofi et al. [36] investigated the self-calibration of structured-light systems, but the work was based on the assumption that "a square projected onto a planar surface will most generally give a quadrilateral shape in the form of a parallelogram".
Jokinen [37] studied a self-calibration method based on multiple views, where the object is moved by steps. Several maps were acquired for the registration and calibration. The limitation of this method is that the object must be placed on a special device so that it can be precisely moved.
Using a cube frame, Chu et al. [38] proposed a calibration-free approach for recovering unified world coordinates.
Chen and Li [39, 40] recently proposed a self-recalibration method for a structured-light system allowing changes in the system configuration in two degrees of freedom.
In some applications, such as seabed metric reconstruction with an underwater robot, when the size or distance of the scene changes, the configuration and parameters of the vision system need to be changed to optimize the measurement. In such applications, uncalibrated reconstruction is needed, and efforts have been made in recent research to address this. Fofi et al. [41] studied Euclidean reconstruction by means of an uncalibrated structured light system with a colour-coded grid pattern. They modeled the pattern projector as a pseudo camera and then the whole system as a two-camera system. Uncalibrated Euclidean reconstruction was performed with varying focus, zoom and aperture of the camera. The parameters of the structured light sensor were computed according to the stratified algorithm [26], [42]. However, it was not clear how many of the parameters of the camera and projector could be self-determined in the uncalibrated reconstruction process.
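To illustrate why treating the projector as a second camera makes the reconstruction depend on the system parameters, the following Python sketch triangulates a point in a simplified planar, rectified arrangement. It is an assumption-laden toy model, not the method of [41]: the camera is placed at x = 0 and the projector at x = baseline, both looking along +z, and the names `project` and `triangulate`, the focal lengths and the baseline values are hypothetical.

```python
def project(x, z, focal, centre_x):
    """Pinhole projection of the point (x, z) for a device centred at centre_x."""
    return focal * (x - centre_x) / z


def triangulate(u_cam, u_proj, f_cam, f_proj, baseline):
    """Recover (x, z) from a camera coordinate and the matching projector coordinate.

    From u_cam = f_cam * x / z and u_proj = f_proj * (x - baseline) / z it
    follows that u_cam / f_cam - u_proj / f_proj = baseline / z.
    """
    disparity = u_cam / f_cam - u_proj / f_proj
    if disparity <= 0.0:
        raise ValueError("rays do not intersect in front of the sensor")
    z = baseline / disparity
    x = z * u_cam / f_cam
    return x, z


# Hypothetical configuration and scene point.
f_cam, f_proj, baseline = 800.0, 1000.0, 0.5
u_cam = project(0.2, 2.0, f_cam, 0.0)
u_proj = project(0.2, 2.0, f_proj, baseline)

# With the correct parameters the point is recovered.
x_ok, z_ok = triangulate(u_cam, u_proj, f_cam, f_proj, baseline)

# With a stale baseline (for example after reconfiguration) the depth is biased.
x_bad, z_bad = triangulate(u_cam, u_proj, f_cam, f_proj, 0.55)
```

In this model a 10% error in the assumed baseline produces a 10% error in the recovered depth, which is one way to see why the parameters must be re-estimated whenever the configuration changes.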
Thus, there is a need for a reconfigurable vision system and method for 3D measurement and reconstruction in which recalibration may be conducted without having to use special calibration apparatus as required by traditional calibration methods.