The invention relates generally to the field of three-dimensional imaging and positioning.
Augmented reality (AR) superimposes computer-generated three-dimensional (3D) graphics or two-dimensional (2D) information on a user's view of a surrounding environment in real time, enhancing the user's perception of the real world. The goal of an AR application could be to complement existing information in the scene, such as overlaying augmented text on an historic building (D. Scagliarini et al., “Exciting understanding in Pompeii through on-site parallel interaction with dual time virtual models,” Proc. Eurographics-Siggraph Virtual Reality, Archaeology, and Cultural Heritage Ann. Conf. (VAST01), in press; 2001) or giving more educational information in a museum (F. Mata, C. Claramunt, A. Juarez, “An experimental virtual museum based on augmented reality and navigation,” Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, November 2011; R. Wojciechowski, K. Walczak, M. White, W. Cellary, “Building virtual and augmented reality museum exhibitions”, Proceedings of the ninth international conference on 3D Web technology, April 2004; E. Woods et al., “Augmenting the science centre and museum experience”, Proceedings of the 2nd international conference on Computer graphics and interactive techniques in Australasia and South East Asia, June 2004) or a book (S. Lee, J. Choi, and J. Park, “Interactive e-learning system using pattern recognition and augmented reality,” IEEE Transactions on Consumer Electronics, vol. 55, no. 2, May 2009). Another purpose could be simply to draw attention to the virtual data itself, as in an entertainment game.
Regardless of the field in which the application is used, or its primary purpose in the scene, many AR pipelines share two primary goals. The first is range-finding the environment (e.g., obtaining a depth, precise 3D coordinates, or a camera pose estimate), and the second is registration and tracking of the 3D environment, such that relative movement of the environment with respect to the camera can be followed. Both range-finding and tracking can be done using a black and white fiducial marker, or using some known parameters of the scene in order to triangulate corresponding points. The former method is referred to as a marker-based AR system, while the latter is known as a marker-less system.
Marker-based systems sometimes place unwanted objects in the output video frame, which can reduce the visual aesthetic or get in the way of a desired view, particularly on mobile devices where resolution and/or field of view is limited. It is often preferable to employ a marker-less system so as not to disturb the user's experience in a mobile scenario.
Existing marker-less AR applications extract pre-stored spatial information from objects and then map invariant feature points to calculate the pose (V. Teichrieb et al., “A survey of online monocular markerless augmented reality”, International Journal of Modeling and Simulation for the Petroleum Industry, vol. 1, no. 1, pp. 1-7, August 2007; Joao Paulo Lima et al., “Model based 3D tracking techniques for markerless augmented reality,” SVR, SBC, Porto Alegre, pp. 37-47, 2009; Andrew I. Comport, Eric Marchand, and Francois Chaumette, “A real-time tracker for markerless augmented reality,” in ISMAR '03, pp. 36-45, 2003).
Other attempts to implement a marker-less AR system utilize a technique known as structure from motion (SFM), which attempts to work on a completely unknown scene and uses motion analysis and some camera parameters to estimate the pose (e.g., Andrew J. Davison, Ian D. Reid, Nicholas D. Molton, and Olivier Stasse, “MonoSLAM: Real-time single camera SLAM,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, June 2007; M. Lourakis, A. Argyros, “Efficient, causal camera tracking in unprepared environments,” Computer Vision and Image Understanding, v. 99, pp. 259-290, 2005). In these cases, at least some amount of the spatial environment needs to be known, either a priori or from a sequence of video frames, to accomplish the actual range-finding goal.
SFM methods accomplish tracking in parallel with range-finding, as they require suitable tracking to accomplish range-finding in the first place. In addition, some methods still require an initialization stage in which the user must hold up a known target to calibrate the system (e.g., Andrew J. Davison, Ian D. Reid, Nicholas D. Molton, and Olivier Stasse, “MonoSLAM: Real-time single camera SLAM,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, June 2007).
Structured light three-dimensional (3D) reconstruction has been well studied in the past few decades due to its wide applications in reverse engineering (S. C. Park and M. Chang, “Reverse engineering with a structured light system,” Computers & Industrial Engineering, vol. 57, no. 4, pp. 1377-1384, 2009), augmented reality (M. Torres, R. Jassel, and Y. Tang, “Augmented reality using spatially multiplexed structured light,” in Mechatronics and Machine Vision in Practice (M2VIP), 2012 19th International Conference, November 2012, pp. 385-390), medical imaging (O. V. Olesen, R. R. Paulsen, L. Hojgaard, B. Roed, and R. Larsen, “Motion tracking for medical imaging: a nonvisible structured light tracking approach,” Medical Imaging, IEEE Transactions on, vol. 31, no. 1, pp. 79-87, 2012) and archaeological finds (S. P. McPherron, T. Gernat, and J.-J. Hublin, “Structured light scanning for high-resolution documentation of in situ archaeological finds,” Journal of Archaeological Science, vol. 36, no. 1, pp. 19-24, 2009).
In terms of codification strategy, structured light can be classified into four general types: discrete spatial multiplexing, time-multiplexing, continuous frequency multiplexing and continuous spatial multiplexing (J. Salvi, S. Fernandez, T. Pribanic, and X. Llado, “A state of the art in structured light patterns for surface profilometry,” Pattern Recognition, vol. 43, no. 8, pp. 2666-2680, 2010). Most of the early work focuses on a temporal approach that requires multiple patterns to be projected consecutively onto stationary objects. Such a requirement makes temporal approaches unsuitable for mobile and real-time applications, because the scene must remain still for the duration of the whole pattern sequence.
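To illustrate why a temporal approach needs several consecutive projections, the following is a minimal sketch of one common time-multiplexing scheme, binary Gray-code stripes. It is offered only as an example and is not attributed to any of the cited works; the function names and the projector width of 1024 columns are assumptions made for illustration.

```python
def gray_code_patterns(num_columns):
    """Build binary stripe patterns, one per projected frame.

    patterns[k][x] is 0 or 1 for projector column x in frame k
    (most significant bit first). Hypothetical helper for illustration.
    """
    num_bits = max(1, (num_columns - 1).bit_length())
    patterns = []
    for bit in range(num_bits - 1, -1, -1):
        # x ^ (x >> 1) is the Gray code of column index x.
        patterns.append([((x ^ (x >> 1)) >> bit) & 1 for x in range(num_columns)])
    return patterns


def decode_column(bits):
    """Recover the projector column from the bits seen at one camera pixel."""
    gray = 0
    for b in bits:
        gray = (gray << 1) | b
    # Convert Gray code back to a plain binary index.
    value = gray
    shift = gray >> 1
    while shift:
        value ^= shift
        shift >>= 1
    return value


patterns = gray_code_patterns(1024)          # assumed projector width
num_frames = len(patterns)                   # frames that must be projected
observed = [p[637] for p in patterns]        # bits seen by a pixel lit by column 637
recovered = decode_column(observed)
print(num_frames, recovered)
```

Under these assumptions every pixel must be observed in all of the projected frames before its column can be decoded, which is why the object has to stay stationary during acquisition.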
Recently, researchers have devoted much effort to speeding up the data acquisition process by designing techniques that need only a handful of input images or even a single one, so-called one-shot 3D image acquisition.
For example, Zhang et al. (“Rapid shape acquisition using color structured light and multi-pass dynamic programming,” in 3D Data Processing Visualization and Transmission, 2002. Proceedings. First International Symposium on. IEEE, 2002, pp. 24-36) propose a multi-pass dynamic programming algorithm to solve the multiple hypothesis code matching problem and successfully apply it to both one-shot and spacetime methods.
Others (A. Ulusoy, F. Calakli, and G. Taubin, “Robust one-shot 3d scanning using loopy belief propagation,” in Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on, June 2010, pp. 15-22) model a spatial structured light system using a probabilistic graphical formulation with epipolar, coplanarity and topological constraints. They then solve the correspondence problem by finding a maximum a posteriori solution using loopy belief propagation.
Kawasaki et al. (“Dynamic scene shape reconstruction using a single structured light pattern,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, June 2008, pp. 1-8) use a bicolor grid and local connectivity information to achieve dense shape reconstruction. Because the proposed technique does not involve encoding positional information into multiple pixels or color spaces, it provides good results even when discontinuities and/or occlusions are present.
Similarly, Chen et al. (“Vision processing for realtime 3-d data acquisition based on coded structured light,” Image Processing, IEEE Transactions on, vol. 17, no. 2, pp. 167-176, February 2008) present a specially-coded vision system in which a uniquely color-encoded pattern is used to ensure reconstruction efficiency using local neighborhood information.
To improve color detection, Fechteler and Eisert (P. Fechteler and P. Eisert, “Adaptive color classification for structured light systems,” in Computer Vision and Pattern Recognition Workshops, 2008. CVPRW '08. IEEE Computer Society Conference on, June 2008, pp. 1-7) propose a method in which the color classification is made adaptive to the characteristics of the captured image, so that distortion due to environment illumination, color cross-talk, and reflectance is well compensated.
In spite of the aforementioned developments, there is still room to improve one-shot methods, particularly in terms of speed, so that they are fast enough for real-time applications, as many existing approaches involve computationally expensive algorithms. For example, as reported in Ulusoy et al., the method takes 10 iterations to converge, which costs about 3 minutes to recover a thousand intersections. Similarly, the approach in Fechteler and Eisert takes a minute to reconstruct an object with 126544 triangles and 63644 vertices.
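To put the reported timings in perspective, the short calculation below compares them with a per-frame time budget. It is only a rough sketch: the 30 frames-per-second target is an assumption chosen for illustration and does not come from the cited works, and the reported times are themselves approximate.

```python
# Figures quoted in the text (approximate).
ulusoy_seconds = 3 * 60        # about 3 minutes
ulusoy_points = 1000           # a thousand intersections
fechteler_seconds = 60         # about a minute
fechteler_vertices = 63644

# Assumption for illustration only: a 30 frames-per-second real-time target.
target_fps = 30

ulusoy_s_per_point = ulusoy_seconds / ulusoy_points
fechteler_vertices_per_s = fechteler_vertices / fechteler_seconds

# Number of frame periods that elapse while one reconstruction is computed.
ulusoy_frames_elapsed = ulusoy_seconds * target_fps
fechteler_frames_elapsed = fechteler_seconds * target_fps

print(f"Ulusoy et al.: {ulusoy_s_per_point:.2f} s per intersection, "
      f"{ulusoy_frames_elapsed} frame periods per reconstruction")
print(f"Fechteler and Eisert: {fechteler_vertices_per_s:.0f} vertices per second, "
      f"{fechteler_frames_elapsed} frame periods per reconstruction")
```

Under the assumed frame rate, both reported run times are several thousand frame periods long, which is consistent with the observation that these approaches are too slow for real-time use.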