A number of systems and programs are offered on the market for the design, the engineering and the manufacturing of objects. CAD is an acronym for Computer-Aided Design, i.e. it relates to software solutions for designing an object. CAE is an acronym for Computer-Aided Engineering, i.e. it relates to software solutions for simulating the physical behavior of a future product. CAM is an acronym for Computer-Aided Manufacturing, i.e. it relates to software solutions for defining manufacturing processes and operations. In such computer-aided design systems, the graphical user interface plays an important role as regards the efficiency of the technique. These techniques may be embedded within Product Lifecycle Management (PLM) systems. PLM refers to a business strategy that helps companies to share product data, apply common processes, and leverage corporate knowledge for the development of products from conception to the end of their life, across the concept of extended enterprise. The PLM solutions provided by Dassault Systèmes (under the trademarks CATIA, ENOVIA and DELMIA) provide an Engineering Hub, which organizes product engineering knowledge, a Manufacturing Hub, which manages manufacturing engineering knowledge, and an Enterprise Hub, which enables enterprise integrations and connections into both the Engineering and Manufacturing Hubs. Altogether, the system delivers an open object model linking products, processes and resources to enable dynamic, knowledge-based product creation and decision support that drives optimized product definition, manufacturing preparation, production and service.
In this framework, the fields of computer vision and computer graphics offer technologies which are more and more useful. Indeed, these fields have applications to 3D reconstruction, 3D model texturing, virtual reality and all domains where it is necessary to precisely build a 3D scene with exact geometry using as input, for example, the information in one or more photographs. 3D reconstruction can be used in any field which involves the creation of (e.g. textured) 3D models, such as serious gaming, video games, architecture, archeology, reverse engineering, 3D asset databases, or virtual environments.
3D reconstruction from video stream and photograph set analysis is addressed by two different approaches in the state of the art, depending on the type of sensors used for the input data. The first approach uses “receiver” sensors. This notably concerns 3D reconstruction from RGB image analysis. Here, 3D reconstruction is obtained by multi-view analysis of the RGB color information contained in each of the image planes. The following papers relate to this approach: “R. Hartley and A. Zisserman: Multiple View Geometry in Computer Vision, Cambridge Univ. Press 2004”, “R. Szeliski: Computer Vision: Algorithms and Applications, Edition Springer 2010”, and “Faugeras: Three-Dimensional Computer Vision: A Geometric viewpoint, MIT Press 1994”. The second approach uses “emitter-receiver” sensors. This notably concerns 3D reconstruction from RGB-Depth image analysis. This kind of sensor adds depth data to standard RGB data, and it is the depth information that is mainly used in the reconstruction process. The following papers relate to this approach: “Yan Cui et al.: 3D Shape Scanning with a Time-of-Flight Camera, CVPR 2010”, “S. Izadi et al.: KinectFusion: Real-Time Dense Surface Mapping and Tracking, Symposium ISMAR 2011”, and “R. Newcombe et al.: Live Dense Reconstruction with a Single Moving Camera, IEEE ICCV 2011”. Moreover, several academic and industrial players now offer software solutions for 3D reconstruction, by RGB image analysis, such as Acute3D, Autodesk, VisualSFM, or by RGB-Depth analysis, such as ReconstructMe or Microsoft's SDK for Kinect (registered trademarks). Multi-view photogrammetry reconstruction methods use only the information contained in the image planes of a video sequence (or a series of snapshots) in order to estimate the 3D geometry of the scene. The matching of interest points between different 2D views yields the relative positions of the camera.
An optimized triangulation is then used to compute the 3D points corresponding to the matching pairs. Depth-map analysis reconstruction methods are based on disparity maps or approximated 3D point clouds. Those disparity maps are obtained using stereovision, structured light (see the ‘Kinect’ device for example) or ‘Time of Flight’ 3D cameras. These state-of-the-art reconstruction methods then typically output a discrete 3D representation of the real object, most often a 3D mesh. The 3D model is derived from the volume that eventually closes off the resulting 3D point cloud.
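The triangulation step described above can be illustrated with a minimal sketch. It is not the optimized triangulation used by the cited methods: it swaps in the simple midpoint method, which returns the point halfway along the shortest segment joining two viewing rays. The function names, the ray parameterization (camera center plus direction) and the rectified-stereo relation Z = f * B / d used to turn a disparity into a depth are illustrative assumptions, not taken from the cited works.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, ...]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(
    c1: Sequence[float], d1: Sequence[float],
    c2: Sequence[float], d2: Sequence[float],
) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + t1*d1 and c2 + t2*d2.

    c1, c2 are camera centers; d1, d2 are the viewing directions of a matched
    pair of interest points (hypothetical inputs for this sketch).
    """
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b  # zero when the two rays are parallel
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; no stable triangulation")
    # Closed-form minimizer of |(c1 + t1*d1) - (c2 + t2*d2)|^2.
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = tuple(ci + t1 * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t2 * di for ci, di in zip(c2, d2))
    return tuple(0.5 * (u + v) for u, v in zip(p1, p2))


def depth_from_disparity(focal_px: float, baseline: float, disparity_px: float) -> float:
    """Depth of a point seen by a rectified stereo pair (assumed setup)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px
```

For instance, with two cameras centered at (0, 0, 0) and (1, 0, 0) whose rays both pass through the point (0.5, 0.2, 4), `triangulate_midpoint` recovers that point up to rounding error. With noisy matches the two rays no longer intersect, and the midpoint is only a rough estimate that a real pipeline would refine.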
Within this field, 3D reconstruction using only a single view has also been a specific topic of interest, because this approach allows an overall simple (and thus fast) process. An idea is to infer the 3D geometry using only a single RGB frame, exploiting several hints such as shading (e.g. the algorithms disclosed in paper “Prados et al., Shape from Shading, in Handbook of Mathematical Models in Computer Vision, 2006”), textures (e.g. so-called “Shape from Texture” algorithms), or contour drawing (e.g. so-called “Shape from Silhouette” algorithms). The use of a single depth frame for the reconstruction, in particular, is a recent topic, made possible by consumer depth sensors that have appeared very recently on the market. The goal is to use a single depth frame of an object to build a complete 3D model of this object. Using several depth frames makes the problem much easier, because one can align each frame with the other frames in order to get a complete 3D point cloud of the object, and then use a surface reconstruction algorithm, such as the one disclosed in paper “Michael Kazhdan, Matthew Bolitho, and Hugues Hoppe, Poisson Surface Reconstruction, in Eurographics Symposium on Geometry Processing 2006” or paper “F. Calakli, and G. Taubin, SSD: Smooth Signed Distance Surface Reconstruction, in Pacific Graphics 2011”. But it remains very hard to build a complete 3D model of an object using only a single depth frame, because a depth frame only represents a limited part of the object to reconstruct. This is why one often has to involve manual interactions (e.g. as disclosed in paper “Chen et al., 3-Sweep: Extracting Editable Objects from a Single Photo, in SIGGRAPH ASIA, 2013”) or impose strong constraints on the object to reconstruct in order to infer the complete model of the object. One such constraint can be to impose a limited space of shapes on the object to reconstruct.
Paper “Kester Duncan, Sudeep Sarkar, Redwan Alqasemi, and Rajiv Dubey, Multi-scale Superquadric Fitting for Efficient Shape and Pose Recovery of Unknown Objects, in ICRA 2013” for instance discloses fitting a superquadric (defined by five intrinsic parameters) to the point cloud back-projected from the depth map. Because the space of shapes is very limited (five parameters only), it is easy to fit a superquadric to a partial point cloud. These specific parameters define a whole superquadric, and one can thus infer the complete shape of the object using only a partial depth view of the object. Paper “Zheng et al., Interactive Images: Cuboid Proxies for Smart Image Segmentation, in SIGGRAPH, 2012” discloses the use of cuboids instead of superquadrics in order to achieve a similar goal. This idea has been extended, in the context of urban environment reconstruction, using a more elaborate space of shapes, defined by assemblies of simple components (e.g. cubes, bricks, cylinders). A grammar defines the rules for assembling several components to build a plausible shape. The grammar is specific to the context and the kinds of objects one wants to be able to reconstruct. For instance, such a parametric space has been applied using a single RGB frame, as disclosed in paper “Panagiotis Koutsourakis, Loïc Simon, Olivier Teboul, Georgios Tziritas, and Nikos Paragios, Single View Reconstruction Using Shape Grammars for Urban Environments, in ICCV 2009” (in the context of urban environment reconstruction). Another idea consists in learning a limited space of shapes, and thus learning the natural constraints of a specific class of objects. When humans see a picture of a car, they are able to recognize this car and infer the non-visible part of this car, because a car is a very specific object. Some algorithms leverage this idea, learning the space of shapes for a specific class of objects in order to reconstruct the object using only a single partial view.
Papers “Dragomir Anguelov, Praveen Srinivasan, Daphne Koller, Sebastian Thrun, Jim Rodgers, and James Davis, SCAPE: Shape Completion and Animation of People, in SIGGRAPH 2005” and “Oren Freifeld, and Michael J. Black, Lie Bodies: A Manifold Representation of 3D Human Shape, in ECCV 2012” suggest learning the space of human bodies, and then reconstructing a complete person with only a single depth frame of this person. A similar idea is used in paper “Yu Chen, and Roberto Cipolla, Single and Sparse View 3D Reconstruction by Learning Shape Priors, in CVIU Journal 2011” to reconstruct some CAD objects using only the silhouette of the object.
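As an illustration of the constrained-shape approach discussed above, the following minimal sketch back-projects a depth map into a point cloud and scores how well a superquadric explains it. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, the standard superquadric inside-outside function with three sizes (`a1`, `a2`, `a3`) and two shape exponents (`e1`, `e2`), and a pose reduced to a translation `center`. All names and the plain squared residual are hypothetical choices for this sketch, not the formulation of the cited paper.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def back_project(
    depth: Sequence[Sequence[Optional[float]]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Point]:
    """Turn a depth map (rows of depths, None or <= 0 meaning 'no data')
    into 3D points in the camera frame, assuming a pinhole model."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue  # skip pixels without a valid depth
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def superquadric_f(
    p: Sequence[float], a1: float, a2: float, a3: float, e1: float, e2: float
) -> float:
    """Inside-outside function: < 1 inside, 1 on the surface, > 1 outside."""
    x, y, z = p
    xy = (abs(x / a1) ** (2.0 / e2) + abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return xy + abs(z / a3) ** (2.0 / e1)


def fitting_residual(
    points: Sequence[Sequence[float]], center: Sequence[float],
    a1: float, a2: float, a3: float, e1: float, e2: float,
) -> float:
    """Sum of squared deviations of F from 1 over the (partial) point cloud.

    A fitting procedure would minimize this over the five shape parameters
    (and the pose); the optimizer itself is left out of this sketch.
    """
    total = 0.0
    for p in points:
        local = tuple(pi - ci for pi, ci in zip(p, center))
        total += (superquadric_f(local, a1, a2, a3, e1, e2) - 1.0) ** 2
    return total
```

With `a1 = a2 = a3 = 1` and `e1 = e2 = 1` the superquadric is a unit sphere, so points lying on a unit sphere around `center` give a residual of zero. Because only five shape parameters are involved, a partial cloud seen from one side already constrains them, which is the property the text relies on.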
Within this context, there is still a need for an improved solution for reconstructing a 3D modeled object that represents a real object.