A number of systems and programs are offered on the market for the design, the engineering and the manufacturing of objects. CAD is an acronym for Computer-Aided Design, i.e. it relates to software solutions for designing an object. CAE is an acronym for Computer-Aided Engineering, i.e. it relates to software solutions for simulating the physical behavior of a future product. CAM is an acronym for Computer-Aided Manufacturing, i.e. it relates to software solutions for defining manufacturing processes and operations. In such computer-aided design systems, the graphical user interface plays an important role with regard to the efficiency of the technique. These techniques may be embedded within Product Lifecycle Management (PLM) systems. PLM refers to a business strategy that helps companies to share product data, apply common processes, and leverage corporate knowledge for the development of products from conception to the end of their life, across the concept of extended enterprise.
The PLM solutions provided by Dassault Systemes (under the trademarks CATIA, ENOVIA and DELMIA) provide an Engineering Hub, which organizes product engineering knowledge, a Manufacturing Hub, which manages manufacturing engineering knowledge, and an Enterprise Hub, which enables enterprise integrations and connections into both the Engineering and Manufacturing Hubs. Altogether, the system delivers an open object model linking products, processes and resources to enable dynamic, knowledge-based product creation and decision support that drives optimized product definition, manufacturing preparation, production and service.
In this context, the fields of computer vision and computer graphics offer technologies which are increasingly useful. Indeed, these fields have applications to 3D reconstruction, and to all domains where it is necessary to precisely build a 3D scene with exact geometry using as input, for example, the information in a set of photographs. 3D reconstruction from the analysis of a video stream or of a set of photographs is addressed by two different approaches in the state of the art, depending on the type of sensors used for the input data.
The first approach uses “receiver” sensors. This notably concerns 3D reconstruction from RGB image analysis. Here, 3D reconstruction is obtained by multi-view analysis of the RGB color information contained in each of the image planes. The following references relate to this approach:
R. Hartley and A. Zisserman: Multiple View Geometry in Computer Vision, Cambridge Univ. Press 2004;
R. Szeliski: Computer Vision: Algorithms and Applications, Edition Springer 2010; and
O. Faugeras: Three-Dimensional Computer Vision: A Geometric viewpoint, MIT Press 1994.
The second approach uses “emitter-receiver” sensors. This notably concerns 3D reconstruction from RGB-Depth image analysis. This kind of sensor provides depth data in addition to standard RGB data, and it is the depth information that is mainly used in the reconstruction process. The following papers relate to this approach:
Yan Cui et al.: 3D Shape Scanning with a Time-of-Flight Camera, CVPR 2010;
S. Izadi et al.: KinectFusion: Real-Time Dense Surface Mapping and Tracking, Symposium ISMAR 2011; and
R. Newcombe et al.: Live Dense Reconstruction with a Single Moving Camera, IEEE ICCV 2011.
Moreover, several academic and industrial players now offer software solutions for 3D reconstruction, by RGB image analysis, such as Acute3D, Autodesk, VisualSFM, or by RGB-Depth analysis, such as ReconstructMe or Microsoft's SDK for Kinect (registered trademarks).
Multi-view photogrammetry reconstruction methods use only the information contained in the image planes of a video sequence (or a series of snapshots) in order to estimate the 3D geometry of the scene. Matching interest points between different 2D views yields the relative positions of the camera. An optimized triangulation is then used to compute the 3D point corresponding to each matched pair. Depth-map analysis reconstruction methods are based on disparity maps or approximated 3D point clouds. Those disparity maps are obtained using stereovision, structured light (see the ‘Kinect’ device for example) or ‘Time of Flight’ 3D cameras. These state-of-the-art reconstruction methods then typically output a discrete 3D representation of the real object, most often a 3D mesh. The 3D model ultimately derives from the volume closing off the resulting 3D point cloud.
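As an illustration of the triangulation step mentioned above, the following minimal sketch recovers a 3D point from two viewing rays. It uses the simple midpoint method, which is a stand-in chosen for brevity and not the optimized triangulation of the cited methods. The function names, the camera centers and the ray directions are hypothetical; in practice the rays would be derived from matched interest points and estimated camera poses.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint triangulation (illustrative stand-in, not an optimized method).

    c1, c2: camera optical centers; d1, d2: viewing ray directions.
    Returns the midpoint of the shortest segment joining the two rays.
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("viewing rays are (nearly) parallel")
    s = (b * e - c * d) / denom  # parameter along the first ray
    t = (a * e - b * d) / denom  # parameter along the second ray
    p1 = [o + s * k for o, k in zip(c1, d1)]
    p2 = [o + t * k for o, k in zip(c2, d2)]
    return [(u + v) / 2.0 for u, v in zip(p1, p2)]


# Hypothetical setup: two cameras one unit apart, both observing the same point.
target = (0.5, 0.2, 3.0)
c1, c2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
d1 = tuple(x - o for x, o in zip(target, c1))
d2 = tuple(x - o for x, o in zip(target, c2))
point = triangulate_midpoint(c1, d1, c2, d2)
```

With noise-free rays the two rays intersect and the midpoint coincides with the observed point; with noisy matches the rays are skew and the midpoint is only an approximation.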
A further step known from the prior art is to produce a texture for each polygon on the 3D mesh. In order to ensure photo-realism, prior art requires that the rendering use standard images from high-quality devices capturing the scene simultaneously. This is explained in the paper by T. Hanusch, A new texture mapping algorithm for photorealistic reconstruction of 3D objects, in ISPRS Journal of Photogrammetry and Remote Sensing.
FIG. 1 illustrates a common approach used to texture a 3D model with a photograph, which is the well-known projective texture mapping method. This method is described for example in the paper by P. Debevec, C. Taylor and J. Malik, Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach, in SIGGRAPH 1996. This method uses image projection data associated with a 2D view (relative to the 3D model) to compute the mapping to the 3D model. FIG. 1 shows such a view-dependent 3D model texturing principle for 3D meshed model 102 and calibrated image 104: a projective texture mapping (represented by bundle 106, computed from the camera projection matrix and departing from optical center 108) is used to estimate the texture coordinates for each triangle vertex.
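The principle of FIG. 1 can be sketched as follows: each triangle vertex is projected through a 3x4 camera projection matrix, and the resulting pixel position is normalized by the image size to give a texture coordinate. This is a minimal sketch under stated assumptions, not the method of the cited paper: the function names, the pinhole intrinsics (focal length 500 pixels, principal point at the center of a 640x480 image) and the identity camera pose are all hypothetical, and visibility and occlusion are ignored.

```python
def project_vertex(P, X):
    """Project 3D point X through the 3x4 projection matrix P to pixel (u, v)."""
    x, y, z = X
    h = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    if h[2] <= 0:
        raise ValueError("vertex is behind the camera")
    return h[0] / h[2], h[1] / h[2]


def texture_coords(P, vertices, width, height):
    """Normalized texture coordinates (s, t) for each vertex of the mesh."""
    coords = []
    for X in vertices:
        u, v = project_vertex(P, X)
        coords.append((u / width, v / height))
    return coords


# Hypothetical calibrated camera: P = K [I | 0].
P = [[500.0, 0.0, 320.0, 0.0],
     [0.0, 500.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

# One triangle of a hypothetical mesh, two units in front of the camera.
triangle = [(0.0, 0.0, 2.0), (0.5, 0.0, 2.0), (0.0, 0.5, 2.0)]
uv = texture_coords(P, triangle, 640, 480)
```

The vertex on the optical axis maps to the image center, that is, to texture coordinate (0.5, 0.5); the other two vertices are offset along the horizontal and vertical image axes respectively.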
Now, as illustrated in FIG. 2, the quality of texturing by projection onto the 3D model is highly dependent on the camera pose estimation. Indeed, FIG. 2 illustrates the 3D model texturing problem: on the left, accurate calibration data allows coherent texturing 104 by projection on 3D model 102, whereas, on the right, inaccurate calibration data induces a drift in the projection of texturing 104 relative to 3D model 102. In other words, the estimation of the camera rotation and translation at the time of the snapshot has a high impact on the final texturing. Any bias in the camera pose carries over to the re-projection and deteriorates the texturing process. Such a bias is usually particularly significant in the case of depth-map analysis methods. It generally originates from a synchronization shift between the depth sensor and the RGB sensor, which corrupts the camera trajectory estimation. But it may also originate from an outside shot taken by an independent camera whose position relative to the 3D model cannot be estimated with sufficient accuracy because there is no rigid dependency on the depth sensor.
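The sensitivity to camera pose described above can be illustrated numerically. The following minimal sketch projects the same 3D point with a reference pose and with a pose biased by a small rotation about the vertical camera axis, then measures the resulting pixel drift. All names and values are hypothetical (focal length 500 pixels, principal point (320, 240), zero translation); the sketch only shows the order of magnitude of the effect and does not model any particular sensor.

```python
import math


def projection_matrix(f, cx, cy, yaw_rad, t):
    """Build P = K [R | t], with R a rotation of yaw_rad about the camera y axis."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    R = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    Rt = [R[i] + [t[i]] for i in range(3)]
    return [[sum(K[i][k] * Rt[k][j] for k in range(3)) for j in range(4)]
            for i in range(3)]


def project(P, X):
    """Project 3D point X through the 3x4 matrix P to pixel coordinates."""
    h = [r[0] * X[0] + r[1] * X[1] + r[2] * X[2] + r[3] for r in P]
    return h[0] / h[2], h[1] / h[2]


def drift_pixels(X, yaw_error_deg):
    """Pixel distance between the projection of X with and without pose bias."""
    t = [0.0, 0.0, 0.0]
    P_true = projection_matrix(500.0, 320.0, 240.0, 0.0, t)
    P_biased = projection_matrix(500.0, 320.0, 240.0,
                                 math.radians(yaw_error_deg), t)
    u0, v0 = project(P_true, X)
    u1, v1 = project(P_biased, X)
    return math.hypot(u1 - u0, v1 - v0)


# A point on the optical axis, two units in front of the camera.
drift = drift_pixels((0.0, 0.0, 2.0), 1.0)
```

For a point on the optical axis, the drift equals the focal length times the tangent of the rotation error, so under these assumptions a one-degree error already shifts the projected texture by several pixels.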
Within this context, there is still a need for an improved solution for designing a 3D modeled object representing a real object.