A number of systems and programs are offered on the market for the design, the engineering and the manufacturing of objects. CAD is an acronym for Computer-Aided Design, i.e. it relates to software solutions for designing an object. CAE is an acronym for Computer-Aided Engineering, i.e. it relates to software solutions for simulating the physical behavior of a future product. CAM is an acronym for Computer-Aided Manufacturing, i.e. it relates to software solutions for defining manufacturing processes and operations. In such computer-aided design systems, the graphical user interface plays an important role as regards the efficiency of the technique. These techniques may be embedded within Product Lifecycle Management (PLM) systems. PLM refers to a business strategy that helps companies to share product data, apply common processes, and leverage corporate knowledge for the development of products from conception to the end of their life, across the concept of extended enterprise. The PLM solutions provided by Dassault Systèmes (under the trademarks CATIA, ENOVIA and DELMIA) provide an Engineering Hub, which organizes product engineering knowledge, a Manufacturing Hub, which manages manufacturing engineering knowledge, and an Enterprise Hub, which enables enterprise integrations and connections into both the Engineering and Manufacturing Hubs. Altogether, the system delivers an open object model linking products, processes and resources to enable dynamic, knowledge-based product creation and decision support that drives optimized product definition, manufacturing preparation, production and service.
In this framework, the field of computer vision and computer graphics offers technologies which are more and more useful. Indeed, this field has applications to 3D reconstruction. 3D reconstruction can be used in any field which involves the creation of (e.g. textured) 3D models, such as serious gaming, video games, architecture, archeology, reverse engineering, 3D asset database, or virtual environments. Several academic and industrial players now offer software solutions for 3D reconstruction, for example by RGB and/or depth image analysis, such as Acute 3D, Autodesk, VisualSFM, or by RGB-Depth analysis, such as ReconstructMe or Microsoft's SDK for Kinect (registered trademarks).
RGB-Depth (or RGB-D) image analysis is an approach to 3D reconstruction that uses “emitter-receiver” sensors which provide depth data in addition to standard RGB data. Depth data may constitute the data mainly used in the reconstruction process. The following papers relate to this approach: “Yan Cui et al.: 3D Shape Scanning with a Time-of-Flight Camera, CVPR 2010”, “R S. Izadi et al.: KinectFusion: Real-Time Dense Surface Mapping and Tracking, Symposium ISMAR 2011”, and “R. Newcombe et al.: Live Dense Reconstruction with a Single Moving Camera, IEEE ICCV2011”. Depth-map analysis reconstruction methods are based on disparity maps or approximated 3D point clouds. Those disparity maps are obtained using stereovision or structured light (see the ‘Kinect’ device for example) or ‘Time of Flight’ 3D-cameras.
RGB-D image analysis may notably be used in a process of 3D reconstruction of a real object, such as a human body. Starting from an RGB-D acquisition of a user (i.e. color image with a depth map image), the aim of such a process is to predict the exact 3D shape and/or pose of the user's body. This has many applications in virtual try-on simulation, augmented reality, internet of things and video games, where the user's body shape and pose are captured e.g. with a single RGB-D sensor.
Most of the 3D human body reconstruction literature can be summarized into a single class of methods that can be referred to as “optimization-based methods”. These methods estimate the 3D human body parameters, pose and shape, using optimization techniques applied to a cost function that compares a view of the user to a 3D parametric body model.
The following lists papers that disclose examples of optimization-based methods:
    Balan, L. S. Detailed Human Shape and Pose from Images. CVPR, 2007.
    Balan, M. B. The Naked Truth: Estimating Body Shape Under Clothing. ECCV, 2008.
    A. Weiss, D. H. Home 3D body scans from noisy image and range data. ICCV, 2011.
    F. Perbet, S. J. Human Body Shape Estimation Using Multi-Resolution Manifold Forest. CVPR, 2014.
    M. Loper, M. B. OpenDR: An approximate Differentiable Renderer. ECCV, 2014.
    P. Guan, A. W. Estimating human shape and pose from a single image. ICCV, 2009.
    Y. Chen, Z. L. Tensor-based Human Body Modeling. CVPR, 2013.
As stated in these papers, optimization-based methods start by capturing the user using an RGB-D camera. This delivers an RGB image and a depth map image. The method then represents the 3D human body with a parametric model controlled by shape and pose parameters. The shape parameters capture the intrinsic shape across people while the pose parameters capture the body pose.
Using this parametric model with the user RGB-D acquisition, optimization-based methods often predict the model parameters using two optimization steps.
The first step consists in searching for the body parameters (shape and pose) by matching the parametric body model's silhouette to the observed one (extracted from the user depth map).
The two silhouettes are compared using a bidirectional cost, defined for example:
$$E_{\text{silhouette}} = d(S \to T) + d(T \to S)$$

$$d(S \to T) = \frac{\sum_{i,j} S_{ij} \cdot C_{ij}(T)}{\sum_{i,j} S_{ij}}$$
where S is the user silhouette, T is the model silhouette, and:
    $S_{ij} = 1$ if the pixel of index (i, j) is inside S, otherwise 0.
    $C_{ij}(T)$ = distance from pixel (i, j) to the nearest pixel in T if pixel (i, j) is not in T, otherwise 0.
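The bidirectional silhouette cost can be illustrated with a minimal sketch. It assumes that silhouettes are binary masks stored as nested lists, and that C_ij(T) is the Euclidean distance from pixel (i, j) to the nearest pixel of T (zero inside T). The function names are hypothetical, and the brute-force distance computation is only suitable for small masks; it is not the implementation used in the cited papers.

```python
import math

def distance_to_silhouette(mask):
    """For each pixel, distance to the nearest pixel inside `mask` (0 if inside)."""
    inside = [(i, j) for i, row in enumerate(mask) for j, v in enumerate(row) if v]
    if not inside:
        raise ValueError("silhouette is empty")
    return [
        [0.0 if v else min(math.hypot(i - a, j - b) for a, b in inside)
         for j, v in enumerate(row)]
        for i, row in enumerate(mask)
    ]

def directed_cost(S, T):
    """d(S -> T): average, over the pixels of S, of the distance to T."""
    C = distance_to_silhouette(T)
    den = sum(sum(row) for row in S)
    if den == 0:
        raise ValueError("silhouette is empty")
    num = sum(S[i][j] * C[i][j] for i in range(len(S)) for j in range(len(S[0])))
    return num / den

def silhouette_cost(S, T):
    """E_silhouette = d(S -> T) + d(T -> S)."""
    return directed_cost(S, T) + directed_cost(T, S)

# Two 2x2 squares shifted by one pixel with respect to each other.
S = [[0, 1, 1, 0],
     [0, 1, 1, 0],
     [0, 0, 0, 0]]
T = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 0, 0]]
print(silhouette_cost(S, T))  # 1.0
print(silhouette_cost(S, S))  # 0.0
```

Summing both directions penalizes model pixels that fall outside the observed silhouette as well as observed pixels that the model fails to cover.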
The second step consists in fitting the body parameters to the observed depth map by minimizing a cost function that compares the depth map of the model with the depth map of the user. The cost function is defined as the distance between the two overlapped depth maps, as defined below:
$$E_{\text{depth}} = \frac{1}{N} \sum_{\text{pixels}} \rho\left(D_S - D_T\right)$$
where $D_S$ is the user depth map, $D_T$ is the model depth map, ρ is the Geman-McClure estimator and N is the number of overlapped pixels.
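The depth cost can be sketched as follows. The sketch assumes one common form of the Geman-McClure estimator, ρ(r) = r² / (r² + σ²), with a scale parameter σ that is not specified in the text. It also assumes that depth maps are nested lists in which `None` marks a pixel without depth, so that the overlapped pixels are those valid in both maps. All names are hypothetical.

```python
def geman_mcclure(r, sigma=1.0):
    """Robust penalty r^2 / (r^2 + sigma^2); bounded by 1 for large residuals."""
    r2 = r * r
    return r2 / (r2 + sigma * sigma)

def depth_cost(D_S, D_T, sigma=1.0):
    """E_depth: mean robust residual over pixels valid in both depth maps."""
    residuals = [
        geman_mcclure(ds - dt, sigma)
        for row_s, row_t in zip(D_S, D_T)
        for ds, dt in zip(row_s, row_t)
        if ds is not None and dt is not None
    ]
    if not residuals:
        raise ValueError("depth maps do not overlap")
    return sum(residuals) / len(residuals)

# Only pixels (0, 0) and (1, 0) have a depth value in both maps, so N = 2.
D_S = [[1.0, None],
       [2.0, 3.0]]
D_T = [[1.0, 2.0],
       [3.0, None]]
print(depth_cost(D_S, D_T))  # 0.25
```

Because the penalty saturates, a few pixels with a very large depth difference (for example sensor noise) contribute at most 1 each and do not dominate the average.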
Another optimization-based method is proposed by above-cited “F. Perbet, S. J. Human Body Shape Estimation Using Multi-Resolution Manifold Forest. CVPR, 2014”. This method searches only for the shape parameters and formulates the task of shape estimation as that of optimizing an energy function over the manifold of human body shapes. Starting from a single human depth map, an initial solution is found on the manifold using a similarity measure. An Iterative Closest Point (ICP) algorithm is then used to refine the solution.
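The principle of an Iterative Closest Point refinement can be illustrated with a generic sketch. The example below is a plain rigid ICP on 2D point sets, chosen only for brevity; it is not the refinement procedure of the cited paper, which operates on 3D body shapes. Each iteration pairs every source point with its closest target point, then applies the least-squares rotation and translation for those pairs. All names are hypothetical.

```python
import math

def closest(p, points):
    """Closest point of `points` to p (brute force)."""
    return min(points, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def fit_rigid(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    csx, csy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    cdx, cdy = sum(p[0] for p in dst) / n, sum(p[1] for p in dst) / n
    dot = cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - csx, ay - csy, bx - cdx, by - cdy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, cdx - (c * csx - s * csy), cdy - (s * csx + c * csy)

def transform(points, theta, tx, ty):
    """Rotate by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp(src, dst, n_iters=10):
    """Alternate closest-point matching and rigid fitting."""
    current = list(src)
    for _ in range(n_iters):
        matches = [closest(p, dst) for p in current]
        theta, tx, ty = fit_rigid(current, matches)
        current = transform(current, theta, tx, ty)
    return current

target = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 5.0)]
source = transform(target, 0.05, 0.1, -0.05)  # slightly rotated and shifted copy
aligned = icp(source, target)
```

As with the other optimization-based methods, ICP only refines a solution: it relies on the initial estimate being close enough for the closest-point pairs to be mostly correct.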
3D human body reconstruction using optimization-based reconstruction methods suffers from several drawbacks. One relates to the low convergence speed. For example, as stated in above-cited “A. Weiss, D. H. Home 3D body scans from noisy image and range data. ICCV, 2011”, the method may take more than forty-five minutes to converge. This is due to the large number of unknowns and the complexity of the objective function (which is not differentiable in most cases). Also, optimization methods can get stuck in a local minimum, because the objective function employed is not convex. A common strategy to bypass the local minimum problem is to alternate the optimization between pose and shape parameters, by splitting the optimization into several optimization problems, each with different unknowns. This is proposed notably by above-cited “Balan, M. B. The Naked Truth: Estimating Body Shape Under Clothing. ECCV, 2008”. This avoids optimizing a large vector of unknowns and helps the convergence, but it still takes a significant amount of time, and convergence is not guaranteed.
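The alternating strategy can be illustrated on a toy problem. The sketch below assumes a hypothetical scalar cost with one "pose" unknown and one "shape" unknown, which is jointly non-convex but quadratic in each unknown taken separately; it is not a body model. Each sub-problem is solved in closed form while the other unknown is held fixed.

```python
def cost(pose, shape):
    """Toy cost, non-convex jointly, with a global minimum of 0 at (1, 2)."""
    return (pose * shape - 2.0) ** 2 + 0.1 * (pose - 1.0) ** 2 + 0.1 * (shape - 2.0) ** 2

def best_pose(shape):
    """Exact minimizer over pose for a fixed shape (zero of the partial derivative)."""
    return (4.0 * shape + 0.2) / (2.0 * shape ** 2 + 0.2)

def best_shape(pose):
    """Exact minimizer over shape for a fixed pose (zero of the partial derivative)."""
    return (4.0 * pose + 0.4) / (2.0 * pose ** 2 + 0.2)

def alternate(pose, shape, n_iters=20):
    """Alternate between the pose sub-problem and the shape sub-problem."""
    history = [cost(pose, shape)]
    for _ in range(n_iters):
        pose = best_pose(shape)
        history.append(cost(pose, shape))
        shape = best_shape(pose)
        history.append(cost(pose, shape))
    return pose, shape, history

pose, shape, history = alternate(0.5, 0.5)
print(history[0], history[-1])
```

Each step can only decrease the cost, since it solves its sub-problem exactly, but nothing in the scheme guarantees that the sequence reaches the global minimum, nor that it does so quickly.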
Within this context, there is still a need for an improved solution for reconstructing a 3D modeled object that represents a real object from a depth map.