People are increasingly doing their shopping online through electronic marketplaces. Since these people cannot hold, touch, and try on the items they wish to purchase, they tend to do more research to determine whether the product they are purchasing is exactly as they think or hope it to be. Electronic marketplaces thus go through extensive procedures to gather and provide such information in a variety of ways. Users are accustomed to viewing high-resolution images, videos, or animations of products. Images are an effective way to view a product, but they cannot replace the experience of actually holding the product and moving it around to inspect it closely from any desired angle, for example.
In order to overcome some of these disadvantages, some electronic marketplaces have attempted to provide three-dimensional (3D) models of products. Various types of data and techniques can be used to create 3D models of an object, and each has its own pros and cons. Most techniques, however, begin by capturing image data as a set of color camera images of the object taken from arbitrary viewpoints. In the computer vision literature, techniques such as Structure from Motion (SFM), Visual Simultaneous Localization and Mapping (Visual SLAM), and Bundle Adjustment (BA) match salient points in these images, or image features, to simultaneously estimate the relative viewpoints of the cameras from which the images are taken, along with a sparse structure of the object. A sparse structure, however, is not suitable for creating the photorealistic rendering needed for visualization and interaction. Other techniques augment cameras with 3D time-of-flight sensors (e.g., LIDAR). While such setups can generate high-quality 3D models, they require extensive calibration and long capture times.