3D models, such as 3D computer graphics models, are increasingly used in games, entertainment systems, and training simulations; to create digital models of virtual museums, libraries, buildings, and structures; to map terrain; and to create animated and non-animated objects. Increased demand for 3D content, particularly more complex and realistic 3D models, has led to rapid evolution of systems that create 3D models from real scenes, including models of objects placed in real scenes.
Various techniques have been developed to gather texture and depth information at various scene points by processing data contained in video frames of a scene to create a 3D model of the scene. As an image frame is a two-dimensional (2D) representation of a 3D scene, a point in the image frame does not uniquely determine the location of the corresponding point in the scene. Additional information is required to reconstruct a 3D scene from 2D information. A known technique uses stereoscopic imaging equipment having two cameras to capture a stereo video of a scene. Prior to capturing the stereo video, the cameras are calibrated so that their video can be registered in a common coordinate system. The cameras differ only in the location of their optical centers, which is a function of system design and is therefore known. By triangulation, using the known distance between the cameras' optical centers together with the image positions of points in the frames that correspond to landmark scene points, depth information about the landmark points can be deduced.
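By way of illustration only, the triangulation described above may be sketched for the simplest textbook case. The sketch assumes an idealized, rectified pair of pinhole cameras with identical focal length whose optical centers are separated horizontally by a known baseline; under that assumption, depth follows from the standard relation Z = f * B / d, where d is the horizontal disparity of the landmark point between the two frames. The function name and parameter names are hypothetical and are not taken from any particular system.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Estimate the depth of a landmark point seen by a rectified stereo pair.

    Assumptions (illustrative, not from the source text):
      - both cameras are ideal pinhole cameras with the same focal length,
        expressed in pixels (focal_px);
      - the optical centers are separated by baseline_m along the x axis;
      - x_left and x_right are the x coordinates, in pixels, of the same
        landmark point in the left and right frames.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        # Zero disparity corresponds to a point at infinity; a negative
        # value indicates a mismatched correspondence.
        raise ValueError("disparity must be positive for a finite depth")
    # Similar triangles: Z / baseline = focal / disparity
    return focal_px * baseline_m / disparity


if __name__ == "__main__":
    # A point imaged 20 pixels apart by cameras 0.5 m apart, f = 800 px.
    print(depth_from_disparity(420.0, 400.0, 800.0, 0.5))
```

In such a sketch, the depth estimate becomes more sensitive to pixel-level matching errors as the disparity shrinks, which is one reason the calibration and the identification of landmark points matter.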
In certain conventional techniques for mapping 3D surfaces, a polygon mesh may be generated that approximates the 3D surface of the scene. In this technique, each of the points identified in the frames serves as a vertex of a polygon and thereby defines part of the boundary of the polygon. A 3D “mesh model” is constructed by combining the polygon shapes in a manner that may be considered analogous to piecing together a puzzle where each piece of the puzzle is a polygon shape. The realism and quality of a 3D model obtained by this method depend on the use of two cameras, the availability of landmark scene points, and/or the particular method used to identify landmark scene points in the images. Essentially, such identification methods must process all the pixel data from each frame to identify landmark scene points. Thus, such methods are computationally intensive to implement and require certain pre-conditions, such as the availability of landmark scene points, or the like, which may not be desirable.
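As a purely illustrative sketch of how points may be combined into a polygon mesh, the following assumes that the reconstructed points are arranged on a regular grid of rows and columns and are stored in row-major order, so that each grid cell can be split into two triangles. The regular-grid arrangement, the choice of triangles as the polygon shape, and the function name are assumptions made for the example; conventional techniques may connect irregularly placed landmark points in other ways.

```python
def grid_to_triangle_mesh(rows, cols):
    """Return triangle faces for points laid out on a rows x cols grid.

    Each face is a tuple of three vertex indices into a row-major list of
    points (index = row * cols + col). Every grid cell contributes two
    triangles that share the cell's diagonal, like two adjoining puzzle
    pieces.
    """
    faces = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            top_left = r * cols + c
            top_right = top_left + 1
            bottom_left = top_left + cols
            bottom_right = bottom_left + 1
            faces.append((top_left, top_right, bottom_left))
            faces.append((top_right, bottom_right, bottom_left))
    return faces


if __name__ == "__main__":
    # A 2 x 2 grid has one cell and therefore two triangles.
    print(grid_to_triangle_mesh(2, 2))
```

The number of faces grows with the number of points, which is consistent with the observation above that mesh construction from densely identified points is computationally demanding.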
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of described systems with some aspects of the present disclosure, as set forth in the remainder of the present application and with reference to the drawings.