3D models are used by computer graphics programs in games and entertainment and in training simulations, as well as to create virtual museums, libraries, buildings and structures, to map terrain, and to create animated and non-animated objects. Increased demand for 3D content for use by these programs, particularly more complex and realistic 3D models, has led to rapid evolution of systems that create 3D models directly from real scenes, including models of objects placed in real scenes. Although hardware continues to improve, and the cost of both hardware and software continues to decrease, these performance and economic improvements have not sufficed to overcome the difficulty involved in synthesizing realistic 3D models with portable and inexpensive equipment.
Various techniques have been developed to gather texture and depth information at various scene points by processing data contained in video frames of a scene to create a 3D model of the scene. Because a frame is a two-dimensional (2D) representation of a 3D scene, a point in a frame does not uniquely determine the location of a corresponding point in a scene. Additional information is required to reconstruct a scene in 3D from 2D information. A known technique uses stereoscopic imaging equipment having two cameras to capture a stereo video of a scene. Prior to capturing the stereo video, the cameras are calibrated so that their video can be registered in a common coordinate system. The cameras differ only in the location of their optical centers, which is a function of system design and is therefore known. By triangulating from the known distance between the cameras' optical centers and the positions of the points in the frames that correspond to landmark scene points, depth information about the landmark points can be deduced.
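The triangulation described above can be illustrated with a minimal sketch. It assumes the simplest stereo arrangement, two identical, rectified cameras with parallel optical axes, in which depth follows from Z = f * B / d, where f is the focal length in pixels, B is the distance between the optical centers, and d is the horizontal disparity of the landmark point between the two frames. The function name, parameter names, and numeric values are hypothetical and chosen for illustration only.

```python
def depth_from_disparity(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth in metres of a landmark point seen by two rectified, parallel cameras.

    Assumption: both cameras share the same focal length (in pixels) and their
    image rows are aligned, so the point shifts only horizontally between frames.
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Hypothetical values: 800 px focal length, 0.1 m between optical centers,
# and a landmark imaged at column 420 in the left frame and 400 in the right.
depth = depth_from_disparity(focal_px=800.0, baseline_m=0.1,
                             x_left_px=420.0, x_right_px=400.0)
```

Because depth is inversely proportional to disparity, a small error in locating a landmark point in either frame produces a larger depth error for distant points than for near ones.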
In a conventional technique for mapping 3D surfaces, a polygon mesh is generated that approximates the 3D surface of the scene. In this technique, each of the points in frames generates a vertex of a polygon and defines a boundary of the polygon. A 3D “mesh model” is constructed by combining the polygon shapes in a manner analogous to piecing together a puzzle where each piece of the puzzle is a polygon shape. The realism and quality of a 3D model obtained by this method depend on the use of two cameras, the availability of landmark scene points, and the use of computer algorithms to identify landmark scene points in the images. Essentially, the algorithms must process all the pixel data from each frame to identify landmark scene points. Clearly then, this method is computationally costly to implement.
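As a rough illustration of how points become mesh vertices, the sketch below assumes, purely for simplicity, that the frame points lie on a regular grid stored in row-major order; the technique described above does not require this. Each grid cell is split into two triangles, and each triangle is expressed as a triple of vertex indices. The function name is hypothetical.

```python
def grid_to_triangles(rows, cols):
    """Return triangles (as vertex-index triples) covering a rows x cols grid of points.

    Assumption: vertex (r, c) has index r * cols + c, and each grid cell is
    split along one diagonal into two triangles.
    """
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c  # top-left vertex of the current cell
            triangles.append((i, i + 1, i + cols))             # upper-left triangle
            triangles.append((i + 1, i + cols + 1, i + cols))  # lower-right triangle
    return triangles


# A 2 x 3 grid of points has two cells and therefore four triangles.
mesh = grid_to_triangles(2, 3)
```

Adjacent triangles share vertices and edges, which is what allows the pieces to fit together into a continuous surface.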
Imaging of complex scenes sometimes causes the registration of multiple frames to fail. Failure can be caused by many factors, such as the loss of frame points due to camera or scene movement. When video processing occurs after video is captured, the registration failure may not be easily or optimally corrected. However, if video capturing and processing occurred nearly simultaneously, registration failures could be corrected in real time.
It is therefore desirable to provide a system capable of creating realistic 3D models of scenes quickly and cost-effectively.
The present invention includes an apparatus, referred to herein as the ModelCamera, and a method for creating 3D models. The ModelCamera includes a video camera, a plurality of light beams, a frame for connecting the camera and the plurality of light beams in a fixed relationship, and a computer, the computer being capable of transferring data to and from the video camera.
In a preferred embodiment, the ModelCamera is configured to simultaneously capture video of a scene and produce an evolving 3D model.
In another preferred embodiment, the ModelCamera is mounted on a bracket that allows it to pan and tilt around its center of projection.
In another embodiment, a method for modeling a structured 3D scene is disclosed. The method includes the steps of capturing video frames of a scene, identifying and characterizing a plurality of image dots in the video frames, obtaining color information from a multitude of pixels in each frame, registering each additional frame with respect to the previous frame using dot depth and pixel color information, adding new or better information, eliminating redundant information, and constructing a 3D model.
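The steps of adding new or better information and eliminating redundant information can be sketched as follows. This is an illustrative assumption, not the disclosed method: it presumes that each frame's samples have already been registered into a common coordinate system, that each sample carries a hypothetical numeric quality score, and that two samples are redundant when they fall in the same small cubic cell. The function name, the cell size, and the sample layout are all hypothetical.

```python
import math


def merge_frame(model, samples, cell=0.01):
    """Merge registered samples into the model, keeping the best sample per cell.

    model:   dict mapping a cell key (ix, iy, iz) to (point, color, quality)
    samples: iterable of (point, color, quality), with point = (x, y, z)
             already expressed in the model's coordinate system (assumption)
    cell:    edge length of the cubic cell used to detect redundancy (assumption)
    """
    for point, color, quality in samples:
        key = tuple(int(math.floor(v / cell)) for v in point)
        existing = model.get(key)
        # Add new information, or replace existing information with better information.
        if existing is None or quality > existing[2]:
            model[key] = (point, color, quality)
    return model


model = {}
merge_frame(model, [((0.001, 0.002, 0.003), (255, 0, 0), 0.5)])
# A better sample in the same cell replaces the earlier one.
merge_frame(model, [((0.004, 0.002, 0.003), (0, 255, 0), 0.9)])
# A worse sample in the same cell is redundant and is discarded;
# a sample in a different cell is new information and is added.
merge_frame(model, [((0.002, 0.002, 0.003), (0, 0, 255), 0.1),
                    ((0.500, 0.002, 0.003), (9, 9, 9), 0.3)])
```

Keying the model by cell keeps its size bounded by the extent of the scene rather than by the number of frames captured, which matters when the model is built while video is still being captured.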
In another embodiment, the 3D model is an evolving 3D model obtained by interactively capturing scenes. Interactivity is achieved by constructing the 3D model in real time, observing the model's evolution, identifying areas in the model where additional definition is desirable, optionally capturing additional video frames corresponding to those areas, and merging information obtained from the additional video with the evolving model to enhance the model.