In the fields of computer graphics and computer vision, generating views of scenes for users has many applications. One or more cameras, or more generally image capture devices, can capture images of a real scene for display to a user. Referring to FIG. 1A, which shows image capture of a real scene, multiple cameras (100, 102, and 104) are used in this example to capture images of a section of a city 106. In some applications, the user desires to view the scene from an angle other than the angles from which the original images were captured. A variety of conventional techniques exist for generating views from a new angle, also known as a virtual location or virtual camera angle.
U.S. Pat. No. 7,286,143 to Kang, et al for Interactive viewpoint video employing viewpoints forming an array is related to the generation and rendering of video, and more particularly to a system and process for generating and rendering an interactive viewpoint video in which a user can watch a dynamic scene while changing the viewpoint at will. This patent teaches using static cameras to capture visible images, and a user can select a viewpoint from which a view is rendered of the original scene.
U.S. Pat. No. 7,471,292 to Li for Virtual view specification and synthesis in free viewpoint is a system that receives a first video stream of a scene having a first viewpoint and a second video stream having a second viewpoint wherein camera calibration between the first viewpoint and the second viewpoint is unknown. A viewer selects a viewpoint generally between the first viewpoint and the second viewpoint, and the system synthesizes a view from the selected viewpoint based upon the first video stream and the second video stream. This patent teaches a framework for the rendering problem in free viewpoint television (FTV) based on image-based rendering (IBR) and generates a view from the video streams without the use of a model.
One of the challenges in generating new views from images is accurately rendering the objects in the view from a new (virtual) camera angle. Using a three-dimensional model of the scene to be rendered is a known method for improving view generation, and techniques for doing so are known in the industry. See for example, U.S. Pat. No. 5,850,352 to Saied Moezzi et al for Immersive video, including video hypermosaicing to generate from multiple video views of a scene a three-dimensional video mosaic from which diverse virtual video scene images are synthesized including panoramic, scene interactive and stereoscopic images. This patent teaches synthesizing diverse spatially and temporally coherent and consistent virtual video cameras, and corresponding virtual video images, from multiple real video images that are obtained by multiple real video cameras. Video images of selected objects are tracked in a three-dimensional model of the scene for synthesizing the virtual video images. A user can select both a viewpoint (for example, a location) and a type of display (for example, panoramic or stereoscopic) for the virtual video.
One of the challenges of rendering a view from a three-dimensional model is texture mapping. Each surface, or polygon, of a three-dimensional model needs to be rendered in some level of detail, depending on the application. A basic technique for static texture mapping is to use one of the original images as a texture source. Referring to FIG. 1B, an example of image-to-model registration, the original image is registered to the three-dimensional model. For each surface of the three-dimensional model to be rendered, the portion of the image corresponding to the surface is used to render the texture for that surface. A limitation of static texture mapping can be seen in areas where there is unmodelled geometry, such as trees, satellite dishes, and projections from buildings. In cases such as these, static texture mapping smears the texture of the unmodelled geometry onto the plane in the background. Refer to FIG. 1C, a rendering using static texture mapping, for an example showing smearing of textures.
Unlike standard texture mapping, where the texture is “pasted” on the 3D surface, a technique known as view dependent texture mapping (VDTM) creates a photorealistic view based on real images (photographs) of a scene and a three-dimensional model of the scene. This technique uses texture projection to map the texture for each surface and selects the texture based on the viewpoint and other criteria. Texture projection is a technique that uses the camera model of the photograph (commonly a pinhole camera model) to create a projective transformation, which is then used to transform a world coordinate to an image (texture) coordinate. The main difference between standard texture mapping and view dependent texture projection is that standard texture mapping is static, meaning that for every fragment the texture is defined before rendering, irrespective of the viewpoint. In view dependent texture projection mapping, the texture is chosen on the fly during rendering for each surface, or portion of a surface, based on a number of heuristics that take the viewpoint into account. This technique reduces the smearing effect where there is unmodelled geometry on a surface. Refer to FIG. 1D, a rendering using VDTM, for an example showing improvement in texture mapping over static texture mapping.
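The two ideas in the preceding paragraph, texture projection and viewpoint-based texture selection, can be illustrated with a minimal sketch. It is not taken from the cited references: the function names, the parameters (focal length and principal point in pixels, a world-to-camera rotation matrix), and the choice of "smallest angle to the virtual viewing direction" as the selection heuristic are all assumptions made for illustration. A practical implementation would typically combine several heuristics, such as occlusion and image resolution.

```python
import math

def project_point(world, cam_pos, rotation, focal, cx, cy):
    """Project a world point to image (texture) coordinates with a pinhole model.

    rotation is a 3x3 world-to-camera matrix given as three rows (assumed
    convention); focal, cx, cy are in pixels. Returns (u, v), or None when
    the point lies behind the camera.
    """
    # Translate the point into a camera-centred frame, then rotate it.
    d = [world[i] - cam_pos[i] for i in range(3)]
    xc, yc, zc = (sum(rotation[r][i] * d[i] for i in range(3)) for r in range(3))
    if zc <= 0:
        return None
    # Perspective divide followed by the shift to the principal point.
    return (focal * xc / zc + cx, focal * yc / zc + cy)

def _unit(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def select_source_camera(surface_point, virtual_pos, camera_positions):
    """Pick the index of the real camera that sees surface_point from the
    direction closest to the virtual viewpoint's direction.

    This angle-only rule is a hypothetical stand-in for the viewpoint
    heuristics mentioned in the text.
    """
    view_dir = _unit([virtual_pos[i] - surface_point[i] for i in range(3)])

    def alignment(k):
        cam_dir = _unit([camera_positions[k][i] - surface_point[i] for i in range(3)])
        # Cosine of the angle between the two directions; larger is closer.
        return sum(view_dir[i] * cam_dir[i] for i in range(3))

    return max(range(len(camera_positions)), key=alignment)

if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    print(project_point((1, 2, 10), (0, 0, 0), identity, 500, 320, 240))
    cams = [(-5, 0, 5), (2, 0, 5), (0, 5, 0)]
    print(select_source_camera((0, 0, 0), (1, 0, 5), cams))
```

In static texture mapping the source image for a surface would be fixed once before rendering; in this sketch select_source_camera would instead be evaluated during rendering for each surface, or portion of a surface, whenever the virtual viewpoint changes, and project_point would then give the texture coordinate in the chosen image.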
Refer to the research thesis View-Dependent Texture Projection Mapping for Urban Scenes by Amit Ben-David, Technion—Israel Institute of Technology, Haifa, July 2009 for further background information and descriptions of implementations of techniques mentioned in this document.
While conventional methods and systems allow selection of viewpoints and display types, for military planning, military operations, and similar applications, it is desirable to have additional view selection criteria. In particular, given a set of view selection criteria, it is desirable to provide a view generated from multiple media and temporal sources that gives the best result for the given view selection criteria.