An image of a scene can be captured from the viewpoint of a camera, and the image may be a frame of a video sequence. In some cases there may be more than one camera, each capturing a different image of the scene. Each image represents a view of the scene from the viewpoint of the respective camera. However, there will be some viewpoints of the scene which do not correspond to any of the camera viewpoints. Techniques such as Free-Viewpoint Video Rendering (FVVR) allow an image representing a novel view of a scene to be generated based on a set of multiple views of the scene from multiple camera viewpoints. The cameras are preferably calibrated and synchronized with each other to facilitate inferring intermediate images of the scene.
Based on the different images of the scene, a model of the scene geometry may be constructed, for example using Multiple-View Stereo (MVS), and a texture may be formed which can be applied to the model. The texture can be formed by projectively texturing the scene geometry with the original images and blending the projected images. The model, with the texture, can then be used to render the scene from a rendering viewpoint which may, or may not, be the same as one of the camera viewpoints. As well as recreating a “real-world” scene from a rendering viewpoint, the content of the real-world scene may be rendered alongside other scene content, whether computer-generated or real-world.
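By way of illustration only, the blending of projected images at a single surface point can be sketched as follows. The sketch assumes that the colour seen by each camera at the point has already been sampled by projecting the point into that camera's image, and it assumes a simple weighting by the cosine of the angle between the surface normal and the direction to the camera. That weighting is one common heuristic rather than a method taken from the text above, occlusion between cameras and the surface is ignored, and the function names are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def blend_projected_colours(point, normal, cameras):
    """Blend the colours that several cameras project onto one surface point.

    cameras is a list of (camera_position, sampled_rgb) pairs. Each sampled
    colour is assumed to have been looked up already by projecting `point`
    into that camera's image (that projection step is not shown here).
    """
    n = normalize(normal)
    total_weight = 0.0
    blended = [0.0, 0.0, 0.0]
    for position, rgb in cameras:
        to_camera = normalize(tuple(p - q for p, q in zip(position, point)))
        # Assumed heuristic: cameras facing the surface head-on get the most
        # weight; cameras behind the surface contribute nothing.
        weight = max(0.0, dot(n, to_camera))
        total_weight += weight
        for i in range(3):
            blended[i] += weight * rgb[i]
    if total_weight == 0.0:
        return None  # no camera sees the front of the surface at this point
    return tuple(c / total_weight for c in blended)
```

For example, two cameras placed symmetrically about the surface normal receive equal weights, so the blended colour is the average of their two samples.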
The term “geometry” is used in the art, and herein, to refer to computer-generated representations of the surfaces of objects in the scene, such that the geometry allows the shape, size and location of objects in the scene to be modelled. Textures (e.g. defining a colour and other surface detail) can be applied to the geometry in order to represent the appearance of objects in the scene. Geometry reconstructed from multiple images of a real scene may be referred to as a “proxy” or “geometric proxy” herein. The geometry is often a triangle mesh, although other representations such as point clouds are possible.
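As a minimal sketch of what a triangle mesh representation might look like, the following stores vertex positions together with triangles that index into them, and derives a per-face normal. The class and method names are hypothetical and are chosen purely for illustration. A practical mesh would typically also carry texture coordinates.

```python
import math
from dataclasses import dataclass


@dataclass
class TriangleMesh:
    """Hypothetical minimal triangle mesh: positions plus index triples."""

    vertices: list   # list of (x, y, z) positions
    triangles: list  # list of (i, j, k) indices into `vertices`

    def face_normal(self, t):
        """Unit normal of triangle t, following its vertex winding order."""
        i, j, k = self.triangles[t]
        a, b, c = self.vertices[i], self.vertices[j], self.vertices[k]
        e1 = tuple(q - p for p, q in zip(a, b))  # edge a -> b
        e2 = tuple(q - p for p, q in zip(a, c))  # edge a -> c
        cross = (
            e1[1] * e2[2] - e1[2] * e2[1],
            e1[2] * e2[0] - e1[0] * e2[2],
            e1[0] * e2[1] - e1[1] * e2[0],
        )
        length = math.sqrt(sum(x * x for x in cross))
        return tuple(x / length for x in cross)
```

A point cloud, by contrast, would keep only the list of positions and omit the connectivity given by the triangles.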
There are a number of issues which may need to be considered when generating a novel view of a scene, particularly when integrating content into surroundings that differ from those at capture. For example, relighting of the scene can be difficult. Textures extracted from images (e.g. frames of a video sequence) captured by cameras have implicit real-world lighting information, such that lighting artefacts are present (i.e. “baked-in”) in the textures.
One way of addressing the problem of how to relight the textures for a novel viewpoint is to control the lighting of the scene at the time when the cameras capture the different views of the scene. For example, diffuse lighting can be used in the initial video capture to avoid creating excess shaded areas and specularities that will damage the plausibility of the scenes rendered using extracted textures. The effects of changes in lighting may be reproduced by estimating the material properties of the textures, for example the intrinsic colour (albedo) and fine detail (surface normals), for subsequent relighting using conventional computer graphics techniques. Such material properties may be estimated using an active lighting (or “light-stage”) arrangement, in which images of the scene are captured under a variety of calibrated lighting conditions and the material properties are fitted to the images. However, this approach requires costly apparatus and is generally limited to static scenes. Relighting scenes with arbitrary lighting arrangements is considerably more challenging.
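To illustrate how estimated material properties could be used for subsequent relighting, the following is a minimal sketch that shades a single texel from its albedo and surface normal under a new directional light. It assumes a purely diffuse (Lambertian) reflectance model, which is one conventional computer graphics technique among several and is not stated in the text above. The function name and parameters are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def relight_lambertian(albedo, normal, light_dir, light_rgb=(1.0, 1.0, 1.0)):
    """Shade one texel under a directional light, assuming Lambertian reflectance.

    albedo:    intrinsic (r, g, b) colour of the surface, free of lighting
    normal:    surface normal at the texel
    light_dir: direction from the surface towards the light
    light_rgb: colour and intensity of the new light
    """
    n = normalize(normal)
    l = normalize(light_dir)
    # Cosine falloff; surfaces facing away from the light receive none of it.
    shading = max(0.0, sum(a * b for a, b in zip(n, l)))
    return tuple(a * c * shading for a, c in zip(albedo, light_rgb))
```

Under this assumed model, a texel lit head-on by a white light of unit intensity reproduces its albedo, while a texel facing away from the light is rendered black. Shadows cast by other geometry and specular highlights are outside the scope of this sketch.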