The present invention relates to the field of image-based rendering, that is, the processing of data defining pre-acquired images (real or synthetic, static or dynamic) to synthesise a new image from a desired viewpoint without relying upon a geometric model of the subject.
Images such as photographs, television pictures, video pictures etc. provide a two-dimensional view of a scene only from the viewpoints determined by the positions of the cameras. However, it is often desirable to view the scene from a different viewing position/orientation, and accordingly a number of techniques have been developed for this.
In one approach, known as “model-based rendering”, a geometric model of the subject is created using geometric primitives such as polygons, and the model is then rendered from a desired viewing position and orientation taking into account reflectance properties of the surface of the subject and parameters defining the position and characteristics of light sources.
Such an approach suffers from many problems, however, in particular the time and processing resources needed to define the geometric model, surface reflectances and light sources sufficiently well that a realistic output image can be achieved.
As a result, a number of “image-based rendering” techniques have been developed which can generate an image from a viewing position/orientation different to those of the start images without using a geometric model of the subject.
For example, techniques based on interpolating the positions and colours of pixels in two images have been proposed to generate intermediate views, such as in “View Morphing” by Seitz and Dyer in SIGGRAPH Computer Graphics Proceedings, Annual Conference Series, 1996, pages 21-30. However, the intermediate views are only generated for a viewpoint on the line connecting the two viewpoints of the original images.
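The interpolation idea can be illustrated with a minimal Python sketch. It is not the method of the cited paper: it only blends the position and colour of one pair of corresponding pixels linearly, and the function name `interpolate_pixel`, the `(x, y, colour)` tuple layout and the blend parameter `s` are assumptions made for illustration.

```python
def interpolate_pixel(p0, p1, s):
    """Linearly blend two corresponding pixels (hypothetical helper).

    p0 and p1 are (x, y, colour) tuples taken from the two input images;
    s = 0 reproduces p0 and s = 1 reproduces p1.
    """
    (x0, y0, c0), (x1, y1, c1) = p0, p1
    x = (1 - s) * x0 + s * x1
    y = (1 - s) * y0 + s * y1
    colour = tuple((1 - s) * a + s * b for a, b in zip(c0, c1))
    return (x, y, colour)


# Halfway between a red pixel at (10, 20) and a blue pixel at (30, 20).
mid = interpolate_pixel((10.0, 20.0, (255, 0, 0)), (30.0, 20.0, (0, 0, 255)), 0.5)
```

Because a single parameter `s` controls the blend, the result can only move along the path between the two inputs, which mirrors the restriction to viewpoints on the line connecting the two original viewpoints.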
An image-based rendering technique which allows an image to be generated from an arbitrary viewing position/orientation is disclosed in “Light Field Rendering” by Levoy and Hanrahan in SIGGRAPH Computer Graphics Proceedings, Annual Conference Series, 1996, pages 31-42, in which a four-dimensional light field defining radiance as a function of position and direction is generated. This function characterises the flow of light through unobstructed space in a static scene with fixed illumination. A new image is generated by calculating a two-dimensional slice of the light field. However, the number of input images required and the time and processing resources necessary to perform this technique are considerable.
“The Lumigraph” by Gortler et al in SIGGRAPH Computer Graphics Proceedings, Annual Conference Series, 1996, pages 43-54 discloses a technique in which a simplified light field function is calculated by considering only light rays leaving points on a convex surface that encloses the object. In this technique, however, images can be synthesised only from viewpoints exterior to the convex hull of the object being modelled, and the number of input images required and the processing time and effort are still very high.
A further image-based rendering technique is described in “Multiple-Centre-of-Projection Images” by Rademacher and Bishop in SIGGRAPH Computer Graphics Proceedings, Annual Conference Series, 1998, pages 199-206. In this technique a multiple-centre-of-projection image of a scene is acquired, that is, a single two-dimensional image and a parameterised set of cameras meeting the conditions that (1) the cameras must lie on either a continuous curve or a continuous surface, (2) each pixel is acquired by a single camera, (3) viewing rays vary continuously across neighbouring pixels, and (4) two neighbouring pixels must either correspond to the same camera or to neighbouring cameras. In practice, the required multiple-centre-of-projection image is acquired by translating a one-dimensional CCD camera along a path so that one-dimensional image-strips are captured at discrete points on the path and concatenated into the image buffer. However, the scene must be static to prevent mismatched data as every image-strip is captured at a different time. To render an image of the scene from a new viewpoint, the reprojected location in world-space of each pixel from the multiple-centre-of-projection image is computed, and the reprojected points are then rendered to reconstruct a conventional range image from the new viewpoint. To perform the rendering, a splatting technique is proposed, which consists of directly rendering each point using a variable-size reconstruction kernel (e.g. a Gaussian blob), for example as described in “An Anti-Aliasing Technique for Splatting” by Swan et al in Proceedings IEEE Visualization 1997, pages 197-204. This technique suffers, inter alia, from the problem that a multiple-centre-of-projection image is required as input.
A number of hybrid approaches, which combine model-based rendering and image-based rendering, have been proposed.
For example, “View-based Rendering: Visualizing Real Objects from Scanned Range and Color Data” by Pulli et al in Proceedings Eurographics 8th Workshop on Rendering, June 1997, pages 23-34, discloses a technique in which a partial geometric model comprising a triangle mesh is interactively created for each input image which originates from a different viewpoint. To synthesise an image from a new viewpoint, the partial models generated from input images at three viewpoints close to the new viewpoint are rendered separately and combined using a pixel-based weighting algorithm to give the synthesised image.
“Constructing Virtual Worlds Using Dense Stereo” by Narayanan and Kanade in Proceedings 6th ICCV, 1998, pages 3-10, discloses a hybrid technique in which the intensity image and depth map for each camera view at each instant in time are processed to generate a respective textured polygon model for each camera, representing the scene visible to that camera. To generate an image for a user-given viewpoint, the polygon model which was generated from the camera closest to the user viewpoint (a so-called “reference” camera) is rendered and holes in the resulting rendered view are filled by rendering the polygon models which were generated from two cameras neighbouring the reference camera. If any holes still remain, they are filled by interpolating pixel values from nearby filled pixels. Alternatively, a global polygon model of the whole scene can be constructed and rendered from the desired viewpoint.
In both of the hybrid techniques described above, a large number of closely-spaced cameras is required to provide the input data unless the viewpoints from which a new image can be generated are severely restricted and/or a degraded quality of generated image is accepted. This is because a partial geometric model must be available from each of a number of cameras that are close to the viewpoint from which the new image is to be rendered. For example, in the technique described in “Constructing Virtual Worlds Using Dense Stereo”, 51 cameras are mounted on a 5 metre geodesic dome to record a subject within the dome. In addition, processing time and resource requirements are increased due to the requirement to generate at least partial geometric models.
The present invention has been made with the above problems in mind.
According to the present invention, there is provided an image-based rendering method or apparatus, in which, to generate a value for a pixel in a virtual image from a user-defined viewpoint, input depth map images are tested to identify the pixel or pixels therein which represent the part of the scene potentially visible to the pixel in the virtual image, and a value for the pixel in the virtual image is calculated based on the pixel(s) which represent the part of the scene closest to the virtual image.
Preferably, a Z-buffer is used to maintain pixel values for the virtual image. As the input depth map images are tested, the Z-buffer is updated if the pixel or pixels identified from a depth map image represent a part of the scene closer to the virtual image than the part represented by the value already stored in the Z-buffer for the virtual pixel.
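The Z-buffer update rule described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `(depth, value)` entry layout, the use of infinity for "nothing stored yet", and the names `make_zbuffer` and `update_zbuffer` are hypothetical and are not taken from the specification.

```python
import math


def make_zbuffer(width, height):
    """Create a buffer for the virtual image (hypothetical layout).

    Each entry is (depth, value); infinite depth means nothing stored yet.
    """
    return [[(math.inf, None) for _ in range(width)] for _ in range(height)]


def update_zbuffer(zbuffer, x, y, depth, value):
    """Store the candidate only if it is closer than the stored entry.

    Returns True when the buffer was updated.
    """
    stored_depth, _ = zbuffer[y][x]
    if depth < stored_depth:
        zbuffer[y][x] = (depth, value)
        return True
    return False


# Three depth map images each propose a value for virtual pixel (0, 0).
zbuf = make_zbuffer(2, 1)
candidates = [(0, 0, 5.0, "red"), (0, 0, 3.0, "green"), (0, 0, 4.0, "blue")]
for x, y, depth, value in candidates:
    update_zbuffer(zbuf, x, y, depth, value)
```

After the loop, pixel (0, 0) holds the candidate with the smallest depth, whatever the order in which the depth map images were tested, while pixel (1, 0) remains empty.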
The invention also provides an image-based rendering method or apparatus for processing depth map images to generate pixel values for an image from a different viewpoint, in which a pixel value is calculated by defining a viewing ray through the pixel, and testing the depth map images using the viewing ray to identify the pixel or pixels in the depth map images which represent the part of the scene which can actually be seen by the pixel, and by calculating a value for the pixel in dependence upon the identified pixel or pixels.
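One possible way to test a depth map image with a viewing ray is sketched below in Python. The specification does not fix how the test is carried out, so everything here is an assumption made for illustration: the depth map camera is taken to be orthographic and aligned with the world axes (pixel (u, v) looks along +z at world position x = u, y = v), the ray is sampled at a fixed step, and the name `find_visible_pixel` and its parameters are hypothetical.

```python
def find_visible_pixel(depth_map, origin, direction, t_max=10.0, step=0.01):
    """Return the first depth map pixel (u, v) met by the viewing ray, or None.

    Assumed toy camera: orthographic, looking along +z, so depth_map[v][u]
    is the z coordinate of the surface seen at world position (u, v).
    The ray origin + t * direction is sampled from t = 0 to t = t_max.
    """
    height, width = len(depth_map), len(depth_map[0])
    n_steps = int(t_max / step)
    for i in range(n_steps + 1):
        t = i * step
        px = origin[0] + t * direction[0]
        py = origin[1] + t * direction[1]
        pz = origin[2] + t * direction[2]
        u, v = round(px), round(py)
        inside = 0 <= u < width and 0 <= v < height
        # The ray meets the surface once its sample lies at or behind
        # the depth recorded for the pixel it projects onto.
        if inside and pz >= depth_map[v][u]:
            return (u, v)
    return None


# A one-row depth map: a far wall at z = 5 and a nearer surface at z = 2.
depth_map = [[5.0, 5.0, 2.0]]
hit = find_visible_pixel(depth_map, (0.0, 0.0, 0.0), (1.0, 0.0, 1.0))
miss = find_visible_pixel(depth_map, (0.0, 0.0, 0.0), (0.0, 0.0, -1.0))
```

In this example the ray passes in front of the two far pixels and first reaches the surface at pixel (2, 0), so that pixel would supply the value for the virtual pixel; a ray pointing away from the surface finds no pixel at all.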