The present invention relates generally to rendering images, and more particularly to blending overlapping images into a single view output image.
The appearance of a scene can be described through all light rays (2D) that are emitted from every 3D point in the scene, generating a 5D radiance function, also called the "plenoptic" function. In a transparent medium, the plenoptic function is reduced to four dimensions. In practice, the plenoptic function is sampled from discrete calibrated camera views, and those radiance values that are not sampled have to be represented by interpolating the recorded ones, sometimes with additional information on physical restrictions.
Often, an object in the scene is assumed to be Lambertian, meaning that each point on the object has the same radiance value in all possible directions. This implies that two viewing rays have the same color value if they intersect at a surface point. If specular effects occur, this is no longer true. Then, two viewing rays have similar color values only if their directions are similar and their point of intersection is near the real surface point. To construct a new image of the scene for a virtual camera, i.e., an arbitrary view, one has to determine those input rays that are closest, in the above sense, to those of the virtual camera. The closer an input ray is to a desired ray, the greater is its contribution to the output color value.
Image-based rendering (IBR) is a popular alternative to traditional three-dimensional graphics. With IBR, it is possible to generate output images from new views that are not part of the input images. Two examples of effective IBR methods are view-dependent texture mapping (VDTM) and light field/lumigraph approaches. Light field, lumigraph and concentric mosaic require a large collection of input images from multiple cameras, but they make few, if any, assumptions about the geometry of the scene.
In contrast, VDTM assumes a relatively accurate geometric model of the scene, but requires only a small number of images from input cameras that can be at arbitrary locations. Both methods interpolate color values for an output pixel as some weighted combination of input pixels.
In VDTM, this interpolation is performed using a geometric proxy model to determine which pixel from each input image "corresponds" to the desired pixel in the output image. Of these corresponding pixels, those that are closest in angle to the desired pixel are weighted to make the greatest contribution to the interpolated result.
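The angle-based weighting described above can be illustrated with a minimal sketch. The inverse-angle formula, the function names `angle_between` and `angular_weights`, and the `eps` parameter are assumptions chosen for illustration; the text only requires that rays closer in angle to the desired ray contribute more.

```python
import math

def angle_between(u, v):
    """Angle in radians between two 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)

def angular_weights(desired_ray, input_rays, eps=1e-6):
    """Hypothetical weighting: inverse angle, normalized to sum to one."""
    raw = [1.0 / (angle_between(desired_ray, r) + eps) for r in input_rays]
    total = sum(raw)
    return [w / total for w in raw]

# Three input rays at increasing angular distance from the desired ray.
weights = angular_weights((0.0, 0.0, 1.0),
                          [(0.0, 0.1, 1.0), (0.0, 0.5, 1.0), (1.0, 0.0, 0.0)])
```

Any monotonically decreasing function of the angle would serve the same purpose; the inverse is used here only because it is simple.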
The blending operation ensures that the influence of a single input image on the final rendering is a smoothly varying function across the output image plane, or, equivalently, across the geometry representing the scene. These smooth weighting functions combine to form a "blending field" that specifies how much contribution the pixels in each input image make to each pixel in the output image. The reconstructed blending field is then used to blend pixels from the input images to form the pixels of the output image.
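As a rough sketch of how a blending field might be applied, the following example forms each output pixel as a weighted sum of the corresponding input pixels. The nested-list layout, the single gray channel, and the name `blend_images` are assumptions made for illustration, and the input images are assumed to be already warped into the output view.

```python
def blend_images(images, blending_field):
    """Blend warped input images into one output image.

    images[k][y][x] is the gray value of input image k at an output pixel,
    and blending_field[k][y][x] is the weight of that image at that pixel.
    The weights at each pixel are assumed to sum to one already.
    """
    height, width = len(images[0]), len(images[0][0])
    return [
        [
            sum(img[y][x] * wgt[y][x] for img, wgt in zip(images, blending_field))
            for x in range(width)
        ]
        for y in range(height)
    ]

# Two 1x2 images: the first pixel comes only from image 0,
# the second is an even mix of both images.
images = [[[10.0, 10.0]], [[20.0, 20.0]]]
field = [[[1.0, 0.5]], [[0.0, 0.5]]]
output = blend_images(images, field)
```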
The main reasons for blending images are lack of photometric agreement in the input images caused by change in camera response or view-dependent appearance of the scene, and small errors in registration and depth maps, i.e., the geometry of the scene. Blending can cause blurring in the output image. However, blending is necessary to mask the unavoidable errors due to the lack of agreement in the input images.
Blending methods can be in image-space, acting on pixel fragments, or they can be in object-space, handling each polygon in the geometric proxy.
Current blending methods use only local information such as "deviations from the closest views" to find blending weights. They include approaches such as view-dependent texture mapping and blending fields used in unstructured lumigraph rendering. Both these and other methods generate smooth spatial and temporal transitions in the output image.
Following are desirable goals for an ideal IBR method. When a desired ray passes through the center of projection of an input camera, it can be trivially reconstructed from the ray database, assuming a sufficiently high-resolution input image and the ray falls within the camera's field-of-view. In this case, an ideal process should return a ray from the input image. An algorithm with epipole consistency will reconstruct this ray correctly without any geometric information. In general, the choice of which input images are used to reconstruct a desired ray should be based on a natural and consistent measure of closeness. In particular, input image rays with similar angles to the desired ray should be used when possible. When one requests a ray with an infinitesimally small distance from a previous ray intersecting a nearby point on the geometric proxy, the reconstructed ray should have a color value that is correspondingly close to the previously reconstructed color. Reconstruction continuity is important to avoid both temporal and spatial artifacts. For example, the contribution due to any particular camera should fall to zero as one approaches the boundary of its field-of-view, or as one approaches a part of a surface that is not seen by a camera due to visibility occlusions.
As described below, methods based only on local information cannot achieve smoothness across depth boundaries. The VDTM method uses a triangulation of the directions to input cameras to pick the "closest three." Even if the proxy is highly tessellated, nearby points can have very different triangulations of the "input camera view map," resulting in very different reconstructions. While this objective is subtle, it is nonetheless important, because lack of such continuity can introduce noticeable artifacts.
Some methods use a very large number of views, and pixels are blended using views that are very similar and captured by the same camera. Such dense sampling avoids most of the artifacts even if the output pixel is blended from a small number of input pixels.
Despite the formalization of the blending problems, the previous IBR methods attempt to solve the problem by considering one fragment at a time. This only works well when: the surface is diffuse so that radiance is the same in all directions and corresponding pixels have very similar intensities; and there are no occlusion boundaries so that the relative ordering of corresponding pixels in any local neighborhood is the same, resulting in continuous functions without gaps.
Spatial smoothness relates to variation of weights of input images within the output image. Neighboring pixels in the output image should have similar weights if there is no depth boundary. A depth boundary is defined as an area wherein the depth gradient is very large, for example a discontinuity between a foreground and background object in the scene. Temporal smoothness relates to variation of weights of input images at a 3D feature point in the scene in nearby novel views. The weights at a scene feature should change smoothly if the views change smoothly. The guidelines for achieving spatial and temporal smoothness of contributing weights can be stated as follows:
The sum of the intensity weights of corresponding input image pixels is one so that the intensities in the output image are normalized. The weights of an input image along a physical surface should change smoothly in and near overlaps so that the inter-camera intensity differences do not create visible discontinuity in the output image. The distribution of intensity weights within an input image should be smooth if there is no depth discontinuity. There should be epipole consistency and minimal angle deviation. To reduce blurring, unnecessary blending should be avoided by limiting the number of transition regions.
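The first guideline, that the weights of corresponding input pixels sum to one, can be sketched as a simple normalization step. The function name `normalize_weights` and the handling of the all-zero case are assumptions for illustration.

```python
def normalize_weights(weights):
    """Scale the per-image weights of one output pixel so they sum to one.

    If no input image contributes (all weights are zero), the weights are
    returned as zeros; this fallback is an assumption of this sketch.
    """
    total = sum(weights)
    if total <= 0:
        return [0.0] * len(weights)
    return [w / total for w in weights]

# Raw weights from three input images for one output pixel.
normalized = normalize_weights([2.0, 1.0, 1.0])
```

Normalizing per output pixel keeps the blended intensity within the range of the contributing input intensities.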
These guidelines suggest solving the blending problem without violating the weight constraints at depth discontinuities and shadow boundaries. It is important to note that, under certain conditions, some of the guidelines may be in conflict and it may not be possible to satisfy all of them. For example, when the boundaries between overlap regions and non-overlap regions meet at a singularity, the blending weights in the local neighborhood of the singularity are not continuous.
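Because the guidelines depend on knowing where depth discontinuities lie, it may help to sketch how a depth boundary, defined above as an area with a very large depth gradient, could be located in a depth map. The neighbor-difference test, the `threshold` parameter, and the name `depth_boundary_mask` are assumptions for illustration; the text does not specify how the gradient is measured or thresholded.

```python
def depth_boundary_mask(depth, threshold):
    """Mark pixels on either side of a large depth jump.

    depth[y][x] is the depth at a pixel. A pixel is flagged when the
    absolute depth difference to its right or lower neighbor exceeds
    the (hypothetical) threshold.
    """
    height, width = len(depth), len(depth[0])
    mask = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if x + 1 < width and abs(depth[y][x + 1] - depth[y][x]) > threshold:
                mask[y][x] = mask[y][x + 1] = True
            if y + 1 < height and abs(depth[y + 1][x] - depth[y][x]) > threshold:
                mask[y][x] = mask[y + 1][x] = True
    return mask

# A foreground object (depth 1) next to a background (depth 5).
depth = [[1.0, 1.0, 5.0, 5.0],
         [1.0, 1.0, 5.0, 5.0]]
mask = depth_boundary_mask(depth, threshold=2.0)
```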
Therefore, there is a need for an image-blending method that does not have the problems stated above.
A method blends multiple input images into an output image for any arbitrary view. In the output image, pixels that are produced from only a single input pixel are identified. The weight of each such single-source pixel is set to one.
For each remaining pixel in the input images with an unassigned weight, the distances to an image boundary and to a depth boundary are measured, and the weight, in a range from zero to one, is set proportional to the minimum of the two distances. Then, each input image is rendered to the output image according to the blending fields.
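The distance-based weighting step can be sketched as follows. This is not a definitive implementation: the 4-connected (city-block) distance, the scaling by the largest distance in the image to reach the range from zero to one, and the names `distance_to_mask` and `boundary_weights` are all assumptions. The sketch covers only the distance step and omits the earlier step that assigns a weight of one to pixels with a single contributing input.

```python
from collections import deque

def distance_to_mask(mask):
    """City-block distance from each pixel to the nearest True pixel.

    Entries stay None when the mask contains no True pixel at all.
    """
    height, width = len(mask), len(mask[0])
    dist = [[None] * width for _ in range(height)]
    queue = deque()
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    while queue:  # multi-source breadth-first search
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height and 0 <= nx < width and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist

def boundary_weights(depth_boundary):
    """Weight each pixel by min(distance to image edge, distance to depth edge)."""
    height, width = len(depth_boundary), len(depth_boundary[0])
    d_depth = distance_to_mask(depth_boundary)
    raw = []
    for y in range(height):
        row = []
        for x in range(width):
            # Number of steps needed to leave the image from this pixel.
            d_image = min(x, y, width - 1 - x, height - 1 - y) + 1
            d = d_image if d_depth[y][x] is None else min(d_image, d_depth[y][x])
            row.append(d)
        raw.append(row)
    peak = max(max(row) for row in raw)
    # Assumed scaling: divide by the largest distance so weights lie in [0, 1].
    return [[d / peak if peak else 0.0 for d in row] for row in raw]

# A 5x5 input image with no depth boundary: the weight peaks at the center.
no_boundary = [[False] * 5 for _ in range(5)]
weights = boundary_weights(no_boundary)
```

With this scaling, weights fall to zero at a depth boundary and are smallest along the image border, which matches the requirement that a camera's contribution fade toward the edge of its field-of-view.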