1. Technical Field
The invention is related to inverse texture mapping, and more particularly, to a system and process for inverse texture mapping using weighted pyramid blending and view-dependent weight maps.
2. Background Art
Image-based modeling and rendering has become an important and powerful method to simulate visually rich virtual environments. A traditional method to represent such an environment is to use 3D geometry with some associated texture maps. Many techniques for recovering the 3D geometry of the environment from one or more images have been developed in the photogrammetry, computer vision, and computer graphics communities. These include both automatic techniques [Fau93] and interactive techniques [DTM96].
High quality texture maps are essential for simulating visually rich natural environments. Panoramic mosaics and environment maps [Che95, MB95, and SS97] are examples of rich texture maps used with very simple geometric models (cylinders or cubes).
A number of blending algorithms have been developed to extract texture maps and/or high quality mosaics from multiple images. A general requirement for blending two images is to make the "seams" between them invisible. One common technique is to use linear or nonlinear cross-dissolving or weighted averaging [Pel81, Sze96]. However, choosing a proper cross-dissolving width is a difficult task. Burt and Adelson [BA83b] use a Laplacian pyramid to blend two images in different frequency bands with different blending widths. They can therefore blend without blurring by cross-dissolving in narrower neighborhoods in higher frequency bands, and in wider neighborhoods in lower frequency bands.
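By way of illustration only, linear cross-dissolving of two registered scanlines can be sketched as follows. The function name `cross_dissolve` and the parameters `center` and `width` are hypothetical and are not taken from the cited references. The sketch assumes one-dimensional, equal-length, already-aligned signals, and it shows why the choice of `width` matters: that single value sets how many pixels the transition spans for every frequency in the images.

```python
def cross_dissolve(left, right, center, width):
    """Blend two aligned, equal-length scanlines with a linear ramp.

    `center` is the pixel position of the seam and `width` is the number
    of pixels over which the weight of `right` rises from 0 to 1.
    Both parameters are illustrative assumptions.
    """
    if len(left) != len(right):
        raise ValueError("scanlines must have the same length")
    blended = []
    for x, (a, b) in enumerate(zip(left, right)):
        if width <= 0:
            # Degenerate case: a hard cut at the seam.
            t = 0.0 if x < center else 1.0
        else:
            # Linear ramp, clamped to [0, 1] outside the transition zone.
            t = (x - (center - width / 2.0)) / width
            t = min(1.0, max(0.0, t))
        blended.append((1.0 - t) * a + t * b)
    return blended
```

A wide ramp tends to blur or ghost fine detail in the overlap, whereas a narrow ramp tends to leave a visible step when the two images differ in overall brightness.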
Laplacian pyramid blending works very well for splining two overlapping images (joining two images at the centerline without visible seams) and has been widely adopted (e.g., for blending facial masks with face images [BCS97]). However, it cannot be directly applied to the inverse texture mapping problem because it is not designed for images with alpha channels (visibility masks).
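The band-by-band blending idea can be sketched, under simplifying assumptions, for one-dimensional signals. This sketch is not the implementation of [BA83b]. It assumes a 3-tap [1, 2, 1]/4 smoothing kernel, replicated borders, and linear-interpolation upsampling, and all function names are hypothetical. Each Laplacian band of the two inputs is mixed using the matching level of a Gaussian pyramid built from the blend mask, so the transition is narrow in fine bands and wide in coarse bands. Note that every sample of both inputs is assumed valid; the sketch has no notion of an alpha channel.

```python
def _smooth(signal):
    """Low-pass filter with a [1, 2, 1]/4 kernel, replicating border samples."""
    n = len(signal)
    return [
        (signal[max(i - 1, 0)] + 2.0 * signal[i] + signal[min(i + 1, n - 1)]) / 4.0
        for i in range(n)
    ]


def _reduce(signal):
    """Smooth, then keep every second sample."""
    return _smooth(signal)[::2]


def _expand(signal, size):
    """Upsample to `size` samples by linear interpolation."""
    out = []
    for i in range(size):
        k = i // 2
        if i % 2 == 0 or k + 1 >= len(signal):
            out.append(signal[k])
        else:
            out.append(0.5 * (signal[k] + signal[k + 1]))
    return out


def gaussian_pyramid(signal, levels):
    """Return `levels` progressively smoothed and subsampled copies."""
    pyramid = [list(signal)]
    for _ in range(levels - 1):
        pyramid.append(_reduce(pyramid[-1]))
    return pyramid


def laplacian_pyramid(signal, levels):
    """Return band-pass levels plus the coarsest low-pass residual."""
    gauss = gaussian_pyramid(signal, levels)
    bands = [
        [g - e for g, e in zip(fine, _expand(coarse, len(fine)))]
        for fine, coarse in zip(gauss, gauss[1:])
    ]
    return bands + [gauss[-1]]


def pyramid_blend(a, b, mask, levels=3):
    """Blend `a` and `b`; `mask` is 1.0 where `a` is wanted, 0.0 where `b` is."""
    la, lb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    gm = gaussian_pyramid(mask, levels)
    blended = [
        [m * x + (1.0 - m) * y for x, y, m in zip(band_a, band_b, band_m)]
        for band_a, band_b, band_m in zip(la, lb, gm)
    ]
    # Collapse: start at the coarsest level and add back each finer band.
    result = blended[-1]
    for band in reversed(blended[:-1]):
        result = [e + d for e, d in zip(_expand(result, len(band)), band)]
    return result
```

For example, `pyramid_blend(a, b, [1.0] * 4 + [0.0] * 4)` joins the left half of `a` to the right half of `b` for eight-sample scanlines.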
Images with alpha channels, typically referred to as alpha-masked images, are widely used in many applications such as image compositing [PD84, Bli94a, Bli94b] and blue-screen matting [SB96]. In the context of inverse texture mapping, images having cutouts due to occlusions, moving objects, and irregular shapes from perspective projection, must often be blended. Alpha-masking the images provides a way to handle such images with cutouts. Specifically, the pixels in the cut-out regions of the images are given zero alpha values. However, as alluded to above, current blending methods are not capable of blending alpha-masked images.
In addition, when images are taken from different viewpoints or at different scales (i.e., zoom), they will have very different spatial samplings when projected into the texture map, and therefore contribute content at different spatial frequencies. Images to be blended may also have different exposures. Before such images can be blended, a decision must be made as to how much each image (in fact, each pixel in each image) should contribute to the final, blended texture map.
FIG. 1 illustrates the two foregoing problems in inverse texture mapping. Assume it is desired to create a texture map 10 of the side wall 12 of the building 14 using two images 16, 18. However, the view of the side wall 12 captured in the first image 16 is partially occluded by a tree 20. Similarly, the view of the side wall 12 captured in the second image 18 is partially occluded by another tree 22. In order to obtain a texture map of the side wall 12, the parts of the two images 16, 18 depicting the trees must be "cut out" prior to blending. As discussed above, this can be accomplished using an alpha mask and setting the alpha values of those pixels associated with the trees to zero. One of the problems addressed by the present invention is how to blend images with these cut-out regions (i.e., with alpha masks). Another problem addressed by the present invention is how to blend images having different sampling rates. This is also illustrated in FIG. 1. As can be seen, the view of the side wall 12 captured in the first image 16 is from directly in front of the wall, whereas the view of the side wall captured in the second image 18 is from off to the side of the wall. This means that, as far as viewpoint is concerned, the first image will provide a better sampling rate of the pixels associated with the side wall 12 than will the second image. However, it can also be seen that the second image 18 was captured at a greater zoom setting than the first, and so may show more detail (i.e., a higher sampling rate) of the side wall 12 in some areas. Thus, the resulting blended image would be more accurate if the contributions of corresponding pixels in the two images 16, 18 were weighted relative to their sampling rates (and degree of exposure as well) prior to blending. The issue of how to adjust the blending coefficients to account for these differences is addressed by the present invention so that optimal sampling and exposure can be maintained in the blended image.
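The per-pixel contribution decision described above can be pictured with a deliberately simple sketch. It is a plain normalized weighted average over registered scanlines and is offered only to make the two problems concrete; it is not the weighted pyramid blending of the present invention. The function name `weighted_average` and the example weight values are hypothetical. Each weight is assumed to be the product of a pixel's alpha value (zero inside a cut-out) and a relative measure of its sampling rate.

```python
def weighted_average(images, weights):
    """Per-pixel normalized weighted average of aligned scanlines.

    `images` and `weights` are lists of equal-length lists. A weight of
    zero marks a cut-out pixel. Returns the blended scanline together
    with an output alpha that is zero wherever no image contributed.
    """
    n = len(images[0])
    blended, alpha = [], []
    for x in range(n):
        total = sum(w[x] for w in weights)
        if total > 0.0:
            value = sum(img[x] * w[x] for img, w in zip(images, weights)) / total
            blended.append(value)
            alpha.append(1.0)
        else:
            # Every image is cut out here, so the texel stays empty.
            blended.append(0.0)
            alpha.append(0.0)
    return blended, alpha


# Hypothetical data: the first image is cut out on the right and samples
# its second pixel three times as densely as the second image does.
first, second = [10.0] * 4, [20.0] * 4
w_first = [1.0, 3.0, 0.0, 0.0]
w_second = [0.0, 1.0, 1.0, 0.0]
texture, coverage = weighted_average([first, second], [w_first, w_second])
```

In this example the second texel is pulled toward the first image because of its higher weight, the third texel is taken entirely from the second image, and the fourth texel remains uncovered.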
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by an alphanumeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, "reference [DTM96]" or simply "[DTM96]". Multiple references will be identified by a pair of brackets containing more than one designator, for example, [PD84, Bli94a, Bli94b]. A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.