1. Technical Field
The invention is related to inverse texture mapping, and more particularly, to a system and process for inverse texture mapping using weighted pyramid blending.
2. Background Art
Image-based modeling and rendering has become an important and powerful method to simulate visually rich virtual environments. A traditional method to represent such an environment is to use 3D geometry with some associated texture maps. Many techniques for recovering the 3D geometry of the environment from one or more images have been developed in the photogrammetry, computer vision, and computer graphics communities. These include both automatic techniques [Fau93] and interactive techniques [DTM96].
High quality texture maps are essential for simulating visually rich natural environments. Panoramic mosaics and environment maps [Che95, MB95, and SS97] are examples of rich texture maps used with very simple geometric models (cylinders or cubes).
A number of blending algorithms have been developed to extract texture maps and/or high quality mosaics from multiple images. A general requirement for blending two images is to make the “seams” between them invisible. One common technique is to use linear or nonlinear cross-dissolving or weighted averaging [Pel81, Sze96]. However, choosing a proper cross-dissolving width is a difficult task. Burt and Adelson [BA83b] use a Laplacian pyramid to blend two images in different frequency bands with different blending widths. They can therefore blend without blurring by cross-dissolving in narrower neighborhoods in higher frequency bands, and in wider neighborhoods in lower frequency bands.
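The band decomposition that underlies Laplacian pyramid blending can be sketched in one dimension as follows. This is an illustrative sketch only: the function names, the 3-tap [1, 2, 1]/4 smoothing kernel, the clamped edges, and the linear-interpolation expansion are simplifying assumptions made here and are not taken from [BA83b] or from this specification.

```python
def reduce_1d(signal):
    """Blur with a [1, 2, 1]/4 kernel (edges clamped) and keep every second sample."""
    n = len(signal)
    out = []
    for i in range(0, n, 2):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append((left + 2 * signal[i] + right) / 4.0)
    return out


def expand_1d(signal, size):
    """Upsample to `size` samples by linear interpolation between neighbours."""
    last = len(signal) - 1
    out = []
    for i in range(size):
        lo = signal[min(i // 2, last)]
        hi = signal[min((i + 1) // 2, last)]
        out.append((lo + hi) / 2.0)
    return out


def laplacian_pyramid(signal, levels):
    """Return `levels` band-pass layers followed by one low-pass residual."""
    pyramid = []
    current = list(signal)
    for _ in range(levels):
        smaller = reduce_1d(current)
        predicted = expand_1d(smaller, len(current))
        # Each band holds the detail lost when going to the coarser level.
        pyramid.append([c - p for c, p in zip(current, predicted)])
        current = smaller
    pyramid.append(current)
    return pyramid


def collapse(pyramid):
    """Expand and sum the levels, coarsest first, to recover the signal."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        predicted = expand_1d(current, len(band))
        current = [b + p for b, p in zip(band, predicted)]
    return current


step_edge = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
bands = laplacian_pyramid(step_edge, 2)
restored = collapse(bands)
```

Because each band stores exactly the difference between a level and the expansion of the next coarser level, collapsing the pyramid recovers the input up to floating-point rounding. Blending then operates on each band separately, which is what allows a different effective blending width per frequency band.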
Laplacian pyramid blending works very well for splining two overlapping images (joining two images at the centerline without visible seams) and has been widely adopted (e.g., for blending facial masks with face images [BCS97]). However, it cannot be directly applied to the inverse texture mapping problem because it is not designed for images with alpha channels (visibility masks).
Images with alpha channels, typically referred to as alpha-masked images, are widely used in many applications such as image compositing [PD84, Bli94a, Bli94b] and blue-screen matting [SB96]. In the context of inverse texture mapping, images having cutouts due to occlusions, moving objects, and irregular shapes from perspective projection must often be blended. Alpha-masking the images provides a way to handle such images with cutouts. Specifically, the pixels in the cut-out regions of the images are given zero alpha values. However, as alluded to above, current blending methods are not capable of blending alpha-masked images.
In addition, when images are taken from different viewpoints or at different scales (i.e., zoom), they will have very different spatial samplings when projected into the texture map, and therefore contribute content at different spatial frequencies. Images to be blended may also have different exposures. Before such images can be blended, a decision must be made as to how much each image (in fact, each pixel in each image) should contribute to the final, blended texture map.
FIG. 1 illustrates the two foregoing problems in inverse texture mapping. Assume it is desired to create a texture map 10 of the side wall 12 of the building 14 using two images 16, 18. However, the view of the side wall 12 captured in the first image 16 is partially occluded by a tree 20. Similarly, the view of the side wall 12 captured in the second image 18 is partially occluded by another tree 22. In order to obtain a texture map of the side wall 12, the parts of the two images 16, 18 depicting the trees must be “cut out” prior to blending. As discussed above, this can be accomplished using an alpha mask and setting those pixels associated with the trees to zero. One of the problems addressed by the present invention is how to blend images with these cut-out regions (i.e., with alpha masks). Another problem addressed by the present invention is how to blend images having different sampling rates. This is also illustrated in FIG. 1. As can be seen, the view of the side wall 12 captured in the first image 16 is from directly in front of the wall, whereas the view of the side wall captured in the second image 18 is from off to the side of the wall. This means that, in terms of viewpoint, the first image will provide a better sampling rate of the pixels associated with the side wall 12 than will the second image. However, it can also be seen that the second image 18 was captured at a greater zoom setting than the first, and so may show more detail (i.e., a higher sampling rate) of the side wall 12 in some areas. Thus, the resulting blended image would be more accurate if the contributions of corresponding pixels in the two images 16, 18 were weighted relative to their sampling rates (and degree of exposure as well) prior to blending. The issue of how to adjust the blending coefficients to account for these differences is addressed by the present invention so that optimal sampling and exposure can be maintained in the blended image.
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by an alphanumeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [DTM96]” or simply “[DTM96]”. Multiple references will be identified by a pair of brackets containing more than one designator, for example, [PD84, Bli94a, Bli94b]. A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.
Given a 3D model and several images from different viewpoints, an “optimal” texture map can be extracted for each planar surface in the 3D model using an inverse texture mapping process. In particular, a unique weighted pyramid feathering process is employed that extends the traditional Laplacian pyramid blending algorithm to handle alpha-masked images and take advantage of weight maps. In this way, it is possible to blend images having cut-out regions, such as those caused by occlusions or moving objects in the images. In addition, the use of a weight map associated with each input image to indicate how much each pixel should contribute to the final, blended texture map makes it possible to create seamless texture maps.
Specifically, the present invention is embodied in a system and process for inverse texture mapping that uses the aforementioned unique weighted pyramid feathering scheme. First, at least two images of a 3D scene are inputted and any pixel located in a cut-out region of the images is set to a zero value. The 3D scene is characterized as a collection of regions (e.g., triangles). All the regions of the 3D scene which are to be texture mapped are identified in each inputted image. A 2D perspective transform for each identified planar region is then computed. This transform is capable of projecting the associated planar region to prescribed texture map coordinates. The 2D perspective transforms are used to warp each identified planar region to the prescribed texture map coordinates to create a plurality of warped images. A weight map is created for each warped image. As indicated above, the weight maps specify the degree to which each pixel in a particular warped image is to contribute to the final, blended image.
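By way of illustration, a 2D perspective transform is commonly represented as a 3x3 matrix acting on homogeneous coordinates. The following sketch assumes that representation; the function name `apply_perspective` and the example matrix are hypothetical and are not taken from this specification.

```python
def apply_perspective(H, x, y):
    """Map image point (x, y) through a 3x3 perspective transform H.

    H is a row-major nested list. The point is lifted to (x, y, 1),
    multiplied by H, and divided by the resulting third coordinate.
    """
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this transform")
    return u / w, v / w


# Hypothetical transform: scale by 2, shift by (1, 3), uniform divisor 2.
H_example = [[2.0, 0.0, 1.0],
             [0.0, 2.0, 3.0],
             [0.0, 0.0, 2.0]]
texture_xy = apply_perspective(H_example, 1.0, 1.0)
```

Warping a whole region typically proceeds in the opposite direction: each texture map pixel is mapped back through the inverse transform and the source image is resampled at that location, so that every texture pixel receives a value.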
The next step in the overall inverse texture mapping process is to blend the warped images using the weight maps. This generally entails forming an alpha premultiplied image from each warped image and constructing a band-pass Laplacian pyramid from the alpha premultiplied images. In addition, a low-pass Gaussian pyramid is constructed from each of the previously created weight maps. A new composite band-pass Laplacian pyramid is then computed using the band-pass Laplacian pyramids and low-pass Gaussian pyramids associated with the warped images. This computation includes, for each corresponding level of the pyramids, first multiplying each pixel of each warped image by the weight factor associated with that same pixel location in the corresponding weight map to produce a weighted pixel value. The weighted pixel values for each corresponding pixel location in all the warped images are respectively added together and divided by the sum of the weight factors corresponding with that pixel location. Lastly, the blended image is produced by expanding and summing the pyramid levels of the composite band-pass Laplacian pyramid, and compensating for the previously implemented alpha premultiplication of the warped images.
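The per-level combination described above, in which weighted pixel values are summed and divided by the sum of the weights, can be sketched for a single pyramid level as follows. The function name `blend_level`, the flat-list pixel layout, and the choice to output zero where every weight is zero are assumptions made for this illustration.

```python
def blend_level(bands, weights, eps=1e-12):
    """Combine one pyramid level from several warped images.

    bands[k][i]   : band-pass (Laplacian) value of image k at pixel i
    weights[k][i] : low-pass (Gaussian) weight of image k at pixel i
    Returns the weighted average at each pixel. Where the weights sum
    to (nearly) zero the output is set to 0.0, which is an assumption
    of this sketch rather than something stated in the text.
    """
    size = len(bands[0])
    out = []
    for i in range(size):
        total_weight = sum(w[i] for w in weights)
        if total_weight <= eps:
            out.append(0.0)
            continue
        weighted_sum = sum(b[i] * w[i] for b, w in zip(bands, weights))
        out.append(weighted_sum / total_weight)
    return out


# Two images, three pixels at one level (values chosen for illustration).
level_bands = [[4.0, 2.0, 8.0],
               [0.0, 6.0, 8.0]]
level_weights = [[1.0, 1.0, 0.0],
                 [0.0, 3.0, 0.0]]
composite = blend_level(level_bands, level_weights)
```

In this example the first pixel is taken entirely from the first image, the second is a 1:3 weighted average of the two images, and the third receives no contribution because neither image carries weight there. Applying the same function to every level yields the composite band-pass Laplacian pyramid.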
With regard to the alpha premultiplication, this process involves multiplying the R, G and B components of each warped image pixel by the alpha value associated with that pixel. Thus, the preferred alpha premultiplication compensation step involves dividing the R, G and B components of each pixel of the blended image by its alpha value.
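The premultiplication and its compensation can be sketched per pixel as follows. The sketch assumes pixels are (R, G, B, alpha) tuples with components in the range 0 to 1, and it returns a fully zero pixel when alpha is zero to avoid division by zero; both choices, along with the function names, are assumptions of this illustration.

```python
def premultiply(pixel):
    """Multiply the R, G and B components by the pixel's alpha value."""
    r, g, b, a = pixel
    return (r * a, g * a, b * a, a)


def unpremultiply(pixel):
    """Divide the R, G and B components by alpha to undo premultiplication."""
    r, g, b, a = pixel
    if a == 0:
        # Nothing is visible here; colour is undefined, so return zeros.
        return (0.0, 0.0, 0.0, 0.0)
    return (r / a, g / a, b / a, a)


half_visible = (0.8, 0.4, 0.2, 0.5)
stored = premultiply(half_visible)
recovered = unpremultiply(stored)
```

A pixel in a cut-out region has zero alpha, so after premultiplication all of its components are zero and it contributes nothing to the sums formed during blending.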
It is also preferred that measures be taken to determine whether the warped images are in registration with one another prior to the blending process. If the images are mis-registered, the blended image quality can suffer. If any unacceptable mis-registration is discovered, the warped images are aligned via conventional methods [SS97, SS98].
In addition, it is preferred that, just prior to compensating for the alpha premultiplication in the blended image, the R, G, B and alpha components of each pixel be clipped by first setting any of these components having a value of less than zero to zero, and secondly setting any of the R, G, and B components having a value greater than alpha to the alpha value. This will compensate for any underflow or overflow caused by the pyramid blending process. It is also desirable that a step be added to the general blending process described previously to ensure any cut-out region common to all the input images maintains a zero alpha value. This is accomplished via the use of a logical OR operator following the alpha premultiplication compensation step. Specifically, each pixel of the blended image is compared to a correspondingly located pixel of each of the alpha premultiplied images. Whenever any compared pixel of the blended image coincides with a zero-valued pixel in all of the alpha premultiplied images, it is set to zero no matter what value it obtained during the blending process. Conversely, whenever any compared pixel of the blended image does not coincide with a zero-valued pixel in all of the alpha premultiplied images, it is left unchanged.
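The clipping and the common cut-out check can be sketched as follows. The sketch assumes (R, G, B, alpha) tuples stored in flat lists and treats a pixel as zero-valued when all four of its components are zero; the function names and that interpretation are assumptions of this illustration.

```python
def clip_premultiplied(pixel):
    """Clip underflow to zero, then clip R, G and B overflow to alpha."""
    r, g, b, a = (max(c, 0.0) for c in pixel)
    return (min(r, a), min(g, a), min(b, a), a)


def enforce_common_cutout(blended, premultiplied_images):
    """Zero each blended pixel that is zero-valued in every source image.

    The logical OR over the source images is expressed with any(): a
    pixel is kept if at least one source image is non-zero there.
    """
    out = []
    for i, pixel in enumerate(blended):
        visible_somewhere = any(
            any(c != 0 for c in image[i]) for image in premultiplied_images
        )
        out.append(pixel if visible_somewhere else (0.0, 0.0, 0.0, 0.0))
    return out


clipped = clip_premultiplied((-0.1, 0.7, 0.3, 0.5))

# Two source images of two pixels each; pixel 1 is cut out in both.
source_a = [(0.2, 0.2, 0.2, 0.5), (0.0, 0.0, 0.0, 0.0)]
source_b = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
blended = [(0.4, 0.4, 0.4, 0.5), (0.01, 0.0, 0.02, 0.03)]
masked = enforce_common_cutout(blended, [source_a, source_b])
```

In this example the second blended pixel picked up small non-zero values during blending, but because it is cut out in both source images it is reset to zero, while the first pixel is left unchanged.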
In addition to the just described benefits, other advantages of the present invention will become apparent from the detailed description which follows hereinafter when taken in conjunction with the drawing figures which accompany it.