1. Technical Field
The invention is related to texture map construction, and more particularly to a system and process for generating an optimal texture map of a scene from a plurality of textures each of which is reconstructed from multiple input textures representing the same portion of the scene and taken from images of the scene captured from different perspectives.
2. Background Art
Texture mapping is an established rendering technique used to enhance the realism of 3D models. In computer vision, 3D models are typically constructed using multiple images (and possibly range data). Their textures are also recovered using combinations of appropriately extracted parts of the source images. Currently, textures or images are manipulated (e.g., warped) using techniques that are simplistic approximations of the true mapping function, which results in suboptimal appearances in the recovered textures. These approximations are used primarily because of their simplicity in implementation or limitations of hardware.
Thus, an important issue is how these textures can be extracted as accurately as possible from multiple views. Assuming that all surfaces are Lambertian, a final texture is typically computed as a linear combination of the reference textures. This is, however, not the optimal means for reconstructing textures, since this does not model the anisotropy in the texture projection. Furthermore, the spatial image sampling may be quite variable within a foreshortened surface.
Generating an optimal texture map does more than improve the realism of the recovered 3D model. It also benefits computer vision techniques that rely on analysis by synthesis. Such techniques reconstruct intermediate appearances for comparison with the input images in order to refine the desired output. A typical example is the direct recovery of 3D geometry and texture from multiple reference images [6]. In another example, Morris and Kanade [13] find the best triangulation for a given set of feature point correspondences across multiple images. The metric used is the reprojection error for a given hypothesized triangulation. Generation of correct textures is critical for such techniques.
There has also been a significant amount of work done on generating an image with a resolution higher than its individual sources, i.e., super-resolution. This can also be considered as recovering an optimal texture map from multiple (smaller resolution) texture maps seen at different views. Thus, generation of accurate textures is critical for these techniques as well. Current super-resolution approaches can be categorized as being interpolation-based [8, 17, 9], frequency-based [18, 10, 11], or reprojection-based [2, 16]. While producing acceptable results, the introduction of even more accurate and efficiently computed texture maps would be welcomed.
The generation of optimal textures is also critical for the increasingly popular image-based rendering technique (IBR) of view-dependent texture mapping (VDTM) [3]. There is typically photometric variation across the views used to construct textures due to lighting changes and non-Lambertian surfaces. View-dependent texture mapping has been proposed as an image-based means of modeling photometric variation, thus enhancing realism [3]. For a given view, reference textures are typically blended based on viewpoint proximity to the corresponding reference views (in the form of a sphere view map). Others that use the sphere view map as well include [4, 14, 15]. In the “Unstructured Lumigraph” work [1], global weights for each face texture are computed based on ray angular difference, estimates of undersampling, and field of view. Here again, methods for producing accurate and efficiently computed texture maps could be quite useful.
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. Multiple references will be identified by a pair of brackets containing more than one designator, for example, [2, 3]. A listing of references including the publications corresponding to each designator can be found at the end of the Detailed Description section.
The present invention is directed toward a system and process for reconstructing optimal texture maps from multiple views of a scene that can be incorporated with great advantage in the aforementioned texturing of 3D models, analysis by synthesis methods, super-resolution techniques, and view-dependent texture mapping. In essence, the present system and process is based on the optimal synthesis of textures from multiple sources. This is generally accomplished using basic image processing theory to derive the correct weights for blending the multiple views. Namely, the steps of reconstructing, warping, prefiltering, and resampling are followed in order to warp reference textures to the desired location, and to compute spatially-variant weights for optimal blending. These weights take into consideration the anisotropy in the texture projection and changes in sampling frequency due to foreshortening. The weights are combined and the computation of the optimal texture is treated as a restoration problem, which involves solving a linear system of equations.
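As a rough sketch of how these four steps can yield spatially-variant weights, the chain may be written in the standard ideal-resampling form. The symbols are assumed notation for illustration and do not come from the text: $f$ is a discrete reference texture, $r$ a reconstruction filter, $m$ the warp from texture coordinates $u$ to output coordinates $x$, and $h$ a low-pass prefilter.

$$
\begin{aligned}
f_c(u) &= \sum_{k} f(k)\, r(u - k) && \text{(reconstruct)} \\
g_c(x) &= f_c\!\left(m^{-1}(x)\right) && \text{(warp)} \\
g'_c(x) &= \int g_c(t)\, h(x - t)\, dt && \text{(prefilter)} \\
g(x) &= g'_c(x) \ \text{at the output pixel locations } x && \text{(sample)}
\end{aligned}
$$

Substituting $t = m(u)$ in the prefilter integral collapses the chain into a single weighted sum over the source pixels:

$$
g(x) = \sum_{k} f(k)\, \rho(x, k), \qquad
\rho(x, k) = \int h\big(x - m(u)\big)\, r(u - k)\, \left|\frac{\partial m}{\partial u}\right| du .
$$

Because $\rho(x, k)$ depends on the local behavior of the warp $m$, the weight given to a source pixel changes from one output pixel to the next. Under these assumptions, this is where the anisotropy of the projection and the change in sampling frequency due to foreshortening enter the weights.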
More specifically, the present texture map reconstruction system and process generates each final texture used in defining the map from a plurality of textures, each of which is reconstructed from multiple input textures. The input textures used to form a particular final texture of the texture map all represent the same portion of a scene being modeled, except that each was taken from images of the scene captured from different perspectives. The procedure employed to reconstruct a final, combined texture from multiple input textures generally involves first resampling the multiple textures, and then computing a weight matrix for each input texture from its corresponding resampled texture. This weight matrix is made up of spatially-variant weights which, when applied to the associated input texture, produce the resampled texture. Once the weight matrices have been computed, the input textures are blended. This part of the procedure entails using the weight matrices to produce the single combined texture. Essentially, this involves combining the weight matrices and then determining what single texture of a desired size produces the individual input textures when the combined weight matrix is applied to it. This single texture is then designated as the aforementioned final combined texture.
In regard to the resampling of the multiple input or reference textures, it is noted that each texture is a digitized texture defined by a discrete signal having a measured pixel value for each pixel location of the texture. The resampling generally involves first reconstructing a continuous signal from the digitized input texture, and then warping the reconstructed continuous signal to a prescribed location. The warped signal is then prefiltered to remove any frequencies that are higher than a prescribed maximum frequency. This eliminates aliasing. Finally, the warped and filtered signal is sampled to produce a discrete output signal that represents the resampled texture. In one version of the resampling procedure, the aforementioned reconstructing, warping, prefiltering, and sampling involve computing a resampling filter that relates pixel locations in the resampled texture to pixel locations of the input texture. The resampling filter is used to specify the weight that is assigned to a pixel location of the input texture for each pixel location of the resampled texture. Thus, the resampling filter is used to generate the aforementioned weight matrix for the input texture.
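The following is a minimal one-dimensional sketch of how a resampling filter might be turned into a weight matrix. It is illustrative only and not the implementation described in this specification. The function names, the triangle filter used for both reconstruction and prefiltering, the midpoint-rule integration, and the row normalization are all assumptions made for the example.

```python
def tent(t):
    """Triangle filter of unit half-width (assumed for both filters)."""
    return max(0.0, 1.0 - abs(t))


def resampling_weights(n_in, n_out, warp, warp_deriv, steps_per_pixel=64):
    """Build an n_out x n_in weight matrix for a 1D texture.

    weights[i][j] approximates the integral over u of
        tent(i - warp(u)) * tent(u - j) * |warp_deriv(u)|
    with a midpoint Riemann sum. Each row is then normalized to sum
    to 1 (an assumption, so that constant textures are preserved).
    """
    du = 1.0 / steps_per_pixel
    weights = [[0.0] * n_in for _ in range(n_out)]
    # The reconstructed continuous signal is nonzero for u in (-1, n_in).
    for step in range((n_in + 1) * steps_per_pixel):
        u = -1.0 + (step + 0.5) * du
        x = warp(u)                      # warped position in output space
        jac = abs(warp_deriv(u))
        for j in range(n_in):
            r = tent(u - j)              # reconstruction filter at input pixel j
            if r == 0.0:
                continue
            for i in range(n_out):
                h = tent(i - x)          # prefilter at output pixel i
                if h > 0.0:
                    weights[i][j] += h * r * jac * du
    for row in weights:
        total = sum(row)
        if total > 0.0:
            for j in range(n_in):
                row[j] /= total
    return weights


def apply_weights(weights, texture):
    """Resample: each output pixel is a weighted sum of input pixels."""
    return [sum(w * p for w, p in zip(row, texture)) for row in weights]


# Example: shrink an 8-pixel texture to 4 pixels using the warp x = u / 2.
W = resampling_weights(8, 4, warp=lambda u: 0.5 * u, warp_deriv=lambda u: 0.5)
flat = apply_weights(W, [5.0] * 8)
ramp = apply_weights(W, [float(j) for j in range(8)])
```

Row `i` of `W` holds the weight of every input pixel for output pixel `i`. A warp whose derivative varies with `u`, as a foreshortened surface would produce, gives rows with different footprints, which is the spatially-variant behavior discussed above.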
In regard to the blending procedure, one version of the present texture map reconstruction system and process accomplishes this task as follows. Each final combined (i.e., reconstructed) texture is characterized as a matrix of pixels $X$ of size $M \times N$. For $P$ input textures $\{Y_k\}_{k=1}^{P}$, each of which is characterized as a matrix of pixels of size $M_k \times N_k$, and their associated weight matrices $\{W_k\}_{k=1}^{P}$, which are each characterized as a matrix having a size of $(MN) \times (M_k N_k)$, a combined input texture matrix is generated by stacking the columns of pixels of each input texture into a single column and then stacking the resulting individual input texture matrices in a prescribed order. A combined weight matrix is also generated by stacking the individual weight matrices in the same order as the input texture matrices are stacked. Next, a system of equations is defined by setting the combined input texture matrix equal to the combined weight matrix multiplied by a column matrix representing the final combined texture $X$ with its pixel columns stacked to form a single column. Thus,

$$
\begin{bmatrix} Y_1 \\ \vdots \\ Y_P \end{bmatrix}
=
\begin{bmatrix} W_1 \\ \vdots \\ W_P \end{bmatrix} X .
$$
The system of equations is solved to define the final combined texture matrix X. The actual texture can then be recovered by un-stacking the pixel columns.
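The stacking and solving steps can be sketched as follows. This is a small illustration under stated assumptions rather than the solver used by the invention. The text does not say how the system is solved, so a least-squares solution via the normal equations is assumed here. So that the product of each weight matrix with the stacked final texture is defined, each weight matrix in the sketch has one row per input-texture pixel and one column per final-texture pixel. That orientation, and every function name, is hypothetical.

```python
def stack_columns(texture):
    """Stack the pixel columns of a 2D list (rows x cols) into one flat column."""
    rows, cols = len(texture), len(texture[0])
    return [texture[r][c] for c in range(cols) for r in range(rows)]


def unstack_columns(vector, rows, cols):
    """Inverse of stack_columns: rebuild a rows x cols texture."""
    return [[vector[c * rows + r] for c in range(cols)] for r in range(rows)]


def solve_least_squares(A, y):
    """Solve min ||A x - y||^2 via the normal equations (A^T A) x = A^T y."""
    n = len(A[0])
    ata = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    aty = [sum(row[i] * v for row, v in zip(A, y)) for i in range(n)]
    aug = [ata[i] + [aty[i]] for i in range(n)]       # augmented matrix
    for col in range(n):                              # Gaussian elimination
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("system is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]   # partial pivoting
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                      # back substitution
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def blend_textures(inputs, weights, rows, cols):
    """Stack all inputs and weight matrices in the same order, solve, un-stack."""
    Y = [p for tex in inputs for p in stack_columns(tex)]
    W = [row for Wk in weights for row in Wk]
    return unstack_columns(solve_least_squares(W, Y), rows, cols)


# Example: recover a 2x2 texture from three lower-resolution observations.
# The stacked unknown is [x00, x10, x01, x11].
Y1 = [[2.0, 3.0]]                    # 1x2: averages of each column
W1 = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
Y2 = [[1.5], [3.5]]                  # 2x1: averages of each row
W2 = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
Y3 = [[1.0]]                         # 1x1: sees only the top-left pixel
W3 = [[1.0, 0.0, 0.0, 0.0]]
result = blend_textures([Y1, Y2, Y3], [W1, W2, W3], rows=2, cols=2)
```

None of the three inputs determines the 2×2 texture alone, but the stacked system has full column rank, so the combined solve recovers it. For textures of realistic size the weight matrices are large and mostly zero, so a sparse iterative solver would likely be a better fit than the dense elimination shown here.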
In addition to the just described benefits, other advantages of the present invention will become apparent from the detailed description which follows hereinafter when taken in conjunction with the drawing figures which accompany it.