Three-dimensional modeling, which is now a subset of the larger field of computer graphics, has become popular in a number of areas, for example, in computer-aided architectural design, industrial design and construction. As has been recognized in these and other fields, it is often desirable to have a three-dimensional model, complete with a description of shape, location, orientation and material surface properties (i.e., texture), in order to produce realistic renderings on a computer which can be used to document a new design of a city, a building or an object. The model can also be used for computer animations, for virtual reality immersion of users in a scene or for manufacturing tasks.
Producing an image of a three-dimensional scene requires finding the projection of that scene and re-projecting it onto a two-dimensional screen. If the image of the three-dimensional scene is to appear realistic, then the projection from three to two dimensions must be a perspective projection. In the case of a scene which includes texture mapped surfaces, this involves not only determining where the projected points of the surfaces should appear on the screen, but also which portions of the texture image should be associated with the projected points. The process of mapping surface textures onto a synthetically rendered three-dimensional object is one which is well known in the computer graphics art. See, e.g., Foley et al., Computer Graphics: Principles and Practice, Second Edition, § 17.4, 1990. Such texture mapping allows objects rendered by the computer system to appear realistic, without the need for highly detailed geometric modeling. Typically, a complex scene is converted to polygons before projection, and modern graphics systems, both hardware and software, are capable of rendering rectangular textures onto a variety of geometric primitives including polygons, spheres, boxes, etc.
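The two determinations described above (where a surface point lands on the screen, and which part of the texture image belongs there) can be illustrated with a minimal sketch. It assumes a simple pinhole camera at the origin looking along the positive z axis; the function names `project_point` and `interpolate_uv` and the `focal_length` parameter are hypothetical and are not taken from any system discussed here.

```python
def project_point(point, focal_length=1.0):
    """Perspective-project a camera-space point (x, y, z) onto the plane z = focal_length.

    Assumption: pinhole camera at the origin looking down +z.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def interpolate_uv(p0, uv0, p1, uv1, t):
    """Texture coordinate at screen-space fraction t along the edge p0 -> p1.

    Under perspective projection, u/z, v/z and 1/z vary linearly across the
    screen, so those quantities are interpolated and then divided, instead of
    interpolating u and v directly.
    """
    z0, z1 = p0[2], p1[2]
    inv_z = (1 - t) / z0 + t / z1
    u = ((1 - t) * uv0[0] / z0 + t * uv1[0] / z1) / inv_z
    v = ((1 - t) * uv0[1] / z0 + t * uv1[1] / z1) / inv_z
    return (u, v)
```

In this sketch, the screen-space midpoint of an edge running from depth 1 to depth 3 maps to a texture coordinate nearer the close endpoint than a plain linear interpolation would give, which is why the texture lookup has to account for the perspective projection.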
Some practitioners have proposed using view-dependent textures. In such schemes, a three-dimensional model is rendered from a particular point of view and a decision is made as to which of several available source textures will be used for the rendering. Unfortunately, such schemes rely on the use of individual textures taken from images of different points of view and these individual images may have large regions which are "wrong" or "obscured" (depending on the particular projection of the model of the scene being displayed). Thus, the schemes resort to some kind of hole-filling technique which often results in visual artifacts. Moreover, because the multiple textures are unrelated, the final texture on a surface during an animation can appear to "pop" when different source textures are switched on. For example, a tree may suddenly appear and/or disappear or an entire surface may change in hue or luminosity. This popping effect can be reduced somewhat by blending rather than just switching textures; however, real-time blending is not supported by many computer graphics systems. Preparing the blended versions in a preprocessing step is impractical because of the many possible viewing angles and thus many possible blended versions. The memory requirements for such a scheme would be large and, effectively, cost prohibitive. Further, sending down new textures (i.e., new blended textures) for every few frames of the animation would likely overload the graphics pipeline of the computer system. See, e.g., Eric Chen, "View Interpolation for Image Synthesis", Computer Graphics, Proceedings, SIGGRAPH 1993, pp. 279-288 (1993); Marc Levoy and Pat Hanrahan, "Light Field Rendering", Computer Graphics, Proceedings, SIGGRAPH 1996, pp. 31-42 (1996); and Steven Gortler et al., "The Lumigraph", Computer Graphics, Proceedings, SIGGRAPH 1996, pp. 43-54 (1996).
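The difference between switching and blending source textures can be sketched as follows. This is an illustration only, not the method of any of the cited works: it assumes each source texture is tagged with the direction from which its image was taken, and it weights sources by the clamped cosine of the angle between that direction and the current viewing direction. The names `view_weights` and `select_texture` are hypothetical.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def view_weights(view_dir, source_dirs):
    """Blend weights for each source texture, summing to 1.

    Assumption: weight is proportional to the clamped cosine between the
    current view direction and the direction each source image was taken from.
    """
    v = normalize(view_dir)
    cosines = [
        max(0.0, sum(a * b for a, b in zip(v, normalize(s))))
        for s in source_dirs
    ]
    total = sum(cosines)
    if total == 0.0:
        raise ValueError("no source view faces the current view direction")
    return [c / total for c in cosines]


def select_texture(view_dir, source_dirs):
    """Hard switch: index of the single best-aligned source texture."""
    weights = view_weights(view_dir, source_dirs)
    return max(range(len(weights)), key=weights.__getitem__)
```

With two source directions, `select_texture` jumps from one index to the other as the view direction crosses the midpoint between them, which corresponds to the popping described above, whereas `view_weights` changes continuously. The blended weights differ for every viewing direction, which is why precomputing every blended texture is impractical.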
A recently published method (Paul Debevec et al., "Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-Based Approach", University of California Berkeley Technical Report UCB-CSD-96-893, January 1996) proposes a somewhat different scheme. The method requires a user to first create a parameterized (or rough) model of the objects in the scene using a separate editor. Second, the user draws edges on top of one or more photographs. Third, the user marks each edge in each photograph as corresponding to a particular edge in the parameterized model. The method then calculates values for the parameters in the model. For the texturing, dense depth information is extracted for a surface and that depth is used to render a 2½-dimensional version of the surface. This requires a preprocessing step for extracting the depth information but can result in more realistic renderings than are available using the above methods. This work is based in part on concepts and mathematics from Camillo Taylor and David Kriegman of Yale University, as reported in "Structure and Motion from Line Segments in Multiple Images", Yale University Technical Report #94026, January 1994. However, because the depth information is used inside the texture/surface rendering loop of the modeling algorithm, the method cannot be used with standard graphics systems which generally do not allow a user to modify these inner loops. This limits the wide applicability of the texture plus depth rendering process.
In light of the need for computer-generated three-dimensional models, but given the shortcomings of prior schemes for texture mapping, it would be desirable to have an improved computer-assisted technique for the creation and utilization of merged, view-independent textures from multiple views for application in interactive, computer-assisted three-dimensional modeling routines.