The present invention relates to methods and systems for producing three-dimensional images of a scene. More particularly, the present invention relates to modeling a scene using relief textures, pre-warping the relief textures into two-dimensional textures that are perspectively correct from a desired viewpoint, and texture mapping the two-dimensional textures to polygons used to model the scene.
In computer graphics, conventional textures are two-dimensional patterns or arrays of pixels used to add detail to computer-generated images. Specifically, each pixel in a conventional texture has a row coordinate in texture space, a column coordinate in texture space, and a color value. Texture mapping is the process of mapping two-dimensional textures onto polygons used to model objects displayed on a computer screen. For example, the front of a building may be represented by a rectangle in computer graphics. A texture for the front of the building may include the windows. Texture mapping is utilized to map the pixels in the texture from texture space to screen space. The most common form of texture mapping is inverse mapping. In inverse mapping, the color of each pixel in the rectangle is determined by projecting the footprint of each pixel in screen space to texture space and integrating the pixel colors in the texture that fall within the footprint.
One problem with conventional texture mapping is that images created using conventional methods appear flat when viewed from different viewpoints. FIG. 1 illustrates computer images of a photograph viewed from different viewpoints. In FIG. 1, a first image 100 is a photograph shown from a viewpoint orthogonal to the plane of the page. A second image 106 is the same photograph shown from a viewpoint below the viewpoint of the image 100. In images 104 and 102, the photograph is shown from viewpoints shifted to the left and right of the viewpoint of the image 100. Because the photograph is flat, and the corresponding image is flat, occluded objects in the photograph cannot be seen when the viewpoint changes. For example, additional features of the person in the photograph do not become visible when the viewpoint changes. If, however, the actual scene is viewed, different things could be seen from the different viewpoints. For example, if the actual scene were viewed from the viewpoints of the images 104 and 102, left and right side views of the profiles of the person in the scene would be visible.
In computer graphics, the images of the photograph in FIG. 1 are the same as conventional textures, i.e., each image is a set of pixels, each having a row coordinate, a column coordinate, and a color value. The mapping of the images in FIG. 1 onto different viewing planes illustrates the limitations of conventional texture mapping. Thus, like the images of the photograph in FIG. 1, surfaces represented by two-dimensional textures appear flat when the viewpoint changes. FIG. 2 is an example of a scene represented with three polygons using conventional texture mapping. The red lines represent the borders of the polygons. Textures 200, 202, and 204 are mapped to the polygons and represent faces of buildings from the original scene. Because the textures do not convey depth information with regard to the actual surfaces they represent, the reproduction of the scene appears flat.
FIG. 3 illustrates conventional texture mapping in the one-dimensional domain. In FIG. 3, a first image 300 may represent a horizontal section through a two-dimensional texture. A second image 302 is a projection of the image 300 onto a projection plane 304. A third image 306 is a projection of the image 300 onto a projection plane 308. A fourth image 310 is a projection of the first image onto a projection plane 312. Because of the viewing angles between the original image and the projection planes 308 and 312, non-uniform contraction of the original image may occur. For example, in FIG. 3, the original image includes a red area 314, a green area 316, and a blue area 318. In the image 310, the red area 314 maps to red area 320. The green area 316 maps to green area 322, and the blue area 318 maps to blue area 324. Because of the viewing angle, the contraction of the red area between images 300 and 310 is greater than the contraction of the blue area. Despite the non-uniform contraction of the projected image, there is a one-to-one mapping between points of the original image 300 and points of the projected image 310. Because the mapping from original to projected images is one-to-one, the mapping can be easily inverted, i.e., given coordinates on the projected image, the computation of the corresponding coordinates on the original image is straightforward. When the projection plane is a computer screen and the original image is a texture image, this mapping is referred to as texture mapping. When pixels are mapped from the texture coordinates to the screen coordinates, the mapping is referred to as a forward mapping. When pixels are mapped from screen coordinates back to texture coordinates, the mapping is referred to as an inverse mapping. Because there is a one-to-one correspondence between pixels in the original and projected images in texture mapping, inverse mapping requires only simple calculations. 
The inverse formulation of texture mapping is simple and well known, presents several filtering advantages over the forward mapping, and is a standard operation in most computer graphics hardware.
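By way of illustration only, the forward and inverse mappings described above may be sketched in one dimension, in the manner of FIG. 3. The sketch below is in Python and is not taken from the original disclosure: the function names, the coefficients A, C, I, and K of the one-dimensional projective map, and the use of linear interpolation as the filter are all assumptions made for the example.

```python
def forward_map(u1, A, C, I, K):
    """Map a texture coordinate u1 to a screen coordinate (1-D projective map)."""
    return (A * u1 + C) / (I * u1 + K)


def inverse_map(u2, A, C, I, K):
    """Invert forward_map: recover the texture coordinate for screen coordinate u2.

    Solving u2 * (I*u1 + K) = A*u1 + C for u1 gives the expression below.
    """
    return (K * u2 - C) / (A - I * u2)


def sample_linear(texture, u):
    """Linearly interpolate a 1-D texture (at least two texels) at coordinate u."""
    u = min(max(u, 0.0), len(texture) - 1.0)  # clamp to the texture extent
    i = min(int(u), len(texture) - 2)         # index of the left neighbor
    t = u - i
    return (1.0 - t) * texture[i] + t * texture[i + 1]


def render_inverse(texture, width, A, C, I, K):
    """Inverse mapping: visit every screen pixel and fetch its color from the texture."""
    return [sample_linear(texture, inverse_map(x, A, C, I, K)) for x in range(width)]
```

Because every screen pixel is visited exactly once and looks up its own color, the inverse formulation leaves no unfilled pixels, which is one reason it is convenient for filtering.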
Three-dimensional image warping is a mapping from a sampled three-dimensional model of a scene to a two-dimensional image from a given viewpoint. Texture mapping is a special case of three-dimensional image warping for which the input image is planar, as in the examples illustrated in FIGS. 1 and 2. Thus, for conventional two-dimensional images, both techniques produce exactly the same results. The difference between texture mapping and three-dimensional image warping is primarily in the type of input images, rather than the process, as will be explained in more detail below. Because conventional texture mapping only handles planar images, its equations are simpler than those used for three-dimensional image warping. Equations (1)-(4) shown below illustrate the relationship between texture mapping and three-dimensional image warping. Equations (1) and (2) define forward texture mapping, whereas Equations (3) and (4) define forward three-dimensional image warping. Each equation expresses how elements (pixels) of the input image represented by the coordinates (u1, v1) are mapped to elements of the projected image represented by the coordinates (u2, v2). In Equations (3) and (4), displ(u1, v1) represents the height of each pixel in an image measured from a basis plane of the image. If displ(u1, v1) is constant for all elements of the input image (i.e., the image is planar), Equations (3) and (4) reduce to an instance of Equations (1) and (2), respectively.
$$u_2 = \frac{A u_1 + B v_1 + C}{I u_1 + J v_1 + K} \tag{1}$$

$$v_2 = \frac{E u_1 + F v_1 + G}{I u_1 + J v_1 + K} \tag{2}$$

$$u_2 = \frac{A u_1 + B v_1 + C + D\,\operatorname{displ}(u_1, v_1)}{I u_1 + J v_1 + K + L\,\operatorname{displ}(u_1, v_1)} \tag{3}$$

$$v_2 = \frac{E u_1 + F v_1 + G + H\,\operatorname{displ}(u_1, v_1)}{I u_1 + J v_1 + K + L\,\operatorname{displ}(u_1, v_1)} \tag{4}$$
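The relationship among Equations (1)-(4) may be checked numerically. The following Python sketch is illustrative only: the function names and the use of a dictionary to hold the coefficients A through L are assumptions of the example, and the coefficient values themselves would in practice be derived from the camera and plane parameters, which are not given here.

```python
def forward_texture_map(u1, v1, c):
    """Equations (1) and (2). c holds the coefficients A, B, C, E, F, G, I, J, K."""
    w = c["I"] * u1 + c["J"] * v1 + c["K"]
    u2 = (c["A"] * u1 + c["B"] * v1 + c["C"]) / w
    v2 = (c["E"] * u1 + c["F"] * v1 + c["G"]) / w
    return u2, v2


def forward_warp_3d(u1, v1, displ, c):
    """Equations (3) and (4). c additionally holds D, H, and L;
    displ is the value displ(u1, v1) for this pixel."""
    w = c["I"] * u1 + c["J"] * v1 + c["K"] + c["L"] * displ
    u2 = (c["A"] * u1 + c["B"] * v1 + c["C"] + c["D"] * displ) / w
    v2 = (c["E"] * u1 + c["F"] * v1 + c["G"] + c["H"] * displ) / w
    return u2, v2


def fold_constant_displacement(c, d0):
    """For a constant displacement d0, absorb the displacement terms into C, G, and K,
    yielding coefficients for Equations (1) and (2)."""
    folded = dict(c)
    folded["C"] = c["C"] + c["D"] * d0
    folded["G"] = c["G"] + c["H"] * d0
    folded["K"] = c["K"] + c["L"] * d0
    return folded
```

With a constant displacement d0, calling `forward_warp_3d(u1, v1, d0, c)` and `forward_texture_map(u1, v1, fold_constant_displacement(c, d0))` should agree up to floating-point rounding, which is the reduction stated above.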
Images with depth are images in which each pixel has an associated depth value representing a distance between the sample and the center of projection of a real or imaginary camera used to define the image. Images can be spherical, cylindrical, etc., but such images are not within the type of images referred to herein as non-planar images or images with depth. Due to the two-dimensional nature of the film and paper used to acquire and print pictures, images are commonly thought of as two-dimensional entities. In fact, an image is a mapping from a two-dimensional support to a multidimensional space. Such space is usually a color space (multiple wavelengths), but the space may contain information in addition to color. For instance, the space used to represent an image may contain depth information representing the distance of objects in the scene from the camera center of projection on a per element basis. For a real scene, the depth information may be collected using a depth-measuring device, such as a laser range finder. Computer images in which each element or pixel has an associated depth value, in addition to a color value, are referred to as images with depth. In a computer, an image with depth may be represented by a multidimensional array of numbers. Since computers are capable of easily manipulating arrays of numbers, from a computer's point-of-view, an image with depth is no different from a conventional image.
One reason that it may be desirable to represent scenes as images with depth inside of a computer is that samples of images with depth can be mapped back to three dimensions and then re-projected onto arbitrary view planes, thus obtaining new views of the same scene. For example, a camera may be used to obtain a two-dimensional image of a scene and a laser range finder may be used to obtain depth values for each element of the sample. The elements from the sample may then be mapped back to three-dimensional space based on the depth values and mapped onto arbitrary projection planes. This is the essence of three-dimensional image warping, although three-dimensional image warping may not explicitly require mapping of samples to three dimensions before projecting the scene onto arbitrary viewing planes. The mapping is performed implicitly by the three-dimensional image warping equations.
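The explicit form of this re-projection (mapping samples back to space and then onto a new view plane) may be sketched as follows. The Python example works in a two-dimensional "flatland" with one-dimensional images, similar to the section views used in the figures. The pinhole camera model, the assumption that both cameras look along the +z axis with the same focal length f, and all function names are assumptions made for the example and are not part of the original disclosure.

```python
import math


def unproject(x, depth, center, f):
    """Map an image coordinate x and its depth (distance from the center of
    projection) back to a point (X, Z) in the scene."""
    norm = math.hypot(x, f)  # length of the viewing ray direction (x, f)
    return (center[0] + depth * x / norm, center[1] + depth * f / norm)


def project(point, center, f):
    """Project a scene point onto the image line of a camera at `center`.
    The point is assumed to lie in front of the camera (dz > 0)."""
    dx = point[0] - center[0]
    dz = point[1] - center[1]
    return f * dx / dz


def warp(samples, c1, c2, f):
    """Re-project samples (x, depth, color) taken from center c1 into the
    view from center c2, returning (new_x, color) pairs."""
    return [(project(unproject(x, d, c1, f), c2, f), color) for x, d, color in samples]
```

For example, a sample of the scene point (1, 4) seen from the origin moves to image coordinate 0 when the center of projection is shifted to (1, 0), because the point then lies directly in front of the new camera.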
FIG. 4 illustrates the operations performed in three-dimensional image warping. In FIG. 4, line segments 400 represent the scene geometry. For example, the line segments 400 may represent a horizontal section through a vertical scene, such as the walls in a room. In the illustrated example, the line segments 400 representing the scene geometry include a red segment 402, an orange segment 404, a green segment 406, and a blue segment 408. A first image i1 is an image with depth taken from the center of projection C1. For example, the image i1 may comprise an array of pixels recorded by a camera and a depth value associated with each pixel representing the distance of the element in the scene to the center of projection C1. Images i2 and i3 were obtained by warping (re-projecting) the image i1 onto new image planes 412 and 414, respectively. Because the orange segment 404 is behind the green segment 406, when viewed from the center of projection C1, samples from the orange segment 404 do not appear in the image i1. As a result, when the image i1 is warped into the image i2, a hole 415 appears between a red area 416 and a green area 418 in the image i2. The problem of holes appearing in a warped image is a reconstruction problem, and some action is required to fill such holes. In the absence of additional information, an educated guess consists of filling gaps with interpolated colors between adjacent samples from the original image. For example, in the illustrated example, the hole 415 in the image i2 should be filled with colors interpolated from red to green.
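The hole-filling heuristic described above may be sketched for a single row of a warped image. In the Python example below, unfilled pixels are represented by None and colors by RGB tuples; these representations, the function name, and the choice of linear interpolation are assumptions made for the example.

```python
def fill_holes(scanline):
    """Fill each run of None entries with colors linearly interpolated between
    the nearest known pixels on either side. A run touching the border of the
    scanline is filled with the single known neighbor."""
    out = list(scanline)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < n and out[j] is None:  # find the end of the hole
            j += 1
        left = out[i - 1] if i > 0 else None
        right = out[j] if j < n else None
        for k in range(i, j):
            if left is None and right is None:
                break                    # nothing known anywhere: leave the hole
            if left is None:
                out[k] = right
            elif right is None:
                out[k] = left
            else:
                t = (k - i + 1) / (j - i + 1)
                out[k] = tuple((1 - t) * a + t * b for a, b in zip(left, right))
        i = j
    return out
```

Applied to a row containing a red pixel, a one-pixel hole, and a green pixel, the hole receives the color halfway between red and green, analogous to the treatment suggested for the hole 415.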
Another problem with three-dimensional image warping is that the one-to-one relationship between pixels in original and projected images may not hold. For example, in image i3, samples from multiple surfaces are mapped to the same pixel. More particularly, samples from the red segment 402 and the blue segment 408 along projected ray 420 map to the same pixel in the image i3. Because multiple samples in the scene geometry 400 map to the same pixel in the image i3, the one-to-one relationship between points of scene surfaces and their projections onto an image plane does not hold. One conventional solution in such cases is to search for the closest intersection of a projected ray with the scene to determine which pixel to display. In the illustrated example, since the intersection between the projected ray 420 and the blue segment 408 is closer to the image plane 414 than the point where the projected ray intersects the red segment 402, a blue pixel is displayed in the image i3. Searching for the closest intersection along the projected ray is computationally expensive and, therefore, unsuitable for interactive applications. In addition, because of the lack of one-to-one correspondence between pixels in the scene and pixels in the image plane, no inverse mapping is readily available. The lack of a convenient inverse mapping makes filtering difficult. However, forward three-dimensional image warping handles visibility among multiple surfaces. The main disadvantages of three-dimensional image warping are poor filtering and the appearance of holes.
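The many-to-one situation described above can be illustrated with a short Python sketch in which several forward-mapped samples compete for the same destination pixel and the nearest one is kept. The sketch uses a per-pixel nearest-distance comparison (a depth-buffer style test), which is a technique substituted here for illustration and is not stated in the text to be the method used; the function name and the sample representation are likewise assumptions.

```python
def resolve_visibility(projected, width):
    """Keep, for each destination pixel, the color of the nearest sample.

    projected: iterable of (pixel_index, distance_to_image_plane, color).
    Returns a list of `width` colors, with None where no sample landed.
    """
    colors = [None] * width
    nearest = [float("inf")] * width
    for px, dist, color in projected:
        if 0 <= px < width and dist < nearest[px]:
            nearest[px] = dist
            colors[px] = color
    return colors
```

With a red sample at distance 5 and a blue sample at distance 3 landing on the same pixel, the blue sample is displayed, as in the image i3. Pixels that receive no sample remain None, which corresponds to the holes discussed above.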
In summary, three-dimensional image warping can re-project images with depth onto different image planes, thus producing new views of a scene. In one sense, three-dimensional image warping is the computer graphics analog of optical holograms: images change with viewpoint. Unfortunately, however, when a single image with depth is used, areas not visible in the original image may become exposed, producing holes in the final image. Also, because three-dimensional image warping is usually a many-to-one mapping, it does not have a cheap inverse formulation. Texture mapping has a simple inverse formulation but does not account for changes in occlusion relationships when views change.
Sprites with depth enhance the descriptive power of traditional sprites with out-of-plane displacements per pixel. Sprites with depth are rendered using a two-pass algorithm to compute the color of pixels in the destination image from pixels in a source image. In the first pass, the displacement map associated with the source image is forward mapped using a 2-D transformation to compute an intermediate displacement map d3(x3, y3), which is then stored for later use. In the second pass, each pixel (x2, y2) of the desired image is transformed by a homography (planar perspective projection) to compute intermediate coordinates (x3, y3). Such coordinates are used to index the displacement map d3(x3, y3) computed in the first pass. The retrieved displacement value is then multiplied by the epipole e21 and added to the result of the homography, producing the coordinates (x1, y1) in the source image. Such coordinates are used to compute the color of the destination pixel (x2, y2) by filtering the color of pixels in the neighborhood of (x1, y1) in the source image. Sprites with depth are described in a publication entitled "Layered Depth Images" by Shade et al., Proceedings of SIGGRAPH 1998 (Jul. 19-24, 1998). However, this publication does not describe how the intermediate displacement map d3 is computed.
One variation of the algorithm described above for producing sprites with depth consists of, in the first pass, forward mapping the displacement map associated with the source image to an intermediate image. For each pixel (x1, y1) from the source image and its image (x3, y3) in the intermediate image, the differences u3(x3, y3) = x3 − x1 and v3(x3, y3) = y3 − y1 are computed and stored for later use. During the second pass of the algorithm, each pixel (x2, y2) of the desired image is transformed by a homography to compute intermediate coordinates (x3, y3). Such coordinates are added to (u3(x3, y3), v3(x3, y3)) to produce the coordinates (x1, y1) in the source image, whose neighborhood is then filtered to produce the color for (x2, y2).
Although this approach is expected to produce smoother rendering than traditional forward mapping (splatting) techniques, the reconstruction is still done using splats, and holes may still occur.
In three-dimensional image production according to the present invention, each coordinate of the destination pixel depends only on its counterpart in the original pixel (i.e., u2 does not depend on v1, and v2 does not depend on u1). This enables three-dimensional image generation according to the present invention to be implemented efficiently as 1-D operations for both reconstruction and filtering. In addition, standard texture mapping hardware may be used to perform the final planar perspective warp. While sprites with depth should be used as rendering primitives only when viewed from a distance, the textures produced using the three-dimensional image generation techniques according to the present invention can be used even when the viewpoint is very near to the polygon, because all holes are completely filled during the reconstruction process.
A nailboard is a texture-mapped polygon augmented with a displacement value per texel specifying an amount by which its depth deviates from the depth of the represented view of an object. The idea behind nailboards is to take advantage of frame-to-frame coherence in smooth sequences. Thus, instead of rendering all frames from scratch, more complex objects are rendered to separate buffers and re-used as sprites as long as the geometric and photometric errors remain below a certain threshold. An error metric is therefore required. The displacement values associated with each texel are used to modulate the depth buffer of the final composite frame. In conjunction with partially transparent polygons, the associated displacements are used to solve visibility among other nailboards and conventional polygons. The depth values associated with nailboards are not utilized to perform image warping.
In light of the difficulties associated with conventional texture mapping, three-dimensional image warping, and sprites with depth, there exists a need for improved methods and systems for producing three-dimensional images.
According to the present invention, a scene is modeled using one or more relief textures. A relief texture is a texture in which each pixel includes a height or displacement value representing the distance between a surface sample and its orthogonal projection onto the basis plane of the relief texture. Each pixel may also include a color value and a normal vector. The normal vector is normal to the surface point represented by the relief texture pixel. Relief textures are different from conventional two-dimensional textures because of the addition of the height associated with each pixel. In addition, relief textures are different from conventional images with depth used in three-dimensional image warping because the projections from which the height values are measured are parallel, rather than perspective projections.
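One possible in-memory representation of a relief texture pixel, and of the parallel (orthogonal) projection that relates it to a surface sample, is sketched below in Python. The class and field names, the parameterization of the basis plane by an origin, two in-plane axes a and b, and a unit normal n, and the convention that positive displacements lie along n are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ReliefTexel:
    """One pixel of a relief texture."""
    displacement: float                   # distance from the basis plane
    color: Tuple[int, int, int] = (0, 0, 0)
    normal: Vec3 = (0.0, 0.0, 1.0)        # surface normal at the sample


def texel_to_point(u, v, texel, origin, a, b, n):
    """Recover the surface sample for the texel at (u, v).

    The sample lies directly above its orthogonal projection onto the basis
    plane: origin + u*a + v*b, offset by the displacement along the unit normal n.
    """
    return tuple(
        o + u * ai + v * bi + texel.displacement * ni
        for o, ai, bi, ni in zip(origin, a, b, n)
    )
```

Because the offset is always along the same direction n, every texel is displaced in parallel, in contrast to an image with depth, where each sample is displaced along its own ray through the center of projection.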
Relief textures combine the holographic nature of three-dimensional image warping with the ability to use an inverse mapping which is desirable for filtering purposes. This is accomplished by solving the visibility problem, i.e., transforming the many-to-one mapping into a one-to-one mapping, and then using conventional texture mapping to handle the final transformation.
The improved methods and systems for producing three-dimensional images result from factoring the three-dimensional image warping equations into a pre-warp followed by standard texture mapping. The pre-warp handles only the parallax effects resulting from the direction of view and the displacement of texture elements. The subsequent texture mapping operation handles scaling, rotation, and the remaining perspective transformation.
The pre-warp equations have a simple one-dimensional structure that enables the pre-warp to be implemented using only one-dimensional image operations along scan lines and columns. In addition, pre-warping requires interpolation between only two adjacent pixels at a time. This allows efficient implementation in software and should allow a simple and efficient hardware implementation. Texture-mapping hardware common in graphics systems may be used to efficiently implement the final texture mapping stage of the warp.
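The one-dimensional structure described above may be sketched for a single scan line as follows. The pre-warp equations themselves are not given in this section, so the Python example below uses a stand-in map, `prewarp_coord`, in which the output coordinate depends only on the same input coordinate and the displacement of the texel; the form of that map, the constants k1 and k3, the function names, and the use of scalar (grayscale) colors are assumptions made for the example. Samples that fold back over earlier ones are simply skipped here, whereas a complete implementation would need to process texels in an order that resolves occlusion correctly.

```python
import math


def prewarp_coord(u1, displ, k1, k3):
    """Hypothetical 1-D pre-warp: the new coordinate depends only on u1 and
    on the displacement of the texel (k1 and k3 are view-dependent constants)."""
    return (u1 + k1 * displ) / (1.0 + k3 * displ)


def prewarp_scanline(colors, displs, k1, k3):
    """Forward-map one scan line, interpolating between two adjacent source
    texels at a time to fill every output pixel that falls between them."""
    n = len(colors)
    out = [None] * n
    prev_pos, prev_col = None, None
    for u1 in range(n):
        pos = prewarp_coord(u1, displs[u1], k1, k3)
        col = colors[u1]
        if prev_pos is not None and pos > prev_pos:
            x = math.ceil(prev_pos)       # first pixel center in [prev_pos, pos]
            while x <= pos:
                if 0 <= x < n:
                    t = (x - prev_pos) / (pos - prev_pos)
                    out[x] = (1 - t) * prev_col + t * col
                x += 1
        prev_pos, prev_col = pos, col
    return out
```

With zero displacements the scan line is reproduced unchanged. When two adjacent texels are moved apart by their displacements, the output pixels between them are filled by interpolation, so no holes are left within the span covered by the scan line. A second pass of the same kind along columns would complete a two-dimensional pre-warp.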
The present invention includes methods appropriate for implementation in hardware and software for using relief textures to add realistic surface detail and to render complex scenes and objects.
It is an object of the present invention to provide methods and systems for generating three-dimensional images of objects that include advantages of both texture mapping and three-dimensional image warping.
It is another object of the present invention to provide methods and systems for generating three-dimensional images with a convenient inverse formulation and the ability to display the proper occlusion relationships in an image when the viewpoint changes.
While some of the objects of the invention have been stated hereinabove, other objects will become evident as the description proceeds, when taken in connection with the accompanying drawings as best described hereinbelow.