The present invention is related to methods and apparatus for rendering images, and more particularly to methods and apparatus for rendering images using 3D warping techniques.
An important task in the field of computer graphics involves creating (or rendering) new images of a 3D scene using reference information that describes the scene. Rendering methods are generally classified into two groups: geometry based rendering (GBR), and image based rendering (IBR). Using GBR, 3D images are generated by projecting scene information as it would appear from various view positions. The projected scene information may include parameters such as geometric modeling data, surface properties, and lighting parameters of the scene.
Nearly all conventional computer graphics systems use some form of GBR to render 3D images. Using GBR, the data defining the 3D objects that comprise a particular scene are explicitly included in the graphics system, making it a relatively simple task for the graphics system to manipulate the scene objects. As such, GBR-based systems excel at data-manipulative tasks such as moving the desired viewpoint of a 3D scene, and simulating a phenomenon known as collision detection. GBR-based systems, however, have a limited ability to represent complex-shaped objects, or objects that include micro-structure. As a result, it is difficult to construct a photorealistic virtual environment using GBR.
IBR provides a solution to this problem. Rather than defining a scene using geometric modeling data, IBR-based systems use actual scene images (or reference images), taken at various viewing positions, to render the desired 3D image. A typical IBR process flow is shown in FIG. 1. Novel images (i.e., images different from the original reference images) are generated by first warping, or transforming, the reference image data to the novel image space. The warped reference images are then blended together to render the desired 3D scene. It is impractical to consider all the samples of the different reference images to render the novel image. Instead, a subset of samples must be determined that will suffice for rendering an image of sufficient quality. Ideally, the number of samples in such a set should depend only on the number of pixels in the desired image and not on the overall scene complexity.
IBR offers several advantages over GBR. First, IBR avoids the often tedious and time-consuming task of modeling (or sampling) an object to form the modeling database. IBR-based systems are instead capable of directly accepting the captured image data of the various reference images into an image database. Second, the complexity of the IBR algorithms is generally independent of the complexity of the scene, allowing the viewpoint of complex 3D scenes to be changed interactively in real-time.
In addition to the reference image data, different approaches to rendering images using IBR may require additional information in order to adequately render the desired 3D scene. This additional information may include depth maps, viewing parameters, and correspondence information that interrelate the various reference image data. Typically, at least the additional depth information is needed for the warping process to produce acceptable results.
Although image-based rendering of 3D images by warping with depth information (IBRW) promises to produce images of much greater quality than GBR, until now, the only IBRW method that has approached this goal has been the so-called polygonal mesh method. Using the mesh method, reference images are first partitioned into a mesh of micro-triangles. After partitioning, the mesh is transformed (or warped) into a new image having the desired viewing position. The warped mesh is then fed to a polygon-rendering engine that renders the desired 3D image.
FIGS. 2A through 2D illustrate the rendering of images using the mesh method. First, as shown in FIG. 2A, the reference image samples are warped into the desired image space. As the samples are warped to the desired image space, they move apart from one another, leaving “gaps” of image information that must be filled. To fill in this information, the four neighboring samples are connected with two triangles as shown in FIG. 2B. Once connected, the triangles are rasterized to create sub-samples between the warped image samples as indicated by the shading in FIG. 2C. As other samples are warped and the corresponding triangles rasterized, the continuity of the surface must be maintained. This is illustrated in FIG. 2D. On average, a “cost” of two triangles per sample may be assigned to the rendering method.
Using mesh IBRW systems, a high degree of image quality may be achieved through minute scan-conversion of the micro-triangles that comprise the mesh. Unfortunately, not all types of images may be rendered in this manner. For example, polygon-rendering produces unacceptable results when attempting to render multiple reference images, each having image data at redundant locations. Polygon-rendering of these multiple reference images causes the corresponding triangles at the redundant locations to interpenetrate and coincide. This, in turn, causes a flashing (or flickering) to occur in the image at the redundant locations as the image viewpoint is changed.
One solution to address the flashing problem is to pre-process the image data in order to build a single mesh. This eliminates any redundant triangles in the final mesh. The pre-processing, however, is not only difficult to perform, but can often be extremely time-consuming. The added delay needed to pre-process the mesh data can inhibit the ability to warp the image data and render novel images in real-time. In addition to the flashing problem, the setup costs associated with the polygonal mesh approach using traditional polygonal rasterization limit the performance of mesh-based image rendering hardware.
FIGS. 3A through 3F illustrate the steps involved in performing the traditional triangle rasterization process. The process begins by defining locations in the image plane where parameters for rasterization are to be evaluated. Typically, these locations are defined to be the pixel centers for the various rasterization triangles as shown in FIG. 3A. In order to determine the parameter values at these particular image plane locations, a backward mapping from the desired-image plane to the surface modeled by the triangle as shown in FIG. 3B is needed. This backward mapping must be computed at setup, and can be quite time consuming and expensive, in terms of required computational power.
The time-consuming computations required to compute the backward mapping are illustrated in FIGS. 3C through 3F. The object of computing the mapping is to determine the corresponding parameter plane for each desired parameter. In the exemplary illustration shown in FIG. 3C, the desired parameter is z. The first step in the calculation process is to compute the plane normal as the cross-product of the two difference vectors, P2 − P1 and P3 − P1. This computation is shown in FIG. 3D. The computed normal forms the plane equation n_a x + n_b y + n_c z + D = 0, shown in FIG. 3E, which is then used during rasterization to evaluate the parameter at the various pixel centers as shown in FIG. 3F.
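The setup computation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the rasterizer of any particular hardware: the function names `plane_setup` and `z_at_pixel` and the tuple layout of the vertices are assumptions introduced here.

```python
def cross(u, v):
    """Cross product of two 3-vectors given as (x, y, z) tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def plane_setup(p1, p2, p3):
    """Return (normal, D) for the plane n_a*x + n_b*y + n_c*z + D = 0
    through the three triangle vertices (hypothetical helper)."""
    d1 = tuple(b - a for a, b in zip(p1, p2))  # P2 - P1
    d2 = tuple(b - a for a, b in zip(p1, p3))  # P3 - P1
    n = cross(d1, d2)
    d = -(n[0] * p1[0] + n[1] * p1[1] + n[2] * p1[2])
    return n, d


def z_at_pixel(n, d, x, y):
    """Evaluate the parameter z at a pixel center (x, y) from the plane equation."""
    if n[2] == 0:
        raise ValueError("triangle is edge-on; z is not a function of (x, y)")
    return -(n[0] * x + n[1] * y + d) / n[2]


# Example triangle lying on the plane z = 1 + 0.5*x + y.
normal, D = plane_setup((0, 0, 1), (4, 0, 3), (0, 4, 5))
z = z_at_pixel(normal, D, 1.5, 0.5)
```

The cross product and the computation of D happen once per triangle at setup, which is the per-triangle cost that the text identifies as expensive; the per-pixel evaluation is comparatively cheap.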
As an example of the number of computations required to perform the rasterization, assume that it is desired to render an image having a targeted resolution of 1280×1024 pixels. On average, samples will be warped twice at the desired resolution. Also, recall that polygon rendering requires on average that two mesh triangles be rendered for every warped sample. Thus, the average number of triangles, N, that must be rendered per second in order to sustain a frame rate of 30 Hz is:
N ≈ 1280 × 1024 × 2 × 2 × 30 ≈ 157 M triangles/sec
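The arithmetic behind this estimate can be checked directly. The following short Python sketch simply multiplies the factors named in the text; the function and parameter names are illustrative assumptions.

```python
def triangles_per_second(width, height, warps_per_pixel,
                         triangles_per_sample, frame_rate_hz):
    """Average triangle throughput needed to sustain the given frame rate."""
    return (width * height * warps_per_pixel
            * triangles_per_sample * frame_rate_hz)


n = triangles_per_second(1280, 1024, 2, 2, 30)
print(n)  # prints 157286400, i.e. roughly 157 M triangles/sec
```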
Conventional graphics hardware is incapable of achieving this level of computational performance. Indeed, it is believed that it will be years before such sustained levels of graphics performance are achievable. Moreover, even when such levels of performance are achieved, rendering images in this fashion will still require more hardware than rendering images on a machine optimized for IBRW.
Yet another drawback of conventional IBR techniques is the number of reference image samples required for each frame. The number of reference images depends on the contents of the scene and on how the scene is modeled. On average, more than one reference image sample must be processed per desired image location. This is true because there are often surfaces that are redundantly captured in more than one reference image. In addition, there often exist surfaces captured in the reference images that are not visible in the desired image (i.e., the depth complexity of the image is greater than one). Also, there may be surfaces that are better sampled in the reference image than in the desired image, which can lead to more than one visible sample per desired image pixel.
The existence of each of the above-described image conditions makes two input samples per output pixel a reasonable lower bound. In practice, it would be difficult to use fewer than two samples. When using real-time depth-image updating (often referred to as “immediate mode”), the number of samples will be determined by the number, resolution, and update rates of the cameras used to capture the reference images.
At present, the most viable alternative to IBRW is the above-mentioned technique of simplifying the triangle meshes in order to reduce the polygon count. This technique promises to achieve the desired performance goals using conventional graphics hardware. Again, however, the amount of pre-processing required to obtain the simplified meshes makes the technique poorly suited to real-time depth updating.
Another commonly used IBRW method, different from the polygonal mesh method and popular in the area of volume rendering, is a technique known as splatting. With IBRW splatting, areas of a desired image influenced by warping are approximated into “splats” based upon the opacity and color of the warped pixels. Using volume rendering, the splats are blended in a front-to-back order. For IBRW, however, samples existing at varying depths must not be blended together. Instead, samples having greater depths should be overwritten by overlapping samples having lower depth values. Only those samples that exist on the same surface should be blended together. This can be a difficult task to achieve, as no information about the depths of the various surfaces exists in the IBRW database.
Although rendering images by splatting is faster than rendering images using the mesh method, the quality of the warped images produced by splatting quickly degrades as the viewpoint moves away from the reference image views. As a result, splatting is typically used only when polygon-rendering hardware support is unavailable, or when such hardware support is uneconomical to use.
To better understand the quality limitations associated with rendering images by splatting, it will be helpful to first discuss the two main tasks performed by conventional splatting algorithms: (1) resolving visibility, and (2) reconstruction.
The first task of resolving visibility involves identifying those warped samples that are visible in the new image space, and eliminating those warped samples that should be invisible in the new space. Invisible samples may, for example, belong to surfaces that are behind other opaque surfaces in the image. Consider, for example, a reference image that depicts a house where both the front side and the left side of the house are visible. If the viewpoint of the new image is directed at the right side of the house, the left side of the house should no longer be visible. Thus, resolving visibility in the above example would involve first categorizing the warped samples belonging to the left side of the house as being invisible, and then subsequently discarding these invisible samples from the final image database.
The second major splatting task of reconstruction involves first computing a color value for the centers of each of the pixels of those visible warped samples that comprise the new image space. Once computed, the color values are then used to blend the visible warped samples together to render the desired image. This second task may be referred to as “reconstruction/re-sampling”, since the task directly re-samples the color values of the visible warped samples (at the center of the pixels) to blend the samples together, without having to form an intermediate continuous representation of the image.
Conventional splatting algorithms carry out the tasks of resolving visibility and reconstruction simultaneously. Applicants have observed that this approach leads to inefficiencies in the splatting process that impact the quality of the rendered images. For example, in order to properly resolve visibility, splats must be opaque so that top-surface samples completely overwrite back-surface samples when composited on one another. During reconstruction, however, the splats must be semi-transparent in order to properly blend the visible samples together to form the desired image.
As a consequence, the blending of the samples should not be started until after visibility in the new image space has been fully resolved. Blending the samples before completely resolving visibility would “contaminate” visible samples with information from back-surface samples that should not contribute any information to the final image.
It has also been observed that any underestimation in the size of the splat may allow back-surface samples to erroneously appear in the final image. To prevent this phenomenon from occurring, conventional algorithms ensure that splat sizes are overestimated (i.e., areas known not to be influenced by warping are nevertheless included in the defined splats). This approach, however, adversely affects the reconstruction of the desired image, causing what should be visible samples to be incorrectly erased in the final image. This, in turn, leads to an undesirable aliasing of edges in the final image, and to high frequency textures being included in the final rendered image.
Thus, rendering images by simultaneously performing the tasks of resolving visibility and reconstruction causes inefficiencies in the rendering process that impact the quality of the rendered images. It is therefore an object of the present invention to provide methods and apparatus for separating these two tasks by providing an IBRW technique that produces high-quality images without the need for powerful, but costly, polygon rendering hardware.
These and other concerns are addressed by a method of rendering images, the method including the step of segmenting at least one reference image in a reference image space into a plurality of tiles, each tile defined by a corresponding set of image samples. The connectivity of each of the samples is determined based on the relative curvature of a surface of the corresponding tile at the sample coordinates. Color and depth information between adjacent, connected samples is bi-linearly interpolated to form a corresponding set of sub-samples. The set of sub-samples is warped from the reference image space to a destination image space. A final pixel color for each of a plurality of groups of sub-samples in the destination image space is computed as a weighted average of the color information of the sub-samples for each respective group. The final pixel colors are combined to render a final image in the destination image space from the at least one reference image.
According to another aspect of the invention, the set of sub-samples in the destination image space is Z-buffered prior to computing a final pixel color.
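As a rough illustration of resolving visibility before any blending takes place, the following Python sketch Z-buffers warped sub-samples so that only the sub-sample with the lowest depth value survives at each buffer location. The tuple layout `(x, y, depth, color)`, the dictionary used as a buffer, and the strict nearest-wins rule are assumptions made for this sketch; the source does not specify these details.

```python
def z_buffer(sub_samples):
    """Keep, per buffer location, the warped sub-sample nearest the viewer.

    sub_samples: iterable of (x, y, depth, color) tuples, where (x, y) is an
    integer buffer location (hypothetical layout for this sketch).
    Returns a dict mapping (x, y) -> (depth, color).
    """
    buf = {}
    for x, y, depth, color in sub_samples:
        key = (x, y)
        # A sub-sample with a lower depth value overwrites a deeper one.
        if key not in buf or depth < buf[key][0]:
            buf[key] = (depth, color)
    return buf


visible = z_buffer([
    (3, 4, 9.0, (0, 0, 255)),   # back surface
    (3, 4, 2.5, (255, 0, 0)),   # front surface at the same location
    (5, 1, 7.0, (0, 255, 0)),
])
```

Because no colors are mixed in this pass, back-surface samples cannot contaminate the colors that are later averaged into the final pixels.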
According to another aspect of the invention, the relative curvature of the corresponding segment at the sample coordinates is calculated by computing the second derivative of a generalized disparity of the sample defined as the ratio of the distance between a viewing position of the at least one reference image and the plane of the at least one reference image and the distance perpendicular to the plane of the at least one reference image between the viewing position and the sample.
According to another aspect of the invention, the second derivative of the generalized disparity of the sample is computed along four directions in the plane of the at least one reference image at the sample coordinates, the four directions including an E-W, a SE-NW, a N-S, and a SW-NE direction.
According to another aspect of the invention, when the computed second derivative of the generalized disparity of the sample exceeds a predetermined threshold, the sample is not connected to form the final image in the destination image space.
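The connectivity test described in the preceding paragraphs can be sketched as follows. This is a minimal sketch under stated assumptions: the second derivative is approximated by a central finite difference of the disparity over the immediate neighbors, the threshold is compared against the magnitude of that difference, and only interior samples are handled. The names `generalized_disparity`, `second_differences`, and `is_connected` are hypothetical.

```python
# Neighbor steps (row, column) for the four directions named in the text.
DIRECTIONS = {
    "E-W": (0, 1),
    "SE-NW": (1, 1),
    "N-S": (1, 0),
    "SW-NE": (1, -1),
}


def generalized_disparity(focal_distance, depth):
    """Ratio of the viewing-position-to-image-plane distance to the
    perpendicular distance between the viewing position and the sample."""
    return focal_distance / depth


def second_differences(disp, row, col):
    """Finite-difference second derivative of disparity at an interior
    sample, along each of the four directions (an assumed discretization)."""
    center = disp[row][col]
    out = {}
    for name, (dr, dc) in DIRECTIONS.items():
        before = disp[row - dr][col - dc]
        after = disp[row + dr][col + dc]
        out[name] = before - 2.0 * center + after
    return out


def is_connected(disp, row, col, threshold):
    """A sample is treated as connected only if no directional second
    derivative exceeds the threshold in magnitude."""
    return all(abs(v) <= threshold
               for v in second_differences(disp, row, col).values())


depths = [[2.0, 2.0, 2.0],
          [2.0, 1.0, 2.0],   # center sample sits well in front of its neighbors
          [2.0, 2.0, 2.0]]
disparity = [[generalized_disparity(1.0, z) for z in row] for row in depths]
connected = is_connected(disparity, 1, 1, threshold=0.1)
```

Working in disparity rather than raw depth means that a planar surface yields a zero second difference regardless of its orientation, so only genuine changes in surface curvature or depth discontinuities trip the threshold.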
According to another aspect of the invention, the step of bi-linearly interpolating between adjacent, connected samples includes the step of selecting a connected sample. A linear segment between the connected sample and an adjacent, connected sample is divided to obtain at least one sub-sample. Linear segments between connected samples and adjacent connected samples, between connected samples and adjacent sub-samples, and between sub-samples and adjacent sub-samples are repeatedly divided to obtain sub-samples until a desired interpolation distance between sub-samples is achieved. Sub-samples are obtained at coordinates at or near the connected samples.
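The repeated division of linear segments can be illustrated along a single segment; applying the same division along both image directions of a quad gives the bi-linear case. In this hedged Python sketch a sample is assumed to be a tuple `(x, y, depth, color)` with a scalar color for brevity, the division is assumed to be a bisection at the midpoint, and `max_spacing` stands in for the desired interpolation distance. None of these names come from the source.

```python
import math


def midpoint(a, b):
    """Linearly interpolate every attribute (position, depth, color)."""
    return tuple((p + q) / 2.0 for p, q in zip(a, b))


def subdivide(a, b, max_spacing):
    """Repeatedly bisect the segment a-b until adjacent sub-samples are no
    farther apart than max_spacing. Returns the ordered list of samples,
    including both endpoints."""
    if max_spacing <= 0:
        raise ValueError("max_spacing must be positive")
    if math.hypot(b[0] - a[0], b[1] - a[1]) <= max_spacing:
        return [a, b]
    m = midpoint(a, b)
    left = subdivide(a, m, max_spacing)
    right = subdivide(m, b, max_spacing)
    return left + right[1:]  # drop the duplicated midpoint


# Two adjacent, connected samples one pixel apart: (x, y, depth, color).
s0 = (0.0, 0.0, 1.0, 0.0)
s1 = (1.0, 0.0, 3.0, 1.0)
subs = subdivide(s0, s1, max_spacing=0.25)
```

Bisection needs only additions and halvings, which is one plausible reason to prefer repeated division over evaluating the interpolation at arbitrary parameter values.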
According to another aspect of the invention, each respective tile includes a plurality of quads, each quad defined by four adjacent, connected image samples.
According to another aspect of the invention, the set of sub-samples is warped from the reference image space to the destination image space into a warp buffer having at least two sub-sample locations for each of the two directions defining the destination image plane for each final image pixel.
According to another aspect of the invention, each of the warped sub-samples is stored in a respective location of the warp buffer with a pair of offset values of at least two bits each to further define the location of a sub-sample within the warp buffer location.
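One way to picture the warp buffer addressing is sketched below in Python. The sketch assumes exactly two buffer locations per pixel in each direction and two offset bits per direction, which are the minimum values stated above; the function names and the choice of truncation for quantizing the offset are assumptions.

```python
import math


def quantize(coord, locations_per_pixel=2, offset_bits=2):
    """Split a destination coordinate (in pixel units) into a warp buffer
    location index and a quantized offset within that location."""
    scaled = coord * locations_per_pixel
    index = math.floor(scaled)
    levels = 1 << offset_bits                      # 4 levels for 2 bits
    offset = min(int((scaled - index) * levels), levels - 1)
    return index, offset


def warp_buffer_address(x, y):
    """Return ((ix, iy), (ox, oy)): the buffer location and the pair of
    offset values stored with the sub-sample."""
    ix, ox = quantize(x)
    iy, oy = quantize(y)
    return (ix, iy), (ox, oy)


location, offsets = warp_buffer_address(3.4, 1.5)
```

With these assumed values, each final pixel is covered by a 2-by-2 block of buffer locations, and the offsets refine a sub-sample's position to one eighth of a pixel in each direction.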
According to another aspect of the invention, sub-samples having offset values defining locations closer to the center of a final image pixel are assigned weights that provide a greater relative contribution of the sub-sample's color information to the final computed pixel color.
According to another aspect of the invention, each of the plurality of groups of sub-samples includes sub-samples located in the warp buffer within at least one pixel of a respective final image pixel.
According to another aspect of the invention, the weights used to define the relative contribution of the color information are defined by a function having a relatively smaller value at the corner of a two-by-two pixel neighborhood centered at a respective final image pixel, and increasing in value to reach a maximum value at the center of the respective final image pixel.
According to another aspect of the invention, the function is a raised cosine function having zeroes placed at the corner of a two-by-two pixel neighborhood.
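The weighting and averaging steps can be sketched in Python as follows. The exact form of the raised cosine is not given in the text, so this sketch assumes a radially symmetric function that equals one at the pixel center and reaches zero at the corners of the two-by-two pixel neighborhood, that is, at a distance of the square root of two pixels. The names `raised_cosine_weight` and `pixel_color`, and the `(dx, dy, color)` layout of a sub-sample, are hypothetical.

```python
import math

# Distance from the pixel center to a corner of the 2x2 pixel neighborhood.
CORNER_RADIUS = math.sqrt(2.0)


def raised_cosine_weight(dx, dy):
    """Weight of a sub-sample at offset (dx, dy), in pixels, from the
    center of the final image pixel (assumed radial raised cosine)."""
    r = math.hypot(dx, dy)
    if r >= CORNER_RADIUS:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * r / CORNER_RADIUS))


def pixel_color(sub_samples):
    """Weighted average of sub-sample colors for one final pixel.

    sub_samples: iterable of (dx, dy, (r, g, b)) tuples.
    Returns None when no sub-sample carries any weight.
    """
    total = 0.0
    acc = [0.0, 0.0, 0.0]
    for dx, dy, color in sub_samples:
        w = raised_cosine_weight(dx, dy)
        total += w
        for i in range(3):
            acc[i] += w * color[i]
    if total == 0.0:
        return None
    return tuple(c / total for c in acc)


color = pixel_color([
    (0.0, 0.0, (1.0, 0.0, 0.0)),   # at the pixel center, full weight
    (1.0, 1.0, (0.0, 1.0, 0.0)),   # at a neighborhood corner, zero weight
])
```

A separable product of two one-dimensional raised cosines would also place zeroes at the corners, so the radial form used here should be read as one possible interpretation.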
It should be emphasized that the terms “comprises” and “comprising”, when used in this specification as well as the claims, are taken to specify the presence of stated features, steps or components; but the use of these terms does not preclude the presence or addition of one or more other features, steps, components or groups thereof.