1. Technical Field
The invention is related to synthesizing photo-realistic virtual images from actual images of an object, and more particularly to a system and process for efficiently representing an object to allow the synthesizing of photo-realistic images that depict both diffuse and specular reflections.
2. Background Art
Synthesizing photo-realistic virtual images from images of real objects is a major research topic in the computer vision and computer graphics communities. One avenue for producing these images is image-based rendering. Image-based rendering techniques use real 2D images of an object of interest as input. Treating each pixel in the input images as a sample of a plenoptic function, image-based methods synthesize virtual images by selecting the most appropriate sampled rays, or by interpolating between the sampled rays. However, since these methods assume the object has a diffuse (rather than reflective) surface, view-dependent variations such as specularity, which play an important role in the photo-realistic synthesis of images, are not taken into consideration. Essentially, reflective objects exhibit both a diffuse reflection component, which can be viewed as being constant at any particular point on the surface of the object of interest, and a specular reflection component that is dependent on the viewpoint from which the object is viewed. Both the reflectance parameters and the illumination distribution of the environment surrounding the object dictate the amount of specular reflection that will be observed at any given viewpoint. It is this specular reflection component that is ignored in current image-based approaches. In addition, it is noted that since these image-based methods require only real images as input, they provide high generality. In other words, they can be applied to a wide variety of objects and scenes. However, because they rely on interpolation, these approaches tend to require a large number of input images. Although the light rays can be represented efficiently in lower dimensionality, and compression techniques such as vector quantization or MPEG-based approaches can drastically reduce the total amount of information to be stored, these methods still require a dense sampling of the real object, which means taking hundreds of images.
Model-based methods, or "Inverse Rendering", are another major avenue of research in the area of synthesizing photo-realistic virtual images. Model-based methods use both 2D images and a 3D geometric model of the target object to estimate the bidirectional reflectance distribution function (BRDF) of the object surface, either by fitting a particular reflection model to the pixel values observed in the input images [1] or by solving the inverse radiosity problem [2]. However, in these methods, the radiance and positions of the light sources need to be known, and direct information about the lighting environment has to be provided in some way, e.g., with high dynamic range images of the light sources.
Recent research in the so-called "3D photography" domain has proposed methods that fall between the image-based and model-based approaches. By taking advantage of the latest advances in 3D sensing equipment, such as laser range scanners and structured light scanners, these approaches try to make full use of the 3D geometry as well as the images to synthesize images of an object that include the specular reflection effects. For example, one such approach [3] in essence sets one of the 2D planes produced in a light field approach onto the object surface as represented by coarse triangular patches or dense surface points. By deriving information from the geometry in this way, these approaches succeed in achieving a higher compression ratio without losing smooth view-dependent variation such as the movement of highlights. However, these methods still rely on very dense observation of the objects, and so require input of a large number (e.g., hundreds) of images of each object of interest.
The need for a large number of images of an object when using the above-described procedures is a significant disadvantage. For instance, consider a situation where a person wants to show an object to another person remotely, e.g., via the Internet, allowing that person to freely appreciate any detail of the object. The same applies to what people might want to do when purchasing objects online, i.e., e-commerce. Current techniques either require the user to take a large number of images of the object or assume that the scene structure, such as the lighting environment, is known perfectly. These techniques preclude a very typical situation where a user would take a limited number of snapshots of an object of interest with a digital camera while moving around the object, and then want that information converted into some sort of representation, so that the user can see the object from arbitrary viewpoints or transfer the representation so that others can view the object.
It is noted that in this background section and in the remainder of the specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, "reference [1]" or simply "[1]". A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.
The present system and process is designed to represent an object in an efficient manner and to allow the synthesizing of photo-realistic virtual images of the object that include both diffuse and specular reflection effects. This is accomplished using a relatively sparse set of input images and without any direct information concerning the light sources (e.g., their radiance and positions). For example, the input images could be captured using a hand-held video camera. The input images needed can be limited to a number which is just enough to collectively depict every surface of the object that it is desired to render in the synthesized images from a viewpoint that captures substantially only diffuse reflection. The only data required other than the input images are a 3D model of the object and the camera parameters. These items are readily available using conventional techniques.
The present system and process first extracts the view-dependent, specular reflection components, and view-independent, diffuse reflection components of the surface reflectance from the input images. Specifically, this is accomplished by computing a global texture map which specifies an intensity value for the diffuse reflection from each modeled portion of the surface of the object using the input images. Then, a specular reflection image is derived from each input image. These specular reflection images specify the intensity of the specular reflection from each modeled portion of the surface of the object depicted in the associated input image.
The global texture map is constructed by identifying sets of pixels in the input images that depict the same portion of the object. Each of these sets of pixels is then processed to determine which pixel of the set has the minimum pixel intensity value. The minimum intensity value is assigned to the location of the global texture map corresponding to the portion of the object depicted in the set of pixels. The result is an intensity value being associated with each portion of the object depicted in the pixels of any of the input images. These intensity values represent the diffuse reflection associated with the depicted portion of the object.
Preferably, the 3D model represents the object's surface as a mesh of triangular patches, and so the global texture map will identify the diffuse reflection components associated with each of these triangular regions. To this end, the process of computing the global texture map preferably includes mapping each input image onto the 3D model so as to identify the location on the model associated with each pixel of each input image. Then, the aforementioned sets of pixels are identified. Specifically, sets of corresponding pixels from the input images that depict the same location in each triangular patch of the 3D model of the object are identified. Finally, as described previously, for each set of pixels, it is determined which pixel of the set has the minimum pixel intensity value, and that pixel value is associated with the location of the global texture map corresponding to the location in the 3D model depicted by the set of pixels.
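By way of illustration only, the minimum-intensity selection described above might be sketched in Python as follows. The sketch assumes that each input image has already been mapped onto the 3D model and is supplied as a dictionary keyed by a hypothetical (patch index, location within patch) pair. That data layout and the function names are assumptions made for the example and are not prescribed by the described process.

```python
from collections import defaultdict


def group_by_surface_location(mapped_images):
    """Collect, for every surface location, the intensities seen in all images.

    mapped_images: list of dicts, one per input image, each mapping a
    hypothetical (patch_index, texel_index) key to the pixel intensity
    observed at that surface location (assumed layout).
    """
    groups = defaultdict(list)
    for image in mapped_images:
        for location, intensity in image.items():
            groups[location].append(intensity)
    return groups


def build_global_texture_map(mapped_images):
    """Assign each surface location the minimum intensity observed for it."""
    groups = group_by_surface_location(mapped_images)
    return {location: min(values) for location, values in groups.items()}
```

A location seen in only one input image simply keeps that single observed value, which is why every surface of interest should appear in at least one image from a viewpoint showing substantially only diffuse reflection.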
Once the global texture map is complete, the aforementioned specular reflection images are created by first determining, for each input image, the diffuse reflection intensity value associated with each pixel of the input image from the global texture map. The diffuse reflection intensity value of each pixel is then subtracted from the overall intensity value of that pixel. The result of the subtraction procedure is designated as the specular reflection intensity value for the pixel under consideration. In this way, a specular reflection intensity value is eventually assigned to each pixel location of each input image to form the desired specular reflection images.
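The subtraction procedure can be illustrated with a short Python sketch. It assumes, purely for the example, that an input image and the global texture map are both dictionaries keyed by the same hypothetical surface-location keys; the function name is likewise an assumption.

```python
def specular_reflection_image(mapped_image, global_texture_map):
    """Subtract the diffuse value of each location from its observed value.

    mapped_image: dict from a surface-location key to the overall intensity
    observed in one input image (assumed layout).
    global_texture_map: dict from the same keys to the diffuse intensity.
    """
    return {
        location: intensity - global_texture_map[location]
        for location, intensity in mapped_image.items()
    }
```

When the global texture map holds the minimum over the same set of input images, the difference is never negative, so no clamping is applied in this sketch.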
The global texture map acts as the desired efficient representation of the diffuse reflection components of the object as a whole. However, the specular reflection images constitute a considerable amount of data, and it would be advantageous to represent the data more efficiently. This is accomplished by using the specular reflection images to model the illumination distribution of the environment surrounding the object, and to estimate reflectance parameters for the object's surface in the form of an overall surface spectral magnitude and surface roughness. The illumination distribution and reflectance parameters, which constitute much less data than the specular reflection images as a whole, can be employed along with the global texture map to synthesize photo-realistic virtual images of the object.
The illumination distribution of the environment surrounding the object is modeled by establishing a hemisphere of a prescribed radius, which overlies and is centered about the object. This hemisphere is used as a basis to create a separate so-called illumination hemisphere for each of the specular images. An illumination hemisphere is created from a specular image by determining the point of intersection with the hemisphere of a line originating from the location on the object's surface corresponding to a pixel of the specular image. This line is directed along the perfect mirror direction with respect to the optical ray of the pixel, where the optical ray is a line originating from the optical center of the camera used to capture the input image associated with the specular image and which goes through the input image pixel corresponding to the specular image pixel under consideration. The intensity value of the specular image pixel is associated with the intersection point, and the process is repeated for all the remaining pixels of the specular image. In order to reduce the data necessary to model the illumination distribution, the aforementioned hemisphere is preferably represented by a geodesic dome. If so, each intensity value associated with an intersection point is assigned to the vertex point of the geodesic dome that is closest to the intersection point.
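A minimal Python sketch of the construction of one illumination hemisphere is given below. Several details are assumptions made for the example: the hemisphere is centered at the world origin with the z axis pointing up, every surface point lies inside it, the surface normal at each point is available from the 3D model, and all values falling on the same dome vertex are kept in a list. The function names are hypothetical.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    length = math.sqrt(_dot(v, v))
    return tuple(x / length for x in v)


def mirror_direction(camera_center, surface_point, normal):
    """Reflect the optical ray (camera -> surface point) about the normal."""
    d = _normalize(tuple(p - c for p, c in zip(surface_point, camera_center)))
    n = _normalize(normal)
    k = 2.0 * _dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))


def intersect_hemisphere(origin, direction, radius):
    """Intersect a ray with the upper half of an origin-centered sphere.

    Assumes `origin` lies inside the sphere and `direction` is unit length.
    Returns None when the hit point lies below the z = 0 plane.
    """
    b = _dot(origin, direction)
    c = _dot(origin, origin) - radius * radius
    t = -b + math.sqrt(b * b - c)
    hit = tuple(o + t * d for o, d in zip(origin, direction))
    return hit if hit[2] >= 0.0 else None


def nearest_vertex(point, vertices):
    """Index of the geodesic dome vertex closest to `point`."""
    return min(
        range(len(vertices)),
        key=lambda i: sum((p - v) ** 2 for p, v in zip(point, vertices[i])),
    )


def illumination_hemisphere(samples, camera_center, vertices, radius):
    """samples: iterable of (surface_point, normal, specular_intensity)."""
    hemisphere = {}
    for surface_point, normal, intensity in samples:
        direction = mirror_direction(camera_center, surface_point, normal)
        hit = intersect_hemisphere(surface_point, direction, radius)
        if hit is not None:
            index = nearest_vertex(hit, vertices)
            hemisphere.setdefault(index, []).append(intensity)
    return hemisphere
```

How several specular pixels that fall on the same dome vertex within a single image are reconciled is not stated in this section; the sketch retains all of them rather than choosing a rule.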
Once all the individual illumination hemispheres have been created, they are combined to form a generalized illumination hemisphere. This generalized illumination hemisphere represents the desired model of the illumination distribution of the environment surrounding the object. Combining the individual illumination hemispheres could simply entail computing the mean of the intensity values assigned to any vertex of the geodesic dome having more than one assigned value, and then assigning the mean to that vertex in lieu of the individual intensity values. However, imaging noise and errors introduced in the aforementioned alignment process can reduce the accuracy of the generalized illumination hemisphere considerably. Fortunately, these effects can be reduced before combining the individual illumination hemispheres. The procedure for reducing the effects of imaging noise and errors involves, for each geodesic dome vertex point, first determining in how many of the specular images the vertex point is visible. Then it is determined how many intensity values have been assigned to the vertex point among all the individual illumination hemispheres created from the specular images. These numbers are compared, and whenever the numbers do not match, the intensity values assigned to the vertex point are eliminated from each of the affected illumination hemispheres. This results in any noise being eliminated as well. The combining procedure is completed by computing the mean of the intensity values assigned to any vertex of the geodesic dome having more than one assigned value, and then assigning the mean to that vertex in lieu of the individual intensity values.
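The consistency check and averaging just described might be sketched in Python as follows. The sketch assumes each individual illumination hemisphere is a dictionary holding a single intensity per dome vertex index, and that the set of vertices visible in each specular image has been determined separately and is passed in; how visibility is determined is outside the scope of the example, and the names used are hypothetical.

```python
def combine_hemispheres(hemispheres, visibility):
    """Form a generalized illumination hemisphere.

    hemispheres: list of dicts, one per specular image, mapping a dome
    vertex index to the intensity assigned to it (assumed layout).
    visibility: list of sets, one per specular image, holding the vertex
    indices considered visible in that image (assumed to be given).
    """
    vertices = set()
    for hemisphere in hemispheres:
        vertices.update(hemisphere.keys())

    combined = {}
    for vertex in vertices:
        values = [h[vertex] for h in hemispheres if vertex in h]
        visible_count = sum(1 for seen in visibility if vertex in seen)
        if len(values) != visible_count:
            # Counts disagree: treat the values as noise and drop them.
            continue
        combined[vertex] = sum(values) / len(values)
    return combined
```

A vertex carrying a single consistent value keeps that value, since the mean of one number is the number itself.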
The aforementioned surface reflectance parameters, namely the surface spectral magnitude and surface roughness of the object, are computed next. In essence this entails establishing a reflection model which characterizes the specular reflection from a location on the surface of the object in terms of the surface spectral magnitude, the surface roughness and the magnitude of the color vector associated with each point light source defined by the illumination distribution model. Preferably, a simplified Torrance-Sparrow reflection model is employed for this purpose. Given the model, the particular values of the surface spectral magnitude, the surface roughness, and the magnitude of the color vector associated with each point light source are computed, which collectively minimize the square of the difference between the specular reflection intensity computed using the reflection model and the intensity of the specular reflection taken from the specular images, for each location on the surface of the object corresponding to a pixel of the specular images. In addition, it is preferred that the illumination distribution model be refined as part of the foregoing computations. This is accomplished by replacing the intensity values associated with each point light source defined in the initial generalized illumination hemisphere with the computed magnitude of the color vector associated with that point light source.
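To make the fitting objective concrete, the following Python sketch evaluates one commonly cited simplified form of the Torrance-Sparrow model, in which the specular intensity is the surface spectral magnitude divided by the cosine of the viewing angle, multiplied by a sum over point light sources of the source magnitude times a Gaussian in the angle between the surface normal and the bisector of the viewing and light directions. This section does not state the equation, so the exact form, the parameter names (`k_s`, `sigma`) and the data layout are assumptions made for illustration.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    length = math.sqrt(_dot(v, v))
    return tuple(x / length for x in v)


def specular_intensity(k_s, sigma, lights, normal, view_dir):
    """Assumed simplified Torrance-Sparrow specular term at one surface point.

    k_s: surface spectral magnitude; sigma: surface roughness.
    lights: list of (direction_to_light, magnitude) pairs, one per point
    light source of the illumination hemisphere.
    view_dir: direction from the surface point toward the camera.
    """
    n = _normalize(normal)
    v = _normalize(view_dir)
    cos_theta_r = _dot(n, v)
    if cos_theta_r <= 0.0:
        return 0.0  # the surface point faces away from the viewer
    total = 0.0
    for light_dir, magnitude in lights:
        l = _normalize(light_dir)
        half = tuple(a + b for a, b in zip(l, v))
        if _dot(half, half) == 0.0:
            continue  # light exactly opposite the viewer: no bisector
        h = _normalize(half)
        alpha = math.acos(max(-1.0, min(1.0, _dot(n, h))))
        total += magnitude * math.exp(-(alpha ** 2) / (2.0 * sigma ** 2))
    return k_s * total / cos_theta_r


def squared_error(k_s, sigma, lights, samples):
    """Sum of squared differences between modeled and observed intensities.

    samples: iterable of (normal, view_dir, observed_specular_intensity).
    """
    return sum(
        (specular_intensity(k_s, sigma, lights, normal, view_dir) - observed) ** 2
        for normal, view_dir, observed in samples
    )
```

The values of `k_s`, `sigma` and the light magnitudes that minimize `squared_error` could then be sought with any nonlinear least-squares routine; the choice of solver is left open here.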
The global texture map, generalized illumination hemisphere, and reflectance parameters constitute the desired efficient representation of the input image data, and are all that is needed to synthesize realistic virtual images of the object from any desired viewpoint. Specifically, images of the object are synthesized by first using the global texture map to identify the diffuse reflection values associated with each pixel of an image depicting the portions of the object visible from the desired viewpoint. In this way, a diffuse reflection image is created for the desired viewpoint. Next, the generalized illumination hemisphere and the surface reflectance parameters are used to identify the specular reflection values associated with each pixel of an image depicting the portions of the object visible from the desired viewpoint. This image represents the specular reflection image for the desired viewpoint. Finally, the diffuse and specular reflection images associated with the desired viewpoint are composited to create the desired synthesized image of the object. This compositing simply involves adding the intensity values of the corresponding pixel locations in the diffuse and specular reflection images.
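The final compositing step amounts to a per-pixel sum, as the following Python sketch illustrates. It assumes, for the example only, that both images are equally sized row-major lists of intensity values; the function name is hypothetical.

```python
def composite(diffuse_image, specular_image):
    """Add corresponding pixels of the diffuse and specular images."""
    return [
        [d + s for d, s in zip(diffuse_row, specular_row)]
        for diffuse_row, specular_row in zip(diffuse_image, specular_image)
    ]
```

Any clamping of the sum to a display range would be an additional design choice and is not shown.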
In addition to the just described benefits, other advantages of the present invention will become apparent from the detailed description which follows hereinafter when taken in conjunction with the drawing figures which accompany it.