Using images of objects to derive information such as the object's shape has many uses. Examples include surveillance, geographic information systems, and machine inspection of manufactured goods. Other examples include single projector or multiple projector display systems designed to display images on many different types of surfaces, including two-dimensional projection screens and three-dimensional objects.
Many techniques are used for mapping a surface shape of an object. Most techniques utilize one or more projectors to project display images onto the object. A camera is used to capture images of the object as the display images are projected onto the object's surface. The images taken by the camera are used to register each of the projectors. These display systems compute transformations which are applied to each image to be projected. The positions of projected features observed in the camera image are used to create an explicit list of correspondences between the image to be projected and the actual display surface positions. These correspondences are interpolated to create a mapping between each projector pixel and the display surface.
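The interpolation of correspondences into a per-pixel mapping can be sketched as follows. This is a minimal illustration only, assuming the projected features lie on a regular grid in projector space and that bilinear interpolation is acceptable; the function name `interpolate_mapping`, the `grid` layout, and the `step` parameter are hypothetical and are not taken from any particular display system.

```python
def interpolate_mapping(grid, step, u, v):
    """Bilinearly interpolate a surface position for projector pixel (u, v).

    grid[j][i] is the (x, y) surface position observed for the feature
    projected at projector pixel (i * step, j * step). Assumes
    0 <= u <= (cols - 1) * step and 0 <= v <= (rows - 1) * step.
    """
    rows, cols = len(grid), len(grid[0])
    # Locate the grid cell containing (u, v), clamping to the last cell.
    i = min(int(u // step), cols - 2)
    j = min(int(v // step), rows - 2)
    # Fractional position inside that cell.
    fu = u / step - i
    fv = v / step - j
    (x00, y00), (x10, y10) = grid[j][i], grid[j][i + 1]
    (x01, y01), (x11, y11) = grid[j + 1][i], grid[j + 1][i + 1]
    # Interpolate along the top and bottom edges, then between them.
    top_x = x00 + fu * (x10 - x00)
    top_y = y00 + fu * (y10 - y00)
    bot_x = x01 + fu * (x11 - x01)
    bot_y = y01 + fu * (y11 - y01)
    return (top_x + fv * (bot_x - top_x), top_y + fv * (bot_y - top_y))


# Four features projected 100 projector pixels apart (made-up positions).
grid = [[(0.0, 0.0), (10.0, 0.0)],
        [(0.0, 20.0), (10.0, 20.0)]]
center = interpolate_mapping(grid, 100, 50, 50)
corner = interpolate_mapping(grid, 100, 100, 100)
```

A projector pixel that falls between features receives a position blended from the four surrounding correspondences, so a missing or misdetected feature corrupts the mapping of every pixel in the adjacent cells.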
A so-called “shape-from-shading” technique makes assumptions about the illumination and light reflecting properties of the surface in question, and uses image intensities to estimate a surface normal at each point on the surface. Stereo and multi-view imaging techniques use two or more images of the same object in order to determine the surface shape. Structured-light techniques (also called active sensing) use a projected light pattern of known shape, together with a camera, in order to capture information about the surface shape. The use of a known projected light pattern simplifies the generation of correspondences between locations in the projected pattern and locations in the camera image. Other techniques calibrate display systems by using a camera to capture images of the display surface illuminated with projections of a sequence of patterns (also called coded structured light).
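As an illustration of how a sequence of coded patterns yields correspondences, the sketch below uses a binary-reflected Gray code to label projector columns. The choice of Gray code is an assumption made for the example (it is one common coding scheme and is not named in the text above), and the function names are hypothetical.

```python
def gray_encode(n):
    """Convert a column index to its binary-reflected Gray code."""
    return n ^ (n >> 1)


def gray_decode(g):
    """Convert a Gray code back to the column index."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def column_patterns(width, num_bits):
    """Return num_bits stripe patterns, most significant bit first.

    patterns[b][x] is 1 if projector column x is lit in pattern b.
    """
    return [[(gray_encode(x) >> (num_bits - 1 - b)) & 1 for x in range(width)]
            for b in range(num_bits)]


def decode_column(bits):
    """Recover the projector column from the lit/unlit sequence seen at one
    camera pixel (most significant bit first)."""
    g = 0
    for bit in bits:
        g = (g << 1) | bit
    return gray_decode(g)


# Three patterns are enough to label eight projector columns.
patterns = column_patterns(8, 3)
decoded = [decode_column([p[x] for p in patterns]) for x in range(8)]
```

Each camera pixel must be classified as lit or unlit in every pattern, so a single saturated or under-exposed capture at that pixel is enough to produce a wrong column label.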
For all such display systems, non-Lambertian (e.g., shiny, transparent, translucent) surfaces are problematic. A Lambertian surface reflects light equally in all directions. If a surface is a Lambertian surface, the appearance of the surface does not vary with viewpoint. An enlarged cross section of a Lambertian surface shows a rough or jagged surface, so there are no preferred angles of reflection. Lambertian surfaces are also called diffuse surfaces. FIG. 1 illustrates a 100% specular surface, which is non-Lambertian. An impinging light 14 impinges a surface 12 of an object 10. The surface 12 is 100% specular. As such, the impinging light 14 is reflected such that an intensity of the reflected light 16 is equal to an intensity of the impinging light 14. Also, as there is no scattering of the impinging light 14 on a 100% specular surface, the reflected light 16 is reflected at the mirror angle. In other words, the angle of the impinging light 14, angle θI, measured from the surface normal is equal to the angle of the reflected light 16, angle θR, measured from the surface normal. FIG. 2 illustrates a 100% diffuse surface, which is Lambertian. The impinging light 14 impinges a surface 22 of an object 20. The surface 22 is 100% diffuse. As such, the impinging light 14 is reflected such that an intensity of the reflected light 26 is equally distributed in all directions. The sum of the intensities of all reflected light 26 is equal to the intensity of the impinging light 14.
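The mirror-angle relationship for a 100% specular surface can be checked numerically. The sketch below is a two-dimensional illustration using the standard reflection formula r = d - 2(d·n)n; the function names and the 30 degree incidence angle are chosen for the example only.

```python
import math


def reflect(d, n):
    """Mirror-reflect direction d about the unit surface normal n (2D tuples)."""
    dot = d[0] * n[0] + d[1] * n[1]
    return (d[0] - 2 * dot * n[0], d[1] - 2 * dot * n[1])


def angle_from_normal(v, n):
    """Unsigned angle, in degrees, between direction v and normal n."""
    dot = v[0] * n[0] + v[1] * n[1]
    norm = math.hypot(*v) * math.hypot(*n)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


normal = (0.0, 1.0)                               # surface lies along the x axis
theta = math.radians(30.0)
incoming = (math.sin(theta), -math.cos(theta))    # travelling toward the surface
outgoing = reflect(incoming, normal)

# Both angles are measured from the normal, so the incoming ray is reversed
# to point away from the surface before measuring.
theta_i = angle_from_normal((-incoming[0], -incoming[1]), normal)
theta_r = angle_from_normal(outgoing, normal)
```

Under these assumptions `theta_i` and `theta_r` agree to within floating-point error, which is the equality of θI and θR described for FIG. 1.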
When all or a portion of the display surface is not a diffuse reflector (e.g., the surface is non-Lambertian), certain portions of the projected pattern can appear too bright or too dark in the captured image, resulting in failures to identify the projected pattern. For example, a “hotspot,” or highlight, can over-saturate the camera sensor in a region that is highly specular (that is, “shiny” in the sense of a mirror or highly polished surface) where the camera and projector lie along the “mirror angle,” as illustrated in FIG. 3. In the exemplary configuration of FIG. 3, a surface 32 of an object 30 is highly specular. A camera 60 is positioned at the mirror angle relative to a projector 50. In such a configuration, the intensity of the reflected light 36 captured by the camera 60 can over-saturate the camera sensor. It can also happen that a specular surface reflects too little light to be detected, such as when the camera is placed too far from the “mirror angle,” as illustrated in FIG. 4. For a highly specular surface, such as the surface 32, the further the camera 60 is from the mirror angle, the lower the intensity of the reflected light 46. Low intensity reflections can also occur when a display surface with a low albedo is used (e.g., one painted a dark color).
For translucent surfaces, projected light can penetrate the surface, be scattered by the material below the surface, and then leave the object at a different point on the surface. This phenomenon is called subsurface scattering. FIG. 5 illustrates an example of subsurface scattering. The projected light 14 impinges a surface 52 of a translucent material 50. The projected light 14 scatters within the material 50 as refracted and scattered light 54. Some of the refracted and scattered light 54 can exit the surface 52 as reflected light 56. The reflected light 56 can be reflected in many different directions, typically at low intensities.
Translucent and transparent surfaces can transmit impinging light and can also reflect the same impinging light, whether the impinging light encounters the surfaces internally or externally to the object. FIG. 6 illustrates an example of light impinging a transparent material. The impinging light 14 impinges a surface 62 of a transparent material 60. A portion of the impinging light 14 is reflected as reflected light 63 and another portion of the impinging light 14 is transmitted as refracted light 62 through the transparent material 60. When the refracted light 62 impinges a second surface 61 of the transparent material 60, a portion of the refracted light 62 is reflected off the second surface as reflected light 64 and another portion of the refracted light 62 passes through the second surface, thereby exiting the transparent material 60, as refracted light 66. The reflected light 64 is transmitted through the transparent material 60. When the reflected light 64 impinges the first surface 62, a portion of the reflected light 64 is reflected back into the transparent material 60 as reflected light 67, and another portion of the reflected light 64 passes through the first surface 62, as refracted light 65, and so on. In this case, the impinging light 14 can leave the object at a different point than at which it first encounters the surface, such as refracted light 65 and refracted light 66. These phenomena can result in incorrect position data.
Many conventional cameras use CMOS or CCD image sensors to capture images. However, CMOS and CCD image sensors have a limited dynamic range. The effective dynamic range is bounded by a minimum threshold and a maximum threshold. The minimum threshold defines a minimum number of photons that the image sensor must capture to detect an image, and to avoid under-exposure. The maximum threshold defines a maximum number of photons that the image sensor can capture before the image becomes over-exposed. After a certain number of photons have charged an individual collection site (pixel), the site becomes saturated, and cannot record any additional photons. The minimum number of photons that can be measured is also limited, due to the effects of heat in the sensor site itself, which makes it difficult, if not impossible, to get a true “zero” reading from the sensor. For example, digital images under low-light conditions look “noisy” or “grainy.” The rate at which photons are admitted to the sensor can be controlled by changing either (or both) of the aperture (size of the lens opening) or the “exposure time.” The exposure time is the amount of time the sensor is allowed to collect photons.
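The two thresholds can be illustrated with a simplified single-pixel model. This is a sketch only: the `full_well` and `noise_floor` values are made-up numbers, the function name is hypothetical, and a real sensor has a gradual noise floor rather than a hard cutoff.

```python
def sensor_reading(photon_rate, exposure_time, full_well=10000.0, noise_floor=50.0):
    """Return (collected_photons, status) for one pixel.

    photon_rate is in photons per second and exposure_time is in seconds.
    full_well and noise_floor are hypothetical limits for illustration.
    """
    photons = photon_rate * exposure_time
    if photons >= full_well:
        # The site is saturated and cannot record additional photons.
        return full_well, "saturated"
    if photons <= noise_floor:
        # Too few photons to distinguish from thermal noise.
        return photons, "under-exposed"
    return photons, "ok"


# The same scene point imaged at three exposure times.
long_exposure = sensor_reading(1e6, 0.1)
mid_exposure = sensor_reading(1e6, 0.001)
short_exposure = sensor_reading(1e6, 1e-5)
```

In this model, changing the exposure time moves a given scene point between the saturated, usable, and under-exposed regimes, but a single exposure time cannot place a very bright point and a very dim point in the usable range at once.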
The exposure time can be controlled with a physical shutter or an electronic shutter. The physical shutter opens and closes to briefly allow light through the lens. The electronic shutter works by discharging the sensor site (pixel) and then allowing the sensor site to collect photons for some period of time. Subsequently, this charge is electronically transferred to a storage site, where it is later read, as in global shuttering. Alternatively, the value of the charge is simply read at some later point in time, as in rolling shutter. The resulting charge is then measured and converted to a discrete digital value. Typically, the values are roughly linearly spaced, and can be further modified by a digitally-implemented transfer function. The resulting value for the pixel irradiance is typically stored as an 8-bit, unsigned integer. In the case of color cameras, three such values are stored, one per primary color, where a separate sensor, covered with an appropriately-colored filter, captures the intensity of each primary color. As a final step, the image is typically stored in a compressed image format, such as JPEG, to the camera's storage medium.
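The conversion from measured charge to a stored 8-bit value might look like the following sketch. The power-law `gamma` curve stands in for the digitally-implemented transfer function and is an assumption made for illustration; actual cameras use their own, often proprietary, curves. The function name and parameters are hypothetical.

```python
def to_8bit(charge, full_well, gamma=1.0):
    """Quantize a measured charge to an unsigned 8-bit value (0 to 255).

    charge and full_well are in the same arbitrary units. gamma=1.0 gives
    linearly spaced levels; other values apply an assumed power-law
    transfer function.
    """
    # Clamp to the measurable range before quantizing.
    fraction = max(0.0, min(1.0, charge / full_well))
    # Adding 0.5 before truncating rounds to the nearest level.
    return int(255 * fraction ** gamma + 0.5)


linear = to_8bit(25.0, 100.0)             # linearly spaced levels
curved = to_8bit(25.0, 100.0, gamma=0.5)  # transfer function brightens shadows
clipped = to_8bit(250.0, 100.0)           # charge beyond full well
```

Any charge at or above `full_well` maps to 255, so the stored value no longer distinguishes between a just-saturated pixel and one that received many times more light.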
The effective dynamic range of a CMOS or CCD camera can be, in effect, extended using “high dynamic range” (HDR) imaging techniques. In HDR imaging, multiple images, taken with different exposure times, are made while keeping the position of the camera, the position of the object, and the position(s) of any light source(s) fixed. Together, the exposures span a larger dynamic range of brightness than any single image, and they are then combined and compressed into one result. However, cameras that provide HDR imaging are very expensive. Techniques for capturing HDR imagery with available cameras exist, but require multiple images to be captured, on the order of 16 exposures per single frame. The subsequent time to process these additional images is prohibitive in most applications.
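Combining exposures for a single pixel can be sketched as below. This is a minimal illustration that assumes a linear sensor response and simply averages the unclipped samples; practical HDR pipelines typically also recover the camera response curve and weight samples by reliability. The function name and the `low` and `high` cutoffs are hypothetical.

```python
def merge_exposures(samples, low=5, high=250):
    """Estimate relative irradiance at one pixel from several exposures.

    samples is a list of (pixel_value_8bit, exposure_time_seconds) pairs.
    Values at or below `low` or at or above `high` are treated as clipped
    and ignored. Returns None if every sample is clipped.
    """
    usable = [(value, t) for value, t in samples if low < value < high]
    if not usable:
        return None
    # With a linear response, value / time is proportional to irradiance.
    return sum(value / t for value, t in usable) / len(usable)


# One pixel seen in three exposures: the longest exposure is saturated.
estimate = merge_exposures([(255, 1.0), (200, 0.1), (20, 0.01)])

# A pixel saturated at every exposure yields no estimate at all.
unrecoverable = merge_exposures([(255, 1.0), (255, 0.1)])
```

The second call shows the limiting case: when no exposure in the stack records an unclipped value, merging cannot recover the missing information.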
For a specific imaging task, capturing many images of the object may be required. For example, in a coded structured light task, several images may be required, each capturing the object illuminated by a different light pattern. Multiview techniques may require multiple images from different camera positions. The total number of images required to complete the specific image capture task may then exceed the storage capacity of the camera. Also, the total number of images required to be captured may require more time than is available for the task. For example, each individual image may require several seconds to capture, depending on the camera used. Increasing this capture time can be impractical.
The collection of multiple exposures making up the HDR image must be further manipulated by post-processing operations to be useful. The results of the post-processing are then stored in non-traditional image formats, for example, storing pixel values as 24-bit or 32-bit floating point values. These non-traditional image formats are not supported by typical software tools. There is no universally accepted standard format, as there is for traditional 8-bit-per-pixel images, such as JPEG. If, instead of this post-processing, the entire collection of separate images is used, then 16 times the storage space, transfer time, and memory space are required to process the images. These issues relating to HDR imaging result in an orders-of-magnitude increase in the storage cost and processing time for an imaging task, as compared to the storage and processing of single-exposure, 8-bit or 10-bit images in traditional formats.
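The storage comparison can be made concrete with a short calculation. The figures below are for uncompressed data and assume a hypothetical 1920 by 1080 three-channel image, with the floating point format taken to use 32 bits per channel; none of these numbers come from the text above.

```python
def image_bytes(width, height, channels, bytes_per_channel):
    """Uncompressed size of one image in bytes."""
    return width * height * channels * bytes_per_channel


# Hypothetical resolution, chosen only for illustration.
WIDTH, HEIGHT, CHANNELS = 1920, 1080, 3

single_8bit = image_bytes(WIDTH, HEIGHT, CHANNELS, 1)    # one 8-bit exposure
exposure_stack = 16 * single_8bit                        # 16 separate exposures
merged_float32 = image_bytes(WIDTH, HEIGHT, CHANNELS, 4) # one 32-bit float image

print(f"single 8-bit image : {single_8bit / 1e6:.1f} MB")
print(f"16-exposure stack  : {exposure_stack / 1e6:.1f} MB")
print(f"merged 32-bit float: {merged_float32 / 1e6:.1f} MB")
```

Under these assumptions, keeping the full stack costs 16 times the storage of a single 8-bit image, while the merged floating point image costs 4 times as much before any compression is considered.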
Despite the increased dynamic range afforded by HDR techniques, and hence their ability to handle very bright or dim reflections, as in FIGS. 3 and 4, these techniques do not overcome the problems with translucent and transparent materials, as in FIGS. 5 and 6. Further, there are other fundamental limitations in camera sensors that HDR techniques cannot overcome. In cases where the illumination exceeds the minimum or maximum limits of the irradiance range of the camera, the desired information is not present in any image, at any exposure. In these situations, the information (e.g., the captured image of the object as illuminated with the structured light pattern) is simply not captured and cannot be recovered. The resulting inability to complete the structured-light imaging task results in the failure of the application.
An alternative to HDR imaging for capturing non-Lambertian surface properties is to image the object from a multitude of illumination and/or sensor angles. Techniques using this approach are referred to as “multi-view techniques.” With enough captured images, some images may not have the sort of problems indicated in FIGS. 3 and 4, for highly specular surfaces. In many multi-view techniques, multitudes of images are captured from a large number of vantage points to enable capture of object geometry. Other multi-view techniques use a multitude of illumination angles. Such techniques either require the object to be mounted on a turntable, or a large number of images from different vantage points to be taken by mounting an array of cameras/lights, which is expensive, and/or repositioning the camera/lights and taking multiple images, which is time consuming. However, it is often impossible, in real-world situations, to obtain images from these multiple viewpoints, or multiple illumination angles.
Recent approaches for capturing both shape and surface properties of non-Lambertian materials have several drawbacks, including but not limited to, requiring careful initialization, assuming the surface has uniform reflectance properties, and limitations to specifically shaped objects. Many such approaches result in low-fidelity estimates of surface geometry.
Approaches that attempt to deal with phenomena such as subsurface scattering combine multi-view and HDR techniques, thereby acquiring the drawbacks of both approaches. Both HDR techniques and multi-view techniques for imaging of non-Lambertian surfaces are cumbersome, slow, require large amounts of storage and processing, are often impractical, and can fail completely in some circumstances.