Electronic imaging is important in many applications, including computer vision, robotics, industrial automation, surveillance and monitoring, document imaging, medical imaging, and digital photography, among others. One problem with the present state of the art is that the dynamic range of natural optical images can easily exceed the available dynamic range of the imaging devices, whether those devices are photochemical emulsions, films, or plates, or electronic image sensors. For example, optical images in nature may exhibit dynamic range in excess of 1:100,000 or even 1:1,000,000. This means that bright parts of an image (e.g., sun-illuminated surfaces) can generate optical signals as much as 1,000,000 times stronger than the optical signals generated by dark parts of an image (e.g., deep shadows). When such high dynamic range optical images are received through the image formation system (e.g., optical lens) and projected on the imaging media, the imaging media cannot easily sense such a high dynamic range signal, resulting in signal saturation and clipping in the sensed image. For example, when imaging a human subject standing in a room in front of a bright window, the window in the resultant image is completely white (i.e., “washed out”), while the subject is completely black (i.e., a silhouette). Neither the scene details in the window area nor the details of the subject can be discerned in the resultant image.
There are numerous other examples where imaging fails due to limited dynamic range of imaging devices. An automobile entering or exiting a tunnel will experience high dynamic range optical images. Conventional image sensors can either image bright portions (e.g., tunnel exit) or dark portions (e.g., interior of the tunnel), but not both simultaneously. A surveillance camera on a sunny day will normally report shadows as completely black and brightly illuminated surfaces as completely white, again limiting usefulness of the entire surveillance system. As these examples illustrate, the limited dynamic range problem of state-of-the-art imaging devices severely limits the usefulness of the entire imaging system.
Another problem caused by dynamic range is the mismatch between the signal dynamic range of sensed images (e.g., film or electronic images) and the dynamic range available at a display or print medium. For example, the dynamic range of images sensed by film or by a CCD image sensor, of synthetically generated images, or of images mathematically reconstructed in computed tomography (CT) or magnetic resonance imaging (MRI) could be 1:1000 or more. On the other hand, common displays can render a dynamic range of only 1:256 or less. Print media exhibit a dynamic range of 1:50 or less. The problem is how to render a high dynamic range image on a low dynamic range display or print while showing all the details. In photography and computer graphics, the process of converting real-world colors and intensities to display or print colors and intensities is known as “tone mapping”. Clearly, a naïve signal scaling would destroy some details due to quantization and/or signal clipping.
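The loss of detail under naïve scaling can be illustrated with a minimal sketch. The scene values and the function name below are hypothetical and are chosen only to show how a linear mapping onto 256 display codes can collapse distinct shadow intensities into a single code:

```python
def naive_scale_to_8bit(values):
    """Linearly map intensities onto integer display codes 0..255."""
    peak = max(values)
    return [round(255 * v / peak) for v in values]

# Hypothetical scene: three distinct shadow intensities and one highlight,
# spanning a dynamic range of 1:1000.
scene = [1.0, 1.2, 1.5, 1000.0]
codes = naive_scale_to_8bit(scene)
print(codes)
```

In this sketch the three shadow intensities all quantize to the same display code, so the differences between them are no longer visible after scaling.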
The optical images are produced by having a light source illuminate objects and surfaces of a visual scene. This light field is herein called an “illumination field” or simply “illumination”. The “visual scene”, or simply “scene”, is commonly understood as a collection of surfaces and objects that an observer (e.g., camera or a human) is looking at. The scene appearance is substantially defined by the reflective properties of its surfaces and objects. The scene's reflective property distribution, called herein “reflectance map” or simply “reflectance”, reflects some of the illumination field to radiate an optical image. This optical image, called herein “radiance map” or “optical image”, is then sensed by a camera, an eye or other suitable imaging system.
When interacting with a three-dimensional visual scene, an illumination field could produce significant shadow patterns and have a dynamic range of 100,000:1 or more. Furthermore, the illumination-induced variations (e.g., shadow patterns, and interreflections) may completely alter the appearance of an object in a scene. Computer vision algorithms that often aim to recognize objects in a visual scene have difficulty accounting for all possible variations resulting in unreliable performance in real-world, unconstrained environments. Humans, on the other hand, seem to be able to compensate for a wide variation in illumination field. Humans are rarely fooled even when complicated illumination fields produce deep shadows and complicated patterns in the scene.
It is well known that object reflectances have dynamic ranges of 100:1 or less. For example, the reflectance of black velvet is about 0.04 (i.e., it reflects at most 4% of the illumination), while the reflectance of fresh white snow is about 0.96 (i.e., it reflects at most 96% of the illumination). Therefore, one way to approximately obtain the true appearance of an object, that is, its underlying reflectance, is to illuminate the object with a uniform light source from all possible directions to avoid shadows. The radiance map will then largely have the dynamic range of the underlying reflectance map. While this may be possible in some restricted industrial settings at great expense, it is not practical in many real-world situations where natural ambient illumination and the general shape and arrangement of objects cannot be controlled. Therefore, we are faced with the problem of estimating underlying object reflectance from radiance maps.
Experts in the art have strived to compensate for widely varying illumination fields in order to recover low dynamic range reflectance maps and thus avoid the high dynamic range problem. Commonly, a radiance map I(x,y) is regarded as a product:
$$I(x,y) = R(x,y)\cdot L(x,y) \qquad (1)$$
where R(x,y) is the reflectance map of the object and L(x,y) is the illumination field at each point (x,y). Computing the reflectance and the illumination fields from real images is, in general, an ill-posed problem. Therefore, various assumptions and simplifications about the illumination field L, the reflectance R, or both have been proposed in the prior art in order to attempt to solve the problem. A common assumption is that L varies slowly while R can change abruptly. For example, homomorphic filtering (T. G. Stockham, Jr., “Image Processing in the Context of a Visual Model,” Proceedings of the IEEE, Vol. 60, 1972, pp. 828-842) uses this assumption to extract R by high-pass filtering the logarithm of the image, log(I). Horn assumes that L is smooth and that R is piece-wise constant (B. K. P. Horn, “Determining Lightness from an Image,” Computer Graphics and Image Processing 3, 1, 1974, pp. 277-299). Taking the Laplacian of the image's logarithm, log(I), then removes the slowly varying L while marking discontinuities caused by the changes in R.
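The homomorphic approach can be sketched in one dimension as follows. This is a minimal illustration and not Stockham's implementation: the box low-pass filter, the filter radius, and the test signal are assumptions made for the example. Because log(I) = log(R) + log(L), a uniform change in illumination becomes an additive constant that the high-pass step removes:

```python
import math

def moving_average(signal, radius):
    """Box low-pass filter; the window is truncated at the signal ends."""
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def homomorphic_highpass(image, radius):
    """High-pass filter log(I): log(I) minus a low-pass copy of log(I)."""
    log_i = [math.log(v) for v in image]
    low = moving_average(log_i, radius)
    return [a - b for a, b in zip(log_i, low)]

# Hypothetical piece-wise constant reflectance under two uniform illuminations.
n = 64
reflectance = [0.2 if i < n // 2 else 0.8 for i in range(n)]
dim = [r * 1.0 for r in reflectance]
bright = [r * 1000.0 for r in reflectance]

out_dim = homomorphic_highpass(dim, radius=4)
out_bright = homomorphic_highpass(bright, radius=4)
max_difference = max(abs(a - b) for a, b in zip(out_dim, out_bright))
```

The two outputs agree to within floating-point error even though the inputs differ by a factor of 1000. The sketch covers only the favorable case of a uniform illumination; an abrupt change in L would pass through the high-pass step together with the reflectance edges.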
Of course, in most natural images the assumptions used in these examples are violated. For example, shadow boundaries on a sunny day will create abrupt changes in L. Under such conditions the homomorphic filtering would create a “halo” (i.e., negative gradient) artifact in the recovered reflectance at the shadow boundary. Horn's method would interpret the abrupt change in L as a change in R and wrongly estimate R.
Closely related to homomorphic filtering is Land's “retinex” theory (E. H. Land and J. J. McCann, “Lightness and Retinex Theory,” Journal of the Optical Society of America 61, No. 1, January 1971, pp. 1-11). Retinex estimates the reflectance R as the difference between the logarithm of the image I and the logarithm of a low-pass filtered version of the image that serves as the estimate for L(x,y). The “halo” artifacts are produced at large discontinuities in I(x,y) because the low-pass filtered image smooths over large discontinuities. Rahman, et al. (U.S. Pat. No. 5,991,456, also D. J. Jobson, et al., “A multiscale Retinex for bridging the gap between color images and the human observation of scenes,” IEEE Trans. Image Processing, Vol. 6, No. 7, 1997, pp. 965-976) extended the algorithm by essentially combining several low-pass copies of I(x,y) using different cut-off frequencies for each low-pass filter. Since this combination retains some moderately high spatial frequencies, the estimate of L(x,y) can better describe abrupt changes in L. This helps reduce halos, but does not eliminate them.
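The origin of the halo can be seen in a minimal one-dimensional sketch. The box low-pass filter, the radii, and the equal-weight averaging of scales are assumptions chosen for illustration and are not taken from the cited references:

```python
import math

def box_lowpass(signal, radius):
    """Box low-pass filter; the window is truncated at the signal ends."""
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def single_scale_retinex(image, radius):
    """log(I) minus the log of a low-pass copy of I (the estimate of L)."""
    smooth = box_lowpass(image, radius)
    return [math.log(i) - math.log(s) for i, s in zip(image, smooth)]

def multiscale_retinex(image, radii):
    """Equal-weight average of single-scale outputs (weights are an assumption)."""
    outputs = [single_scale_retinex(image, r) for r in radii]
    return [sum(vals) / len(radii) for vals in zip(*outputs)]

# Hypothetical step edge: a dark region next to a region 100 times brighter.
image = [1.0 if i < 32 else 100.0 for i in range(64)]
ssr = single_scale_retinex(image, radius=8)
msr = multiscale_retinex(image, radii=(2, 8, 16))
```

Far from the edge the output is zero, as expected for a flat region. Next to the edge the low-pass estimate mixes both sides, so the output undershoots on the dark side (`ssr[31]`) and overshoots on the bright side (`ssr[32]`). The multiscale output in this sketch still overshoots at the edge.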
The halo artifact, also called “negative gradient” or “edge banding”, is a visible bright and dark band along discontinuities in the input image. This artifact is more visible near high contrast discontinuities. The halo artifact is a very objectionable artifact for human viewing. In medical and automated computer vision, halo artifacts could be misleading and potentially dangerous.
In order to eliminate the notorious halo artifacts, Tumblin et al. introduced the low curvature image simplifier (LCIS) hierarchical decomposition of an image (Tumblin, J., and Turk, G. “LCIS: A boundary hierarchy for detail-preserving contrast reduction,” Proceedings ACM SIGGRAPH 99, pp. 83-90). Each component in this hierarchy is computed by solving a partial differential equation inspired by anisotropic diffusion (Perona, P., and Malik, J., “Scale-space and edge detection using anisotropic diffusion,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 12, No. 7, 1990, pp. 629-639). At each hierarchical level, the method segments the image into smooth (low-curvature) regions while stopping at sharp discontinuities. The hierarchy describes progressively smoother components of I(x,y). L is then mostly described by the smooth components (accounting for abrupt discontinuities), while R is described by the components containing greater spatial detail. Tumblin and Turk attenuate the smooth components and reassemble the image to create a low-contrast version of the original while compensating for the wide changes in the illumination field. This method drastically reduces the dynamic range but tends to overemphasize fine details, thus still creating objectionable artifacts in the result. The algorithm is computationally intensive and requires a human user to select no fewer than eight parameters appropriately for a particular image.
Since Tumblin's method attempts to account for discontinuities, it is an improvement over other dynamic range compression algorithms based on linear harmonic decomposition of the signal. For example, U.S. Pat. Nos. 5,012,333 and 4,812,903 teach methods of decomposing an input image into a low-frequency component and a high-frequency component. By modifying the low-frequency component with a nonlinear tone scale curve and amplifying the high-frequency component (unsharp masking), one can combine the two modified components to form a new image that has a smaller dynamic range and yet retains most image details. U.S. Pat. Nos. 5,608,813 and 5,454,044 teach similar methods, but formulate them in the spatial domain. These methods all suffer from producing halo artifacts around high contrast edges.
Another approach to dealing with halo artifacts around sharp discontinuities in the original image has been disclosed in U.S. Pat. Nos. 5,471,987 and 5,796,870, where the low-frequency component is calculated from a weighted average of pixels within a predetermined kernel. The weighting within the kernel is chosen so that less weight is applied to a particular pixel in the kernel if its absolute intensity difference from the center pixel is large. The weighting is also chosen so that less weight is applied to a particular pixel if it is located farther away from the center pixel. With such a differential weighting scheme, pixels across a high contrast edge are not averaged together; therefore, the low-frequency component retains relatively sharp high contrast edges. This type of selective weighting in both the space dimension and the signal intensity dimension has been dubbed “bilateral filtering” by Tomasi et al. (C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” Proceedings of IEEE Int. Conf. on Computer Vision, 1998, pp. 839-846). In essence, these approaches may yield a discontinuity-preserving smoothing method that produces a low-frequency component of the original signal while substantially preserving sharp discontinuities. When such a low-frequency component is subtracted from the original image signal, the resulting high-frequency component will not exhibit halos (i.e., overshoots or undershoots) near a high contrast edge. The main drawback of bilateral and similar filtering is that it is very slow to compute. Another drawback is that in some situations the result may require local corrections, which in turn require manually setting numerous parameters.
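The weighting scheme described above can be sketched in one dimension as follows. The Gaussian form of both weights, the kernel radius of three spatial standard deviations, and the parameter names `sigma_s` and `sigma_r` are assumptions made for this illustration:

```python
import math

def bilateral_1d(signal, sigma_s, sigma_r):
    """Weighted average in which the weight of each neighbor falls off with
    its distance from the center pixel and with its intensity difference."""
    radius = int(3 * sigma_s)
    out = []
    for i, center in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w_space = math.exp(-((j - i) ** 2) / (2 * sigma_s ** 2))
            w_range = math.exp(-((signal[j] - center) ** 2) / (2 * sigma_r ** 2))
            num += w_space * w_range * signal[j]
            den += w_space * w_range
        out.append(num / den)
    return out

# Hypothetical step edge of height 10 between pixels 31 and 32.
step = [0.0 if i < 32 else 10.0 for i in range(64)]
edge_preserved = bilateral_1d(step, sigma_s=4.0, sigma_r=1.0)
# A very large sigma_r makes the intensity weight nearly constant, which
# approximates an ordinary Gaussian blur for comparison.
blurred = bilateral_1d(step, sigma_s=4.0, sigma_r=1e6)
```

With the intensity weighting active, the pixels on either side of the step keep essentially their original values. With the intensity weighting effectively disabled, the pixel at index 31 is pulled almost halfway toward the bright side. The nested loop over every pixel and every kernel position also shows why a direct implementation is slow.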
Durand et al. (F. Durand and J. Dorsey, “Fast Bilateral Filtering for the Display of High-Dynamic-Range Images,” ACM SIGGRAPH 2002, pp. 257-265) teach a method of speeding up bilateral filtering in image dynamic range compression. Using bilateral filtering, they first compute a discontinuity-preserving smoothed version of the logarithm of the original image that they call the “base layer”. Durand et al. then subtract the base layer from the logarithm of the original to obtain the high-frequency component which they call the “detail layer”. Finally, they combine an attenuated base layer with the detail layer and perform exponentiation to obtain the final image that has compressed dynamic range after additional correction is performed where necessary. These steps are summarized in the following algorithm:
begin
    log_input = log(input_image)
    base_layer = BilateralFiltering(log_input)
    detail_layer = log_input - base_layer
    temp_image = compression_factor * base_layer + detail_layer
    output_image = exp(temp_image)
end.
These steps generally illustrate what has been commonly done in previous art to compress the dynamic range and improve contrast of images: 1) the input image is decomposed into a low-frequency component; 2) the low-frequency component is subtracted from the original to obtain a high-frequency component; 3) either of the two components, or both, are compressed through linear or nonlinear mapping function, and 4) the results are recombined to yield the result.
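The steps above can be sketched as runnable one-dimensional code. This is an illustration under stated assumptions and not the implementation of Durand et al.: the brute-force bilateral filter, the parameter values, the test signal, and the function names are hypothetical, and the additional correction mentioned above is omitted:

```python
import math

def bilateral_1d(signal, sigma_s, sigma_r):
    """Brute-force bilateral filter with Gaussian space and intensity weights."""
    radius = int(3 * sigma_s)
    out = []
    for i, center in enumerate(signal):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            w = math.exp(-((j - i) ** 2) / (2 * sigma_s ** 2)
                         - ((signal[j] - center) ** 2) / (2 * sigma_r ** 2))
            num += w * signal[j]
            den += w
        out.append(num / den)
    return out

def compress_dynamic_range(image, compression_factor, sigma_s=4.0, sigma_r=0.4):
    """Decompose log(I) into base and detail layers, attenuate the base layer,
    recombine, and exponentiate."""
    log_input = [math.log(v) for v in image]
    base_layer = bilateral_1d(log_input, sigma_s, sigma_r)
    detail_layer = [a - b for a, b in zip(log_input, base_layer)]
    temp = [compression_factor * b + d for b, d in zip(base_layer, detail_layer)]
    return [math.exp(t) for t in temp]

# Hypothetical scene: a 1:1000 illumination step modulated by a 10% texture.
image = [(1.0 if i < 32 else 1000.0) * (1.0 if i % 2 == 0 else 1.1)
         for i in range(64)]
output = compress_dynamic_range(image, compression_factor=0.5)

range_in = max(image) / min(image)
range_out = max(output) / min(output)
```

In this sketch the overall dynamic range drops by well over a factor of ten, while the ratio between neighboring textured pixels stays close to its original value because the texture resides in the unattenuated detail layer.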
A human observer seems to be able to easily see individual objects in both very bright and very dark areas of a high dynamic range scene. This is probably because the eye adapts to local intensity levels as we scan the different regions of the scene. Mead, et al. in U.S. Pat. No. 4,786,818 describe an image sensor that emulates the vertebrate retina in producing space-time derivative signals in response to image pixels. By producing the space derivatives, the D.C. component of the high dynamic range scene is largely removed, resulting in an image with a smaller signal dynamic range. Mead's retina estimates the local intensities by smoothing the original image using a uniform resistive grid (similar to the resistive grid shown in FIG. 2). This smoothed version is then subtracted from the original image. This sequence of operations produces “halos” at image discontinuities, approximating the second derivative of the image. In fact, those skilled in the art will recognize that this sequence of operations is substantially similar to what is commonly known in computer vision as the “Witch Hat” operator or “Difference-of-Gaussian” operator. A uniform resistive grid of FIG. 2 with substantially linear resistors solves a discretized version of the equation:
$$\nabla^{2} u(x) = \frac{u(x) - v(x)}{\alpha^{2}}$$
where u(x) is the output voltage distribution, and v(x) is the input voltage distribution. The spatial smoothing constant α is determined by the ratio of the vertical and horizontal resistors as α=√(Rv/Rh). An infinite one-dimensional network performs convolution with an exponential kernel:
$$\frac{1}{2\alpha}\, e^{-|x|/\alpha}.$$
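The behavior of such a grid can be checked numerically with a short sketch. The assumptions here are unit node spacing, a finite grid whose end nodes have a single horizontal neighbor, and a direct tridiagonal (Thomas) solve; the function name is hypothetical. At each node, Kirchhoff's current law gives u[i] − α²(u[i−1] − 2u[i] + u[i+1]) = v[i]:

```python
import math

def solve_resistive_grid(v, alpha):
    """Solve u[i] - alpha^2 * (u[i-1] - 2*u[i] + u[i+1]) = v[i] for u."""
    n = len(v)
    a2 = alpha * alpha
    off = -a2                          # sub- and super-diagonal entries
    diag = [1.0 + 2.0 * a2] * n
    diag[0] = diag[-1] = 1.0 + a2      # end nodes have one neighbor only
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = off / diag[0]
    d_prime[0] = v[0] / diag[0]
    for i in range(1, n):              # forward elimination
        denom = diag[i] - off * c_prime[i - 1]
        c_prime[i] = off / denom
        d_prime[i] = (v[i] - off * d_prime[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):     # back substitution
        u[i] = d_prime[i] - c_prime[i] * u[i + 1]
    return u

# Impulse response of a long grid, compared with the continuous kernel.
alpha = 5.0
n, center = 201, 100
impulse = [0.0] * n
impulse[center] = 1.0
u = solve_resistive_grid(impulse, alpha)

peak_continuous = 1.0 / (2.0 * alpha)      # kernel value at x = 0
decay_continuous = math.exp(-1.0 / alpha)  # kernel ratio between adjacent nodes
decay_grid = u[center + 1] / u[center]
```

For α = 5 the peak of the grid's impulse response and its node-to-node decay ratio agree with the continuous exponential kernel to within about one percent. The agreement is approximate because the grid is discrete and finite.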
U.S. Pat. No. 5,086,219 by Koch, et al. shows a one-dimensional optical sensor for detecting discontinuities in the sensed image. The sensor uses two uniform resistive grids with different smoothing constants to produce two smooth versions of the original image. The two smooth results are subtracted to produce a high-pass filtered result commonly obtained by convolving the original image with the “Witch Hat” operator (e.g., difference of two exponential kernels). Finding signal zero crossings in this high-pass image substantially detects discontinuities in the original image.
A uniform resistive grid of FIG. 1 with substantially linear resistors solves the discretized version of the diffusion (heat) equation:
$$c\,\nabla^{2} u(x,t) = \frac{\partial}{\partial t}\, u(x,t)$$
where x is a spatial coordinate, t is time, u(x,t) is the nodal voltage distribution at time t, and c is some positive constant. If the intensity distribution of an image v(x) is mapped as an initial condition to an infinite grid of FIG. 1 (e.g., u(x,0)=v(x)) and then allowed to diffuse throughout the horizontal resistors, the distribution of the nodal voltages at any point in time t>0 can be obtained by solving the above diffusion equation. This is exactly the relation that holds if the image is convolved with a Gaussian of variance s²=2ct. Therefore, as time passes, the grid of FIG. 1 convolves the original image with an increasingly broader Gaussian kernel.
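The relation between diffusion time and kernel variance can be checked with a minimal explicit finite-difference sketch. Unit node spacing, the time step, the replicated end nodes, and the function names are assumptions made for the example; the explicit scheme is stable only when c·dt does not exceed 0.5:

```python
def diffuse(u, c, dt, steps):
    """Explicit update u[i] += c*dt*(u[i-1] - 2*u[i] + u[i+1]), repeated."""
    r = c * dt                      # stable for r <= 0.5
    u = list(u)
    for _ in range(steps):
        padded = [u[0]] + u + [u[-1]]   # replicate the end nodes
        u = [padded[i + 1] + r * (padded[i] - 2.0 * padded[i + 1] + padded[i + 2])
             for i in range(len(u))]
    return u

def variance(dist):
    """Spatial variance of a non-negative distribution over node indices."""
    total = sum(dist)
    mean = sum(i * p for i, p in enumerate(dist)) / total
    return sum(p * (i - mean) ** 2 for i, p in enumerate(dist)) / total

# Diffuse a unit impulse and compare its spread with 2*c*t.
n = 201
impulse = [0.0] * n
impulse[100] = 1.0
c, dt, steps = 1.0, 0.25, 100

result = diffuse(impulse, c, dt, steps)
measured = variance(result)
expected = 2.0 * c * dt * steps     # 2*c*t with t = dt * steps
```

The measured variance of the diffused impulse matches 2ct, and doubling the number of steps doubles the variance, which corresponds to the progressively broader kernel described above.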
FIGS. 1 and 2 show one-dimensional grid examples. As commonly known to those skilled in the art, these one-dimensional examples can be readily extended to two dimensions to solve two-dimensional problems pertaining to two-dimensional signals such as digital images.
When using linear resistors, the resistive grids shown in FIGS. 1 and 2 behave as linear smoothers. They cannot determine where the discontinuities occur in the input and will tend to blur across them. As discussed earlier, blurring across large discontinuities is not desirable, as it will produce objectionable halos. To achieve edge-preserving smoothing, the general solution has been to use nonlinear horizontal resistors whose current-voltage characteristics are nonlinear. As indicated in FIGS. 1 and 2, the prior art changes the equivalent horizontal resistance as a function of the voltage drop across the terminals of each corresponding horizontal resistor.
Perona and Malik iterate to numerically solve the diffusion equation for FIG. 1. At each iteration, they set the value of each horizontal resistor as a function of the voltage difference across the resistor's terminals so as to impede the diffusion process among nodes that exhibit a large discontinuity, and to promote diffusion among nodes that are of substantially equal intensity. As discussed earlier, Tumblin uses this form of anisotropic diffusion to compress the dynamic range of images.
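This iteration can be sketched in one dimension as follows. The exponential conductance function, the contrast parameter `k`, the step size `lam`, and the test signal are assumptions chosen for illustration; they correspond to one of the conductance choices discussed by Perona and Malik, not to any particular circuit:

```python
import math

def perona_malik_1d(signal, k, lam, iterations):
    """Nonlinear diffusion in which the conductance between two neighboring
    nodes decreases as the difference between them grows."""
    def conductance(d):
        return math.exp(-(d / k) ** 2)

    u = list(signal)
    n = len(u)
    for _ in range(iterations):
        new_u = []
        for i in range(n):
            # Differences to the two neighbors; no flux across the ends.
            d_w = u[i - 1] - u[i] if i > 0 else 0.0
            d_e = u[i + 1] - u[i] if i < n - 1 else 0.0
            flux = conductance(d_w) * d_w + conductance(d_e) * d_e
            new_u.append(u[i] + lam * flux)
        u = new_u
    return u

# Hypothetical signal: a step of height 10 with a +/-0.2 alternating ripple.
noisy_step = [(0.0 if i < 32 else 10.0) + (0.2 if i % 2 == 0 else -0.2)
              for i in range(64)]
smoothed = perona_malik_1d(noisy_step, k=1.0, lam=0.25, iterations=50)
```

In this sketch the ripple on each side of the step is strongly attenuated while the step itself remains close to its original height, because the conductance across the large difference at the step is nearly zero. Note that the conductance is evaluated on the current, partially smoothed values at every iteration.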
Mead (U.S. Pat. No. 4,786,818) partially solves the discontinuity-smoothing problem by using saturating horizontal resistors in the grid of FIG. 2. The current through a saturating horizontal resistor saturates as the voltage across the resistor becomes large, so the grid smooths less across large discontinuities. As discussed above, Mead subtracts this smoothed version from the original to compensate for widely varying illumination conditions.
Other proposed solutions based on stochastic arguments and minimization of cost functionals suggest a special type of horizontal resistor called a resistive fuse. The resistive fuse acts as a common (linear) resistor for small voltages, but “blows” and conducts little or no current if the voltage across its terminals exceeds a predetermined threshold voltage. Since the image intensity v(x) is quite different across a discontinuity, it is presumed that a large discontinuity will also appear in the smoothed output u(x), where the resistive fuse can then act as an open circuit and prevent blurring across that discontinuity. Harris (U.S. Pat. No. 5,062,000) and Decker (U.S. Pat. No. 5,223,754) show various circuit implementations of resistive fuses.
There are at least three problems associated with previously proposed discontinuity-preserving smoothing resistive networks. The first problem is that the presence or absence of a discontinuity in the input image v(x) is judged based on the smoothed result u(x). Therefore, the horizontal resistors that adjust their resistances as a function of their terminal voltages (e.g., Rh=f(u)) may not be able to appropriately capture discontinuities present in the input image. The second problem is that the circuit designer is left with the difficult task of creating nonlinear resistors whose resistance changes with the voltage drop across their terminals. Generally, circuit designers have settled for nonlinear resistor functions Rh=f(u) that they can implement with reasonably complex circuits. These functions approximate the desired behavior but may not be the optimal choice from the mathematical point of view. Because of this difficulty, the designer is left with very few degrees of freedom to create an arbitrary f(u). The third problem is that when the network is to be solved numerically, the nonlinear resistive element requires iteration. For example, previous designs first solve the linear resistive network to obtain a version that smooths everywhere, including across discontinuities. Subsequently, large discontinuities in the smoothed result are identified to determine which resistors to take out (e.g., which fuses to blow), and the process is iterated again. This iterative process is numerically involved and still suffers from the problem that the presence or absence of discontinuities is judged based on the smoothed image.
Despite these prior art systems, there remains a very real and substantial need for a numerically efficient image processing method for discontinuity-preserving image smoothing and segmentation, noise reduction, reduction of the image dynamic range for printing and viewing, compensation of exposure problems, and reduction of image appearance variation due to widely changing illumination conditions. Further, there is a very real and substantial need for an image sensing apparatus for sensing wide dynamic range optical images that is able to produce reduced dynamic range sensor images that preserve virtually all important spatial details contained in the optical image. The present invention has been developed in view of the foregoing.