1. Field of the Invention
This invention relates to image processing and, in particular, to techniques for processing images captured under illumination of varying color and/or intensity, such as in video imaging systems.
2. Description of the Prior Art
Conventional image processing systems capture and reproduce images as illuminated by ambient light. That is, an image will be reproduced differently if captured under different ambient illumination conditions. As a result, such conventional systems may have difficulty reproducing the colors and/or the levels of intensity and contrast of an image as seen by a human observer, because of the color of the illumination and/or the varying levels of illumination in the image.
For example, the skin tones of a color image of a person photographed in midday sunlight will differ from the skin tones of a photograph of the same person taken at sunset. The ambient light at sunset includes colors which are reflected by the skin and reproduced in the skin tones by the imaging process. To a human viewer, however, the skin tones appear about the same, whether viewed at sunset or in midday sunlight as a result of automatic corrections made by the human nervous system.
The eye and brain, however, are thought to be able to perceive the approximate reflectance of objects under a variety of illuminants, colored and achromatic. These abilities are known as color constancy and brightness constancy. These constancies result from range compression, with the range maximum set by the brightest or most colorful object in a portion of the scene. As the illumination varies, the light reaching the eye from the most reflective object varies, and the range is adjusted accordingly.
It is thought that while the sensors of the eye can faithfully capture image intensities over a range of more than three orders of magnitude, the range of the optic nerve, which carries the image from the retina in the eye to the brain, is about 1.5 orders of magnitude. This view is presented, for example, by Creutzfeldt et al. in the article entitled "Darkness induction, retinex, and cooperative mechanisms in vision," Exp. Brain Res., volume 67, p. 270, 1987.
The retina must cope with a dynamic range mismatch similar to the mismatch, discussed below, between conventional artificial sensors and display media, and so must provide compensating mechanisms. Photographers and videographers who are not aware of this fact often think that there is something wrong with the camera when the captured image shows strong coloration or deep shadows, due to the illumination, that were not observed when the scene was being captured.
In fact, however, the brain corrected for the illumination at the time of image capture using range compression techniques, but the camera recorded the true scene and the display media could not faithfully reproduce it. Thus it has been desirable to interpose a range compression process between image capture and presentation in order to display recorded images faithfully to the human viewer, that is, so that they appear as they appeared to the viewer at the time of capture. This process may be called color or brightness constancy, range matching, or enhancement.
Many algorithms for determining color constancy corrections have been developed which attempt to provide information for correcting the colors at each point in a color image in response to color average information from surrounding points. In general, such algorithms dictate the reduction of a particular color at a specific point if surrounding points indicate an overabundance of that color. That is, a spatially weighted average of color is removed from each specific point.
For example, in a color image photographed at sunset, an overabundance of red would be detected and the algorithm used would indicate that the red color at each point be reduced by the addition of a determinable intensity of red's complementary color, green, at that point. The resultant corrected image would appear more like an image photographed in midday sunlight.
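The removal of a spatially weighted color average can be sketched as follows. This is a minimal illustration of the general approach, not any particular patented algorithm; the uniform box weighting, the window size, and the demonstration values are assumptions made for the example:

```python
import numpy as np

def remove_surround_average(channel, window=15):
    """Subtract a spatially weighted local average from each point of a
    single color channel. A uniform (box) weighting stands in for
    whatever spatial weighting a given algorithm prescribes; many
    algorithms also work in the log domain rather than on linear
    intensities."""
    pad = window // 2
    padded = np.pad(channel, pad, mode='edge')
    h, w = channel.shape
    local_avg = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            local_avg[i, j] = padded[i:i + window, j:j + window].mean()
    # If the surround shows an overabundance of this color, the
    # subtraction reduces it at the point, which is equivalent to
    # adding a determinable amount of the complementary color.
    return channel - local_avg

# A uniform color cast (e.g., the red channel at sunset) is removed
# entirely, since the local average equals the cast everywhere.
red_channel = np.full((32, 32), 0.5)
corrected = remove_surround_average(red_channel)
```

Because the correction at each point depends on an average over surrounding points, a sharp color transition biases the local average near the transition, which is the source of the halo artifacts described below.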
Conventional techniques for implementing such image color constancy correction algorithms require substantial time for the necessary mathematical calculations and therefore must be performed during post processing if at all. Although post processing may be possible for still images, conventional techniques are very tedious and time consuming and therefore little used. When used, such conventional color constancy correction systems typically require point by point color corrections. Video imaging systems require almost constant corrections and such conventional algorithm implementation techniques are not convenient for common use, even during post processing.
Known color constancy correction techniques also tend to distort images by creating noticeable halos of incorrect coloration at color transition edges. That is, because such known techniques rely on spatial color averaging, sharp color changes are improperly corrected in the vicinity of the edges. For example, a red ball on a gray background will appear to have a noticeable green halo near the edge of the ball because the average color in the vicinity of the color transition does not accurately reflect the overabundance of illumination coloration.
The human central nervous system, on the other hand, compensates for colors present in the ambient illumination automatically so that skin tones, for example, appear the same to the human viewer even under vastly different lighting conditions. Further, the color constancy corrections made by the human central nervous system do not noticeably create halo distortion.
The growing use of video imaging processes increases the need for real time or near real time implementations of brightness and color constancy correction techniques and, in particular, implementations which do not noticeably distort the images.
Conventional systems have difficulty in properly reproducing images illuminated with widely varying levels of illumination. Image sensing processes, or detectors, commonly have a greater range of sensitivity than the processes used to present images. Paper based processes, such as photographic, ink, and electrostatic printing, and electronic processes that rely on cathode ray tubes (CRTs), liquid crystal displays (LCDs), electroluminescent displays (ELDs), plasma displays, and the like, typically cannot reproduce in presentation the dynamic range that is captured by photographic systems or by electronic imaging devices such as charge-coupled devices (CCDs), infrared (IR) devices, charge injection devices (CIDs), and the like.
In such conventional systems, the minimum level that can be sensed is usually set by the noise level inherent in the device. The maximum level that can be sensed in a CCD camera, for example, is typically more than 3 orders of magnitude greater than the minimum or noise level of the camera. That is, there is about a factor of 1000 range of image intensities that can be sensed in a CCD camera. However, conventional video monitors used to view images detected by CCD cameras can often only display intensities covering about 1.5 orders of magnitude, that is, there is only about a factor of 30 range of image intensities that can be displayed on such monitors.
There are two particularly troublesome problems that occur with dynamic range mismatch between the image sensor and the display device, that is, when displaying an image detected with a wide-range sensor on a limited range monitor or other medium. First, since the sensor signal must be scaled down for display, the incremental contrast of the final result is often poor. Detail in the original scene is displayed at low contrast. Second, important detail may be displayed near the top or bottom of the monitor range, lost in brightness or darkness. This problem is especially obvious in scenes in which one area is much brighter than the rest of the scene.
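The first problem, loss of incremental contrast under simple linear scaling, can be illustrated numerically using the order-of-magnitude figures above (the detail step value is an illustrative assumption):

```python
# A sensor spanning a factor of 1000 (3 orders of magnitude) scaled
# linearly onto a display spanning a factor of 30 (about 1.5 orders).
sensor_range = 1000.0
display_range = 30.0
scale = display_range / sensor_range   # every intensity is multiplied by 0.03

# A subtle detail step of 10 sensor units shrinks to 0.3 display
# units, below a single display level, so the detail is displayed
# at low contrast or lost entirely.
detail_in = 10.0
detail_out = detail_in * scale
```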
It is well known that the reflectances of natural objects tend to vary over approximately a factor of twenty. That is, the ratio of image intensities due to the reflectances of the objects in the image is such that the highest intensity object is about twenty times more luminous than the lowest intensity object. This physical phenomenon is called herein the Rule of Twenty.
Conventional display media, such as video monitors, are often capable of displaying image intensity ranges greater than required by the Rule of Twenty; typically, such monitors can display output ranges that span about a factor of thirty.
However, lighting differences across a scene often introduce a greater range of image intensities than is caused by the reflectance variation alone. That is, image intensity is equal to the product of illumination intensity and reflectance, and the range of common illumination intensities across a scene is not limited by the Rule of Twenty.
In an outdoor scene, for example, the amount of light reflected from an object that is partly lit by direct sunlight and partly in shadow may vary by a factor of 100 or more even though the portion of this range resulting from reflectance changes does not exceed the Rule of Twenty. Conventional image sensors can capture this range, but most conventional display media cannot faithfully reproduce it.
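Since image intensity is the product of illumination and reflectance, the two ranges multiply. A short numeric illustration (the sun versus shadow factor is an assumption for the example):

```python
reflectance_range = 20.0    # the Rule of Twenty
illumination_range = 50.0   # illustrative sunlit vs. shadowed ratio
# The ranges multiply because intensity = illumination * reflectance.
scene_range = reflectance_range * illumination_range   # factor of 1000

display_range = 30.0        # typical monitor, ~1.5 orders of magnitude
# Reflectance alone fits the display, but the full scene does not.
fits_reflectance_only = reflectance_range <= display_range
fits_full_scene = scene_range <= display_range
```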
Flames, for example, will usually overload a monitor range unless the input is scaled, for example by attenuating the sensor input signal electronically, reducing the iris aperture, or shortening the pixel exposure time. With the reduced sensitivity of the sensor input, though, it is hard to see more than a silhouette of what is burning. Important details such as numbers on a burning vehicle, for example, may be lost in darkness.
In this example, the flames and the vehicle numbers are separated in intensity by substantially more than 1.5 orders of magnitude. The flames and the vehicle numbers therefore cannot be simultaneously displayed on a monitor having a display range of only about 1.5 orders of magnitude between the brightest and darkest portions of the displayed image.
A well known photographic darkroom technique called dodging has long been used to cope with the dynamic range mismatch between a film negative and the photographic print paper. If a photographic negative of an image including a full-range of intensities is printed without special processing, some image portions may be printed as too bright or too dark, depending on the exposure time used in making the print. If the print is too dark in places and the exposure is shortened, other regions will be too light. Similarly, if the exposure time is lengthened to darken areas that are too light, the final printed image will include areas that are too dark.
An experienced photographer, however, may compensate by dodging. In this process, the print is made so that bright areas are printed properly. In areas which would otherwise be printed too dark, the exposure is limited by moving an opaque object such as a properly shaped card or other obstruction between the print paper and the exposing light. The movement of the opaque card blocking the exposure light reduces the total light, and therefore the exposure, in the darker areas of the image. Dodging may be thought of as an equivalent of subtracting a low-pass filtered version of the image from the dark regions. Detail is preserved and the intensity of the image area is kept within the range of the print paper. If the photographer is not careful and inadvertently interferes with exposure light applied to a light region around the dark region, a halo may be created around the dark region in the final print.
In a related photographic development technique known as masking, a positive transparency is made of a slightly blurred version of the image and inserted in the path of the exposure beam during development. The blurred positive forms a mask to diminish exposure of the print paper in dark regions. Subtraction of this mask, which is a scaled and low-pass filtered version of the image, reduces the range of the image to match that of the print paper. Masking also increases the incremental contrast in the final print because it is a form of high-pass filtering.
Unlike the dodging process, the mask which is subtracted does not have sharply defined borders between regions. Masking therefore creates halos that are often objectionable near large changes in image brightness.
In the electronic image processing analog to masking, a blurred, scaled version of the image is subtracted from the original to scale the range of the final result to fit that of the display medium. The blurred, scaled version of the image may be obtained optically or electronically. Near sharp borders or sharp changes of intensity, this method produces halos that can be objectionable and may exceed the dynamic range of the presentation medium. To reduce the halo effect, the amount of the low-pass version that is subtracted varies adaptively with the magnitude of the spatial derivative of the image. That is, the degree of masking is adjusted in accordance with the magnitude of the difference between the sharp and blurred versions of the image. This technique of varying the degree of masking is called adaptive filtering herein.
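A minimal sketch of this electronic masking, assuming a box blur for the low-pass filter and working in the log (density) domain as photographic masking effectively does; the window size, mask strength, and test scene are illustrative assumptions:

```python
import numpy as np

def box_blur(img, k=9):
    """Box blur standing in for the optical or electronic low-pass filter."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    h, w = img.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def mask_compress(log_img, strength=0.7):
    """Subtract a blurred, scaled version of the log-domain image
    (a density mask) to compress the range for a limited display."""
    return log_img - strength * box_blur(log_img)

# A scene with a 100:1 step between shadow and sunlight, in log10
# units: 0.0 (dark region) and 2.0 (bright region).
scene = np.zeros((16, 32))
scene[:, 16:] = 2.0
out = mask_compress(scene)

# Far from the edge, the 2.0-decade range compresses to 0.6 decades.
span_far = out[0, 31] - out[0, 0]
# Next to the edge, the blurred mask underestimates the bright side
# and overestimates the dark side, producing halo over/undershoot.
overshoot = out[0, 16] - out[0, 31]
undershoot = out[0, 0] - out[0, 15]
```

Converting back to linear intensity, the factor-of-100 scene is compressed to about a factor of four away from the edge, while the halo over/undershoot near the edge can still exceed the display range.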
In general, dodging has the advantage over adaptive filtering for processing images for presentation because dodging compresses the range of the image to fit that of the presentation medium while preserving the original detail contrast uniformly within regions. Adaptive filtering does not preserve the original detail as well as dodging; in regions of high contrast texture, for example, adaptive filtering changes the image very little. Also, the techniques used for overshoot correction in adaptive filtering do not enhance the detail near high contrast edges.
Techniques for adaptive filtering are well known and are described, for example, by William F. Schreiber in the article entitled "Image Processing for Quality Improvement", Proceedings of the IEEE, Vol. 66, pp. 1640-1651, 1978. In such techniques, low-pass filtered image gradients serve as an adaptive modifier of the low-pass filtered image before its subtraction from the original image, in order to diminish the haloing that would otherwise result from the high-pass filtering step.
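Following that description, the low-pass mask can be made to follow sharp edges by blending it toward the original image wherever a low-pass filtered gradient magnitude is large, which suppresses the halo. This is a minimal sketch under assumed parameters (box blur, blending rule, window size), not Schreiber's exact method:

```python
import numpy as np

def box_blur(img, k=9):
    """Box blur standing in for the low-pass filter."""
    pad = k // 2
    padded = np.pad(img, pad, mode='edge')
    h, w = img.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def adaptive_mask_compress(log_img, strength=0.7, k=9):
    """High-pass range compression with an adaptive mask: near strong
    (low-pass filtered) gradients, the mask tracks the image itself,
    so the subtraction stays consistent across the edge."""
    low = box_blur(log_img, k)
    gy, gx = np.gradient(log_img)
    edges = box_blur(np.hypot(gx, gy), k)   # low-pass filtered gradients
    w = edges / (edges.max() + 1e-12)       # 0 in flat areas, ~1 at edges
    mask = w * log_img + (1.0 - w) * low
    return log_img - strength * mask

# A scene with a 100:1 step between shadow and sunlight, in log10 units.
scene = np.zeros((16, 32))
scene[:, 16:] = 2.0

plain = scene - 0.7 * box_blur(scene)   # non-adaptive masking, for contrast
adaptive = adaptive_mask_compress(scene)

plain_overshoot = plain[0, 16] - plain[0, 31]          # pronounced halo
adaptive_overshoot = adaptive[0, 16] - adaptive[0, 31] # nearly eliminated
```

The blend rule shown is one simple way to let gradients modulate the mask; actual implementations differ in the filter shapes and the modulation function, but the principle of reducing the mask's blur across strong edges is the same.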
Conventional image processing techniques for color and brightness constancy, that is, for color and illumination intensity corrections, such as dodging and adaptive filtering, may not be suitable for real-time video image processing. Dodging is performed manually for photographic processing. Adaptive and high-pass filtering have been implemented with digital signal processors but have been limited in the size of the image area that can be processed with convenient amounts of computing power and memory storage.
A substantial improvement in image processing, beyond that available with conventional digital techniques and suitable for real time video image processing, is shown in U.S. Pat. No. 4,786,818 to Mead et al., which discloses an integrated image processor, known as a spatially-weighted analog or resistive grid, for processing black and white electronic images. The Mead resistive grid is a low-power analog chip that includes image sensors and processors on the same chip and does not require substantial storage memory. The grid performs high-pass filtering of the image, however, and so suffers from haloing effects.
What is needed, therefore, is a brightness and color constancy correction technique which permits images to be recorded and/or displayed in a manner similar to that perceived by the human viewer at the time of image capture, and which can conveniently be accomplished in real or near real time without haloing or other substantial, distracting distortions.