Illumination conditions cause problems for many computer vision algorithms. In particular, variations in color or intensity of the illumination of a scene captured by a camera can cause problems for algorithms that segment, track, or recognize objects in a captured input image of the scene. For example, consider an input image of a box 102 disposed on a surface of grid lines 104 shown in FIG. 1A. The input image can be a digital image captured by a camera. Because the box 102 is illuminated from one direction, the shading on the box 102 opposite the illuminating direction is nearly indistinguishable from the shadow cast by the box 102 on the surface 104. Together, the dark surface region of the box 102 and the shadow cast by the box 102 onto the surface 104 make identifying the box 102 a challenge for image recognition algorithms, and make segmenting the image of the box 102 from the image of the surface 104 a challenge for many segmentation algorithms.
In order to alleviate the difficulties of image segmentation, recognition, or object tracking, an input image can ideally be decomposed into two intrinsic images called an illumination image and a shadow invariant image. FIGS. 1B and 1C show a decomposition of the input image shown in FIG. 1A into a shadow invariant image and an illumination image, respectively. The shadow invariant image shows the box 102 and the surface 104 with the shadow observed in FIG. 1A removed. In other words, a shadow invariant image is independent of the illumination conditions. On the other hand, the illumination image shows only the shadows cast by the box 102 created by illuminating the box from one direction. Decomposing an image into the intrinsic images can be useful for supporting a range of visual inferences. For example, a typical segmentation algorithm would likely correctly segment the box 102 as a single segment using the shadow invariant image shown in FIG. 1B.
In recent years, a number of different techniques have been developed for decomposing an input image into intrinsic images and, in particular, for obtaining the shadow invariant image of an input image. Certain techniques derive a shadow invariant image of a scene from a sequence of images of the scene under a range of illumination conditions. However, obtaining multiple images of a scene under various illumination conditions can be time intensive and is not always practical. Other techniques use a single input image but require human interaction or user assisted methods to perform reflection and shadow separation. These methods can produce good results with careful user assistance, but are time and labor intensive. Still other techniques that use a single image without user assistance are learning based approaches that separate reflectance edges and illumination edges in a derivative image. Although these methods can separate reflectance and shading for a given illumination direction, they have difficulty classifying edges under arbitrary lighting conditions. In still other techniques, by assuming a Lambertian surface and using a three-band camera sensor (i.e., red, green, and blue bands), a one-dimensional grayscale image that is invariant to shadow and shading can be obtained. However, these techniques transform a red, green, and blue (“RGB”) based image into a one-dimensional grayscale representation, which reduces the distinction between surfaces. These techniques have been extended to derive a two-dimensional invariant image using a four-band camera, which eliminates the Lambertian assumption, and it has been proven theoretically that three-dimensional invariant images can be recovered with six-band cameras. However, four- and six-band cameras are rarely used.
Thus, recovering color shadow invariant images from a single input image of a three-band camera remains a challenging problem for computer vision systems.
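For illustration only, the one-dimensional grayscale invariant described above can be sketched with a log-chromaticity projection, which is one common formulation in the literature and is not necessarily the exact formulation used by the techniques referred to above. The sketch assumes that a change in illumination shifts every pixel along a single fixed direction in a two-dimensional log-chromaticity space, so that projecting onto the perpendicular direction cancels the shift. The function name `log_chromaticity_invariant` and the projection angle `theta` are hypothetical; in practice the angle would be a camera-dependent calibration value.

```python
import numpy as np


def log_chromaticity_invariant(rgb, theta):
    """Return a 1-D grayscale image from an (H, W, 3) RGB image.

    theta (radians) is an ASSUMED calibration parameter: the direction in
    log-chromaticity space onto which each pixel is projected.
    """
    # Clip to a small positive value so the logarithms are defined.
    rgb = np.maximum(np.asarray(rgb, dtype=np.float64), 1e-6)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    # Band ratios discard overall intensity (shading).
    chi1 = np.log(r / g)
    chi2 = np.log(b / g)

    # Project the 2-D log-chromaticity onto the axis (cos theta, sin theta).
    return chi1 * np.cos(theta) + chi2 * np.sin(theta)


# Usage: a lit pixel and the same surface in shadow, where the shadow is
# modeled (as an assumption) by a per-channel attenuation of the light.
lit = np.array([[[0.8, 0.6, 0.4]]])
attenuation = np.array([0.5, 0.4, 0.3])
shadowed = lit * attenuation

# Direction of the illumination-induced shift in log-chromaticity space.
shift = (np.log(attenuation[0] / attenuation[1]),
         np.log(attenuation[2] / attenuation[1]))
# Project perpendicular to that shift so the shadow cancels.
theta = np.arctan2(shift[1], shift[0]) + np.pi / 2

gray_lit = log_chromaticity_invariant(lit, theta)
gray_shadowed = log_chromaticity_invariant(shadowed, theta)
```

Under these assumptions `gray_lit` and `gray_shadowed` agree up to floating-point error. The output has a single channel, so surfaces whose log-chromaticities differ only along the discarded direction map to the same gray value, which is the loss of distinction between surfaces noted above.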