Background subtraction is a popular technology for finding moving objects in images of an environment. Unfortunately, there are numerous factors that can adversely impact the efficacy of this class of techniques. Such disturbances include changes in camera responses due to automatic gain and color-balance corrections, image jitter due to vibration or wind, perceptually-masked artifacts due to video compression or cabling inadequacies, and varying object size due to lens distortion or imaging angle.
Some of these problems have simple solutions, but they are not optimal. While video can be transmitted and recorded in an uncompressed state, the required bandwidth and disk-storage space increase costs significantly. Similarly, lens distortions can be remedied by purchasing better (albeit more expensive) optics. Although it is possible to correct imaging geometry, doing so is difficult in practice because it involves moving cameras to optimal viewing locations. Such locations may be inconvenient (e.g., requiring significantly longer cable runs) or not feasible (e.g., above the ceiling level).
The solutions to other problems are not as straightforward. When the camera shakes due to wind or other vibration, for example, the current image acquired by the camera will not exactly line up with a previously captured reference image. This leads to detection of image changes (particularly near edges or in textured regions) that are not due to independent objects. Stabilizing the images produced by such surveillance cameras eliminates these artificial detections.
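The effect of jitter on background subtraction can be illustrated with a minimal sketch. The one-dimensional image row, the one-pixel shift, the threshold of 30 gray levels, and the helper names `shift_right` and `changed_pixels` are all assumptions chosen for illustration and do not come from any particular system.

```python
def shift_right(row, pixels=1):
    """Simulate camera jitter by shifting a row right, replicating the left edge value."""
    return [row[0]] * pixels + row[:len(row) - pixels]


def changed_pixels(current, reference, threshold=30):
    """Return the indices where the current row differs from the reference row."""
    return [
        i for i, (c, r) in enumerate(zip(current, reference))
        if abs(c - r) > threshold  # threshold is a hypothetical tuning value
    ]


# A static scene containing one sharp edge between a dark and a bright region.
reference = [20, 20, 20, 20, 200, 200, 200, 200]

# Nothing in the scene moves; only the camera shifts by one pixel.
jittered = shift_right(reference, 1)

# The only flagged pixel lies on the edge, although no object is present.
false_hits = changed_pixels(jittered, reference)
```

In this sketch the flat regions produce no detections, while the pixel at the edge is flagged, which matches the observation that such artifacts concentrate near edges and in textured regions.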
Stabilization can be accomplished by mechanically moving the camera in response to inertial measurements, or by altering portions of the optical path (e.g., sliding prisms) in response to similar error signals. However, these solutions require changing the cameras that are already installed. Also, these solutions are typically bulkier than an ordinary fixed camera and hence may be difficult to install in some locations. Stabilization may also be performed electronically (as in some camcorders) by shifting the pixel read positions on a digital image sensor. However, these pixel shifts are typically integer pixel shifts that are not accurate enough to remove all the artifacts generated by background subtraction. Another option is to use image warping based on optical flow analysis. However, this analysis is mathematically complicated, thus necessitating either a lower video frame rate or a more expensive computation engine.
Many cameras have built-in circuitry or algorithms for automatic gain control (AGC) and automatic white balance (AWB). These mechanisms typically generate video images that are more pleasing to the human eye. Unfortunately, these corrections can impair machine analysis of the images because there are frame-to-frame variations that are not due to any true variation in the imaged environment. Background subtraction is particularly affected by this phenomenon, which can cause large portions of the image to be falsely declared as foreground. Some cameras allow AGC and AWB to be disabled; however, this may not be true for all (possibly legacy) cameras in a video surveillance system. Also, it is sometimes desired to analyze previously recorded material where the source camera and its parameters cannot be controlled retroactively. While it is possible to correct exposure and color balance using techniques such as histogram stretching or contrast stretching, these whole-image methods can be confused if the content of the scene changes.
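The sensitivity of whole-image corrections to scene content can be seen in a small sketch of contrast stretching. The min-max formulation, the integer arithmetic, the function name `contrast_stretch`, and the sample gray levels are assumptions made for illustration only.

```python
def contrast_stretch(pixels, lo=0, hi=255):
    """Linearly map the darkest pixel to `lo` and the brightest pixel to `hi`."""
    p_min, p_max = min(pixels), max(pixels)
    if p_max == p_min:
        return [lo for _ in pixels]  # flat image: nothing to stretch
    span = p_max - p_min
    # Integer arithmetic keeps the example free of floating-point rounding.
    return [lo + (p - p_min) * (hi - lo) // span for p in pixels]


# Three background pixels of an unchanged scene.
background = [50, 100, 150]
stretched_empty = contrast_stretch(background)

# The same background, plus one bright pixel from an object entering the scene.
with_object = background + [250]
stretched_busy = contrast_stretch(with_object)

# Corrected values of the unchanged background pixels before and after.
before = stretched_empty
after = stretched_busy[:3]
```

Here the three background pixels map to 0, 127, and 255 when the scene is empty, but to 0, 63, and 127 once the bright object appears, so a subsequent comparison against a reference image would flag background pixels that did not actually change.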
Furthermore, when using the legacy analog video transmission format RS-170, the color of a pixel is encoded as a phase-shifted chrominance signal riding on top of the standard amplitude-modulated intensity signal. Unfortunately, when separating these two signals to reconstruct the image representation, sharp changes in the intensity signal can be interpreted as color shifts. This can happen due to inadequate band limiting of the intensity signal at the source, poor “comb” filtering at the receiver, or nonlinear dispersion in the transmission medium (typically coax cable). This aliasing results in strobing color rainbow patterns around sharp edges. This can be disadvantageous for computer vision systems that need to know the true colors of regions, or for object detection and tracking systems based on background subtraction, which may erroneously interpret these color fluctuations as moving objects.
The impact of these color artifacts can be diminished by converting the image to monochrome (i.e., a black and white image) so that there are no color shifts, only smaller intensity variations. However, this processing removes potentially valuable information from the image. For instance, in a surveillance system it is useful to be able to discern the colors of different vehicles, something not possible in a gray-scale video. Another approach is to apply aggressive spatial smoothing to the image so that the “proper” adjacent colors dominate in the problem areas. However, this approach is sub-optimal in that the boundaries of objects (and sometimes even their identities) can be obscured by such blurring. Still another method would be to attempt to reconstruct the original two-part analog signal and then employ a more sophisticated chrominance-luminance separation filter. Unfortunately, video has often been subjected to a lossy compression method, such as MPEG (especially if it has been digitally recorded), in which case the exact details of the original waveform cannot be recovered with sufficient fidelity to permit this re-processing.
A further problem is that video images often contain “noise” that is annoying to humans and can be even more detrimental to automated analysis systems. This noise comes primarily from three sources: imager noise (e.g., pixel variations), channel noise (e.g., interference in cabling), and compression noise (e.g., MPEG “mosquitoes”). Effective removal or suppression of these types of noise leads to more pleasing visuals and more accurate computer vision systems. One standard method for noise removal is spatial blurring, which replaces a pixel by a weighted sum of its neighbors. Unfortunately, this tends to wash out sharp edges and obscure region textures. Median-based filtering attempts to preserve sharp edges, but still corrupts texture (which is interpreted as noise) and leads to artificially “flat” looking images. Another method, temporal smoothing, uses a weighted sum of pixels from multiple frames over time. This works well for largely stationary images, but moving objects often appear ghostly and leave trails behind.
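The trade-offs among these three noise-removal approaches can be sketched on small one-dimensional signals. The three-sample window, the clamped borders, the use of an exponentially weighted running average as the temporal filter, and the function names are assumptions chosen for illustration; other weightings are equally possible.

```python
from statistics import median


def _window(signal, i, radius):
    """Return the neighborhood of sample i, clamping indices at the borders."""
    n = len(signal)
    return [signal[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1)]


def box_blur(signal, radius=1):
    """Spatial blurring: replace each sample by the equally weighted mean of its neighbors."""
    return [sum(_window(signal, i, radius)) / (2 * radius + 1) for i in range(len(signal))]


def median_filter(signal, radius=1):
    """Median filtering: replace each sample by the median of its neighborhood."""
    return [median(_window(signal, i, radius)) for i in range(len(signal))]


def temporal_smooth(frames, alpha=0.5):
    """Temporal smoothing: exponentially weighted running average over frames."""
    avg = list(frames[0])
    for frame in frames[1:]:
        avg = [alpha * f + (1 - alpha) * a for f, a in zip(frame, avg)]
    return avg


# A dark region with one noise spike (90), followed by a sharp edge up to 100.
row = [0, 0, 90, 0, 0, 100, 100, 100]
blurred = box_blur(row)          # spike is spread out and the edge is softened
medianed = median_filter(row)    # spike is removed and the edge stays sharp

# A bright object (100) moving one pixel per frame across a dark background.
frames = [[100, 0, 0], [0, 100, 0], [0, 0, 100]]
ghosted = temporal_smooth(frames)  # earlier positions retain a fading trail
```

In this sketch the blur smears both the spike and the edge across neighboring samples, the median removes the spike while keeping the edge (it would equally remove genuine fine texture), and the temporal average leaves nonzero values at the positions the object has already vacated.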
Yet another difficulty is that background subtraction operates by comparing the current image with a reference image and highlighting any pixel changes. Unfortunately, while the desired result is often the delineation of a number of physical objects, shadow regions are typically also marked because the scene looks different there as well. Eliminating or suppressing shadow artifacts is desirable because it allows better tracking and classification of a detected object (i.e., its form varies less over time and does not depend on lighting conditions).
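A minimal sketch of the comparison step shows why shadows are marked along with objects. The per-pixel absolute difference, the threshold of 30 gray levels, the function name `background_subtract`, and the sample images are assumptions for illustration rather than a description of any specific implementation.

```python
def background_subtract(current, reference, threshold=30):
    """Return a binary foreground mask: 1 where a pixel changed, otherwise 0.

    `current` and `reference` are equal-sized 2-D lists of gray levels (0-255).
    `threshold` is a hypothetical tuning value.
    """
    return [
        [1 if abs(c - r) > threshold else 0 for c, r in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(current, reference)
    ]


# Reference image of an empty, uniformly lit scene.
reference = [[100, 100, 100, 100],
             [100, 100, 100, 100]]

# Column 1 holds a bright object (200); column 2 holds its darker shadow (60).
current = [[100, 200, 60, 100],
           [100, 200, 60, 100]]

mask = background_subtract(current, reference)
```

Both the object column and the shadow column exceed the threshold, so the resulting mask cannot distinguish between them without further analysis.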
One way to eliminate shadows is to first perform basic background subtraction and then to more closely examine the pixels flagged as foreground. For example, the hue, saturation, and intensity can be computed separately from the foreground pixel and the corresponding background pixel. If the hue and saturation measures are a close match, the intensities are then examined to see if they are within a plausible range of variations. If so, the pixel is declared a shadow artifact and removed from the computed foreground mask. Unfortunately, this method requires the computation of hue, which is typically expensive because it involves trigonometric operators. Moreover, hue is unstable in regions of low saturation or intensity (e.g., shadows). Finally, the derived hue is very sensitive to the noise in each color channel (the more noise, the less reliable the estimate).
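The per-pixel shadow test described above might be sketched as follows. This sketch substitutes the HSV color model from Python's standard `colorsys` module for hue, saturation, and intensity, and the tolerances `hue_tol`, `sat_tol`, `min_ratio`, and `max_ratio`, as well as the function name `is_shadow`, are hypothetical values chosen only for illustration.

```python
import colorsys


def is_shadow(fg_rgb, bg_rgb, hue_tol=0.05, sat_tol=0.15,
              min_ratio=0.4, max_ratio=0.95):
    """Decide whether a flagged pixel looks like a darker copy of the background.

    `fg_rgb` and `bg_rgb` are (R, G, B) tuples with components in 0-255.
    All tolerance values are illustrative assumptions.
    """
    fh, fs, fv = colorsys.rgb_to_hsv(*(c / 255.0 for c in fg_rgb))
    bh, bs, bv = colorsys.rgb_to_hsv(*(c / 255.0 for c in bg_rgb))
    if bv == 0:
        return False  # a black background cannot become darker

    # Hue is an angle, so compare it around the circle (0.0 and 1.0 coincide).
    hue_diff = abs(fh - bh)
    hue_diff = min(hue_diff, 1.0 - hue_diff)
    if hue_diff > hue_tol or abs(fs - bs) > sat_tol:
        return False  # the color itself changed: treat as a real object

    # Same color: accept only a plausible amount of darkening.
    return min_ratio <= fv / bv <= max_ratio


background_pixel = (200, 150, 100)
shadowed = is_shadow((100, 75, 50), background_pixel)    # same color at half brightness
blue_object = is_shadow((30, 60, 200), background_pixel) # different hue

# Hue instability at low saturation: a one-level change in one channel.
gray_hue = colorsys.rgb_to_hsv(50 / 255, 50 / 255, 50 / 255)[0]
near_gray_hue = colorsys.rgb_to_hsv(50 / 255, 50 / 255, 51 / 255)[0]
```

With these assumed tolerances, the half-brightness pixel is classified as shadow and the blue pixel is not. The last two lines illustrate the instability noted above: for a nearly gray pixel, changing a single channel by one level moves the computed hue from 0 to about 0.67.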
A need therefore exists for improved techniques for visual background subtraction. A further need exists for methods and apparatus for visual background subtraction that address each of the above-identified problems using one or more software preprocessing modules.