Video acquired under low light conditions is often noisy and of poor visual clarity. A much higher bitrate is required to store such noisy video than to store noise-free video. Hence, noise filtering is a fundamental low level preprocessing step for any video acquired under low light. Noise filtering also results in a lower false alarm rate for video analytics and motion triggered recording. It has been observed that even under normal illumination conditions, wherein noise may not be visually apparent, application of a noise reduction filter brings significant savings in bitrate.
Noise is a component of any digital video; however, it is more apparent in low light conditions over homogeneous regions. Even under constant illumination, the number of photons incident on a CCD or CMOS sensor in any given exposure interval is stochastic in nature. This inherent randomness in the incident photons gives rise to photon noise. Other sources of noise in surveillance video are thermal noise or dark current, defective sensing elements, quantization error, compression error and transmission error. At high temperature, additional free electrons are released from the CCD and contaminate the count of true photoelectrons, giving rise to dark current. Use of high gain (high sensitivity mode) under low light conditions drives noisy pixels to saturation, leading to what is known as “white-tailed noise”. Each of these types of noise follows a different distribution. Several approaches are taken to improve image quality under low light conditions. Notable techniques are discussed below.
One technique is Digital Slow Shutter (DSS), wherein the sensor is allowed to integrate for a longer duration and a frame buffer outputs the most recent complete frame while the next one is being integrated. A problem with this approach is that motion areas in the picture (which can often be of greater interest than the static background) appear blurred as a result of the long integration time.
Another technique is Digital Noise Reduction, wherein the sensor continues to operate at a specified frame rate and frame averaging is done in dedicated hardware to provide a continuous output image, along with change or motion detection so that the current image is used wherever there is motion. Any motion regions therefore tend to be noisy, but the blur is reduced to that which would normally occur without any such motion-sensitive integration.
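The motion-sensitive frame averaging described above can be illustrated in software. The following is a minimal sketch, not the hardware implementation referred to in the text: it assumes grayscale frames held as NumPy arrays, a recursive (running) average in place of a fixed-length average, and a simple per-pixel difference test for motion. The function name and the parameters alpha and motion_threshold are hypothetical and chosen only for illustration.

```python
import numpy as np

def motion_adaptive_average(frames, alpha=0.1, motion_threshold=20.0):
    """Yield a denoised frame for each input frame.

    frames: iterable of 2D arrays (grayscale), all of the same shape.
    alpha: assumed blending weight given to the newest frame in static areas.
    motion_threshold: assumed absolute-difference level above which a pixel
        is treated as moving.
    """
    accumulator = None
    for frame in frames:
        frame = np.asarray(frame, dtype=np.float64)
        if accumulator is None:
            # The first frame initializes the running average.
            accumulator = frame.copy()
        else:
            # Pixels that differ strongly from the average are treated as motion.
            moving = np.abs(frame - accumulator) > motion_threshold
            blended = (1.0 - alpha) * accumulator + alpha * frame
            # Moving pixels take the current frame (noisy but not blurred);
            # static pixels take the temporal average (less noisy).
            accumulator = np.where(moving, frame, blended)
        yield accumulator.copy()
```

In this sketch the threshold trades residual noise against motion blur: a low value passes more noisy pixels straight through, while a high value lets slowly changing regions smear into the average.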
Digital noise filtering is an active area of research covered by a large number of patents and publications. A wide range of techniques are used for filtering of image noise. The most prominent among them are Markov Random Field (MRF), wavelet shrinkage, anisotropic diffusion, bilateral filtering and total variation minimization.
Most video noise reduction algorithms need to assume a particular noise model. While thermal noise follows an additive white noise model, the photon noise is intensity dependent. A first step in a generic video noise reduction approach is noise estimation. This is required to decide whether to apply a noise removal algorithm and to select appropriate parameters for noise reduction filtering. Excessive filtering results in loss of image structure, whereas minimal filtering leaves behind traces of residual noise. Noise can be estimated using a single frame (intra-frame analysis) or through a sequence of frames (inter-frame analysis) or a combination of both. Spatial and temporal redundancies are the keys to digital video noise removal.
In the spatial domain, noise can be estimated using local variances in small image blocks over homogeneous regions. The median absolute deviation (MAD) can be used in place of the computationally intensive variance measure. Higher order moments, which are even more computationally intensive, are sometimes advocated for noise estimation. A more computationally efficient approach uses local gradients in place of variance. Alternatively, wavelet-based analysis can be used, wherein noise is estimated from sub-band energy. In yet another approach, the difference between a smoothed image and the original image is computed, and the thresholded difference is used as a measure of image noise.
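As an illustration of block-based spatial estimation, the sketch below computes a spread measure (standard deviation or scaled MAD) for each non-overlapping block and averages the smallest values, on the assumption that the blocks with the least spread are the homogeneous ones. The function name, the block size, and the fraction of blocks retained are assumptions made for this example and are not taken from any of the cited references. The factor 1.4826 scales the MAD so that it matches the standard deviation for Gaussian noise.

```python
import numpy as np

def estimate_noise_sigma(image, block=8, fraction=0.25, use_mad=False):
    """Estimate the noise standard deviation from a single grayscale frame.

    block: assumed side length of the square analysis blocks.
    fraction: assumed share of lowest-spread blocks treated as homogeneous.
    use_mad: use the scaled median absolute deviation instead of the
        standard deviation as the per-block spread measure.
    """
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    h, w = h - h % block, w - w % block  # crop to a whole number of blocks
    # Rearrange into one row per block, one column per pixel in the block.
    blocks = (image[:h, :w]
              .reshape(h // block, block, w // block, block)
              .swapaxes(1, 2)
              .reshape(-1, block * block))
    if use_mad:
        med = np.median(blocks, axis=1, keepdims=True)
        spread = 1.4826 * np.median(np.abs(blocks - med), axis=1)
    else:
        spread = blocks.std(axis=1, ddof=1)
    spread.sort()
    keep = max(1, int(fraction * spread.size))
    # Average the spread of the most homogeneous blocks.
    return float(spread[:keep].mean())
```

Because only the lowest-spread blocks are kept, this estimate tends to fall somewhat below the true noise level; a practical system would likely calibrate for that bias.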
Inter-frame noise analysis techniques use temporal variances over motionless areas as an estimate for noise. For black and white images, noise is modeled for the intensity channel alone. On color images, noise modeling is carried out separately for each of the color channels. Two approaches for estimating noise from a single video frame while ignoring image details are described in U.S. Pat. No. 7,046,307 issued to Hui and in U.S. Publication No. 2006/0103765 (Zhou et al.). Both these references consider an additive Gaussian noise model. A technique for noise estimation from a sequence of frames is adopted in U.S. Pat. No. 6,546,149 issued to Ruggiero et al. A recursive approach for noise estimation in temporal domain is followed in U.S. Publication No. 2005/0107982 (Sun et al.).
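A temporal estimate of the kind described at the start of this paragraph can be sketched as follows. This is an illustrative example only and does not reproduce the method of any cited reference. It assumes a short stack of grayscale frames and either a caller-supplied mask of motionless pixels (the hypothetical static_mask parameter) or, when no mask is given, that most pixels are static, so that the median of the per-pixel temporal variances is dominated by noise rather than motion.

```python
import numpy as np

def temporal_noise_sigma(frames, static_mask=None):
    """Estimate the noise standard deviation from a sequence of frames.

    frames: array-like of shape (T, H, W) with T >= 2.
    static_mask: optional boolean (H, W) array marking motionless pixels
        (an assumed input; how it is obtained is outside this sketch).
    """
    stack = np.asarray(frames, dtype=np.float64)
    if stack.ndim != 3 or stack.shape[0] < 2:
        raise ValueError("expected at least two frames of shape (H, W)")
    # Unbiased temporal variance at every pixel location.
    per_pixel_var = stack.var(axis=0, ddof=1)
    if static_mask is not None:
        per_pixel_var = per_pixel_var[np.asarray(static_mask, dtype=bool)]
    # The median is robust to the minority of pixels affected by motion.
    return float(np.sqrt(np.median(per_pixel_var)))
```

For color video, the same function could be applied to each color channel separately, in line with the per-channel modeling described above.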
A video can be viewed as a 3D data set comprising two spatial dimensions and one temporal dimension. Like noise estimation, noise removal can be carried out in the spatial (pixel) dimensions, in the temporal dimension, or in both. The last option can be a separable implementation, i.e. separate noise filtering in the spatial and temporal dimensions, or a complete 3D filtering. In the case of separable 3D filtering, the remaining choice is whether to filter first in the temporal dimension or in the spatial dimensions. A noise removal filter for video data should make use of both temporal and spatial dimensions. Filtering can be done in the original (pixel) spatial domain or in a transformed domain. Filtering of individual images can be carried out using a 2D local filter. Reducing noise in a video or image sequence requires 3D filters to get the best results. A 3D filter makes use of local correlation in both the spatial and temporal dimensions along with the high frequency spectrum of image noise. One of the challenges in local filtering for noise removal is removing image noise while preserving image structures. Commonly used local filters include the local average (mean) filter, median filter, k-NN filter, sigma filter, Lee filter, Gamma filter and Frost filter. The underlying concept is to replace a pixel by the most representative 3D neighbor that preserves image structures and foreground objects. Filtering of motion regions is carried out using local statistics, noise estimates and a threshold measure. Such filters generally do a reasonable job at the cost of blurring.
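To make the idea of a local 3D filter concrete, the sketch below implements a simple sigma filter over a spatio-temporal neighborhood: each pixel is replaced by the average of those neighbors, in time and space, whose values lie within k times the noise standard deviation of the pixel itself. This is one illustrative member of the filter family listed above, written under assumptions: the function name, the parameters k and radius, and the edge-replication padding are choices made for this example.

```python
from itertools import product

import numpy as np

def sigma_filter_3d(video, sigma, k=2.0, radius=1):
    """Apply a 3D sigma filter to a grayscale video of shape (T, H, W).

    sigma: noise standard deviation (for example from a prior estimate).
    k: assumed acceptance band, in multiples of sigma.
    radius: assumed half-width of the cubic neighborhood.
    """
    video = np.asarray(video, dtype=np.float64)
    t, h, w = video.shape
    # Replicate edge values so every pixel has a full neighborhood.
    padded = np.pad(video, radius, mode="edge")
    total = np.zeros_like(video)
    count = np.zeros_like(video)
    offsets = range(2 * radius + 1)
    for dt, dy, dx in product(offsets, repeat=3):
        neighbour = padded[dt:dt + t, dy:dy + h, dx:dx + w]
        # Accept only neighbors close in value to the center pixel, which
        # keeps samples from across an edge or a moving object out of the mean.
        close = np.abs(neighbour - video) <= k * sigma
        total += np.where(close, neighbour, 0.0)
        count += close
    # The center pixel always accepts itself, so count is never zero.
    return total / count
```

The value test is what preserves structure: across a strong edge, or where an object has moved between frames, the neighbors differ by more than the band and are left out of the average.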
Image noise reduction by edge-preserving image smoothing (low-pass filtering) is described in U.S. Publication No. 2005/0135699 (Anderson) and U.S. Publication No. 2005/0025382 (Oizumi et al.). The methods disclosed in Anderson and Oizumi et al. attempt to identify a local edge and preserve it by applying a low pass filter in the edge direction. However, these methods are computationally intensive.
A motion sensitive temporal filtering for video noise reduction is suggested in U.S. Publication No. 2006/0044475 (Lin et al.). Separable implementations of noise reduction in spatial and temporal domains are adopted in U.S. Publication No. 2006/0139494 (Zhou et al.), U.S. Publication No. 2006/0158562 (Rhee), and U.S. Pat. No. 7,199,838 issued to Lin et al. These references make use of motion detection and spatial as well as temporal filtering. The final output is either a selection or combination of multiple spatial and temporal filtered outputs.
Adaptive filters account for motion without explicit motion detection. Recursive median, recursive mean and other regressive models belong to this category. These regressive models are preferred over filtering techniques that make use of explicit motion detection due to their computational efficiency and robustness to motion blur in high noise conditions. Motion compensated filters compute local motion regions using block matching, optical flow or other explicit motion detection. Such filters use 3D blocks that are fully compensated for motion. They are found to perform best in moderate noise conditions.
Noise removal can also be performed in a number of transformed domains, including Fourier, wavelet, curvelet, PCA and ICA. Among these methods, wavelet domain analysis has been the most successful. The wavelet-based method applies a wavelet transform to carry out local scale analysis and then associates larger wavelet coefficients with signal and smaller ones with noise. The most commonly used wavelet noise reduction technique uses a wave shrink (wavelet coefficient shrinking) approach, in which the wavelet coefficients are modified and an inverse transform is then applied, as described in U.S. Publication No. 2004/0008904 (Lin et al.) and U.S. Pat. No. 6,801,672 issued to Thomas.
Wavelet shrinkage can be hard (removing all coefficients below a threshold), soft (reducing all coefficients by a proportionality factor), non-linear (modification of coefficients using a nonlinear function), spatially adaptive or any combination of these. A 3D wavelet analysis supporting spatio-temporal filtering is also advocated. However, it is common to adopt a recursive motion sensitive filtering on wavelet coefficients.
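The hard and soft variants can be illustrated on an array of coefficients. The sketch below is hedged in two respects. First, it operates on coefficients that are assumed to have been produced already by some wavelet transform; the transform and its inverse are not shown. Second, the soft variant follows the common textbook form, in which coefficients below the threshold are set to zero and the magnitude of each remaining coefficient is reduced by the threshold, which is one way of realizing the soft reduction mentioned above. The function names are hypothetical.

```python
import numpy as np

def hard_shrink(coeffs, threshold):
    """Zero every coefficient whose magnitude is below the threshold."""
    coeffs = np.asarray(coeffs, dtype=np.float64)
    return np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)

def soft_shrink(coeffs, threshold):
    """Zero small coefficients and pull the rest toward zero by the threshold."""
    coeffs = np.asarray(coeffs, dtype=np.float64)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)
```

Hard shrinkage leaves the surviving coefficients untouched, whereas soft shrinkage reduces them as well, which generally gives a smoother result at the cost of some attenuation of genuine detail.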