Photon and sensor noise limit the performance of all imaging systems. Minimizing the effects of this noise is a universal and fundamental image processing task. The present disclosure addresses, inter alia, the problem of denoising still digital camera images, using a new approach that combines measurements of natural image statistics with measurements of the noise characteristics of digital cameras.
In general, a noisy image can be represented as an unknown “true” image that has been corrupted by noise. Let z(x) represent the value of a pixel at location x=(x,y) in the true image. Without loss of generality, the observed value is given by z0(x)=z(x)+n(x, z), wherein n(x, z) is the noise, which may be spatially correlated and/or dependent on the true image values z.
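The observation model above can be illustrated with a short simulation. This is only a sketch under assumptions that are not part of the text: the noise is taken to be zero-mean Gaussian and spatially independent, with a variance that grows linearly with the true pixel value, and the names `add_signal_dependent_noise`, `gain`, and `read_sigma` are hypothetical. A single row of pixel values stands in for an image.

```python
import math
import random


def add_signal_dependent_noise(true_row, gain=0.1, read_sigma=2.0, seed=0):
    """Return observed values z + n(x, z) for one row of true pixel values.

    Assumed (illustrative) noise model: n is zero-mean Gaussian with
    variance gain * z + read_sigma**2, so brighter pixels are noisier.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = []
    for z in true_row:
        sigma = math.sqrt(gain * max(z, 0.0) + read_sigma ** 2)
        observed.append(z + rng.gauss(0.0, sigma))
    return observed


true_row = [10.0, 50.0, 200.0, 200.0, 50.0, 10.0]
noisy_row = add_signal_dependent_noise(true_row)
```

Setting `gain` to zero reduces this to signal-independent noise, which is the simpler case that many of the methods discussed below assume.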
The goal of denoising is to estimate z(x) given the observed context of pixel values at and around the pixel location c(x).
Conceptually, the optimal estimate is given by the standard formula from Bayesian statistical decision theory:
$$\hat{z}_{opt}(x) = \underset{\hat{z}(x)}{\arg\min} \sum_{z(x)} \gamma[z(x), \hat{z}(x)]\, p[z(x) \mid c(x)] \qquad (1)$$
where $\gamma[z(x), \hat{z}(x)]$ is the cost function, and p[z(x)|c(x)] is the posterior probability of the true value given the observed context.
A vast number of different denoising methods have been proposed over the past several decades (for recent summaries see, for example, Buades A., Coll B. & Morel J. M. (2010) Image Denoising Methods: A new non-local method, SIAM Review. 52, 113-147; P. Chatterjee & P. Milanfar (2010) Is Denoising Dead?, IEEE Trans. on Image Processing. 19, 895-911). They can all be viewed as providing some form of sub-optimal approximation to the Bayes optimal estimate given by equation (1).
Most often, the explicit (or implicit) cost function is the squared error between the estimated and true pixel values, $\gamma[z(x), \hat{z}(x)] = [z(x) - \hat{z}(x)]^2$. Minimizing the expected value of this cost yields the estimate with the minimum mean squared error (MMSE) or, equivalently, the estimate with the maximum peak signal-to-noise ratio (PSNR). Other cost functions, such as those that are based on perceptual properties of the human visual system (see, e.g., Wang Z., Bovik A. C., Sheikh H. R., & Simoncelli E. P. (2004), Image quality assessment: From error visibility to structural similarity, IEEE Trans. on Image Processing), are worthy of consideration; however, as is common in the denoising literature, the present disclosure focuses on the squared-error cost function. For this cost function, equation (1) becomes:
$$\hat{z}_{opt}(x) = E[z(x) \mid c(x)] \qquad (2)$$
In other words, the Bayes optimal estimate is simply the expected value of the true pixel value given the observed context (see, e.g., Bishop, C. M. (2006), Pattern recognition and machine learning; New York: Springer).
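The step from the squared-error cost to the conditional expectation can be sketched as follows. For brevity the derivation writes z, \hat{z}, and c for z(x), \hat{z}(x), and c(x), and it treats \hat{z} as a continuous variable, which is an assumption made only for this illustration.

```latex
\begin{aligned}
\frac{\partial}{\partial \hat{z}} \sum_{z} (z - \hat{z})^{2}\, p[z \mid c]
  &= -2 \sum_{z} (z - \hat{z})\, p[z \mid c] = 0 \\
\Longrightarrow \quad
\hat{z} \sum_{z} p[z \mid c] &= \sum_{z} z\, p[z \mid c] \\
\Longrightarrow \quad
\hat{z}_{opt} &= \sum_{z} z\, p[z \mid c] = E[z \mid c]
\end{aligned}
```

The last line uses the fact that the posterior sums to one. The second derivative of the expected cost with respect to \hat{z} is 2, which is positive, so the stationary point is a minimum.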
In order to develop an optimal denoising method for a specific application, one can characterize both the signal (the statistical structure of true images) and the noise (the statistical structure of the noise). The various denoising methods can be distinguished by the assumptions they make about the structure of the signal and noise. Also important is computational efficiency (speed and complexity). For a given application, the best method is the one that jointly maximizes the accuracy of its approximation to equation (2) and its computational efficiency.
The earliest principled denoising method is known as the Wiener filter (Wiener, N. (1949). Extrapolation, Interpolation, and Smoothing of Stationary Time Series. New York: Wiley), which is an exact implementation of equation (2), under the assumption that both the signal and the noise are described by stationary (not necessarily white) Gaussian processes. However, images are generally non-stationary and hence this method does not produce good results for most images (it blurs edges and texture). Subsequently, there have been many attempts to weaken the assumption of global stationarity. Adaptive Wiener filtering methods assume Gaussian noise and signal that is locally stationary; the methods estimate the Gaussian parameters at each pixel location and then apply the Wiener filter with those parameters (e.g., D. T. Kuan, A. A. Sawchuk, T. C. Strand, and P. Chavel (1985) Adaptive noise smoothing filter for images with signal dependent noise, IEEE Trans. PAMI, vol. 7, pp. 165-177). A closely related approach combines image segmentation and Bayesian MAP estimation (Liu C., Szeliski R., Kang, S. B., Zitnick C. L. & Freeman W. T. (2008) Automatic estimation and removal of noise from a single image. IEEE Trans. Pattern Anal. & Mach. Intell. 30, 299-314). The critical component of these methods is estimating the local Gaussian parameters; the less noisy the estimated parameters the more accurate the denoising. Simple non-iterative methods for estimating the parameters that use only pixels in the immediate neighborhood of the pixel being denoised can be computationally efficient.
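The adaptive Wiener idea described above can be sketched in a few lines. This is a minimal illustration rather than the method of any cited reference: it works on a single row of pixels instead of a 2-D image, it assumes the noise variance is known and constant, and the function name `adaptive_wiener_1d` and its parameters are hypothetical. The local mean and variance are estimated from a small window, and the pixel is pulled toward the local mean by the Wiener gain.

```python
def adaptive_wiener_1d(row, noise_var, radius=2):
    """Locally adaptive Wiener estimate for one row of pixel values.

    Assumes zero-mean noise with known variance noise_var and a signal
    that is approximately stationary within each window.
    """
    n = len(row)
    out = []
    for i in range(n):
        # Window of up to 2 * radius + 1 pixels, truncated at the row ends.
        window = row[max(0, i - radius):min(n, i + radius + 1)]
        mean = sum(window) / len(window)
        var = sum((v - mean) ** 2 for v in window) / len(window)
        # Signal variance estimate: observed variance minus noise variance.
        signal_var = max(var - noise_var, 0.0)
        if noise_var > 0:
            gain = signal_var / (signal_var + noise_var)
        else:
            gain = 1.0  # no noise assumed: keep the observed value
        out.append(mean + gain * (row[i] - mean))
    return out
```

In flat regions the estimated signal variance is near zero, so the output approaches the local mean; near a strong edge the gain approaches one and the observed value is largely kept. The quality of the result depends on how well the window statistics are estimated, which is the point made in the text.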
Other recent methods do not make explicit formal assumptions about the structure of the noise or signal, but instead exploit heuristic intuitions to average out the noise and leave the signal. One simple and effective method of this type is bilateral filtering (Tomasi C. & Manduchi R. (1998) Bilateral filtering for gray and color images. Proceedings IEEE Conference in Computer Vision, Bombay, India), which takes the weighted average of pixels in the local neighborhood, where the weights depend jointly on the spatial and gray-level (color) distance of the neighboring pixel from the pixel being denoised. The intuition is that spatially nearby pixels are positively correlated in gray level and can be averaged, but spatially nearby pixels that differ substantially in gray level usually contain strong signals (true image features) and should not be averaged. This method can be computationally efficient.
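The weighting scheme described above can be sketched as follows. The sketch makes several assumptions that the text does not state: both the spatial and the gray-level weights are Gaussian, the input is a single row of gray-level values rather than a 2-D image, and the names `bilateral_filter_1d`, `sigma_space`, and `sigma_range` are hypothetical.

```python
import math


def bilateral_filter_1d(row, radius=2, sigma_space=2.0, sigma_range=10.0):
    """Bilateral filter for one row of gray-level values.

    Each neighbor's weight is the product of a spatial Gaussian (distance
    in pixels) and a range Gaussian (difference in gray level).
    """
    n = len(row)
    out = []
    for i in range(n):
        weight_sum = 0.0
        value_sum = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_space = math.exp(-((j - i) ** 2) / (2 * sigma_space ** 2))
            w_range = math.exp(-((row[j] - row[i]) ** 2) / (2 * sigma_range ** 2))
            weight = w_space * w_range
            weight_sum += weight
            value_sum += weight * row[j]
        # weight_sum is at least 1 because the center pixel has weight 1.
        out.append(value_sum / weight_sum)
    return out
```

With these assumed weights, neighbors on the far side of a large gray-level step receive almost no weight, so the step is preserved, while small fluctuations within a flat region are averaged.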
Related methods are those based on non-local averaging (A. Efros and T. Leung (1999) Texture synthesis by non-parametric sampling, Proceedings of the IEEE International Conference on Computer Vision, 2, Corfu, Greece, 1033-1038). For example, the NL-Means algorithm (Buades A., Coll B. & Morel J. M. (2010) Image denoising methods: A new non-local method. SIAM Review. 52, 113-147) searches for pixels whose local neighborhood in the image is similar to the neighborhood of the pixel being denoised. It then computes a weighted average of these pixels to obtain the estimate: the more similar a pixel's local neighborhood, the greater the weight given to that pixel. The intuition is that natural images are statistically regular, so if two image patches are similar in structure, their center pixels are likely to be similar and can be averaged to estimate the true image value. To the extent that this assumption is valid for the kind of noise in an imaging system and for the kinds of images being captured, such averaging could provide a good approximation to the right side of equation (2). Indeed, methods based on non-local averaging provide good results and are currently popular. However, these methods are less computationally efficient because of the need to make the neighborhood similarity measurements.
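A simplified version of non-local averaging can be sketched as follows. This is not the published NL-Means algorithm: it works on a single row of pixels, it uses a plain (unweighted) mean squared patch difference, and the names `nl_means_1d`, `patch_radius`, `search_radius`, and the smoothing parameter `h` are assumptions made for illustration.

```python
import math


def nl_means_1d(row, patch_radius=1, search_radius=5, h=10.0):
    """Simplified non-local means for one row of pixel values.

    Each pixel is replaced by a weighted average of pixels in a search
    window; the weight decays with the mean squared difference between
    the patches centered on the two pixels.
    """
    n = len(row)

    def patch(i):
        # Neighborhood around index i, with indices clamped at the ends.
        return [row[min(max(k, 0), n - 1)]
                for k in range(i - patch_radius, i + patch_radius + 1)]

    out = []
    for i in range(n):
        patch_i = patch(i)
        weight_sum = 0.0
        value_sum = 0.0
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            patch_j = patch(j)
            dist2 = sum((a - b) ** 2 for a, b in zip(patch_i, patch_j)) / len(patch_i)
            weight = math.exp(-dist2 / (h * h))
            weight_sum += weight
            value_sum += weight * row[j]
        out.append(value_sum / weight_sum)
    return out
```

The nested loop over the search window, with a patch comparison at every step, shows where the additional computational cost comes from: each output pixel requires on the order of (search window size) x (patch size) operations.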
Another class of methods involves hard or soft thresholding following a linear transform, such as a wavelet or discrete cosine transform (R. R. Coifman and D. Donoho (1995) Translation-invariant de-noising, in Wavelets and Statistics, Lecture Notes in Statist., Springer-Verlag, New York, pp. 125-150; J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli (2003) Image Denoising Using Scale Mixtures of Gaussians in the Wavelet Domain, IEEE Trans. Image Processing, 12, pp. 1338-1351). The intuition is that for appropriately chosen kernel shapes, the regular structure of natural images results in a very sparse representation (a few large kernel coefficients, with most near zero), whereas the much more random structure of noise results in a less sparse representation (many coefficients with modest values). Thus, thresholding out the smaller coefficients selectively removes the noise. These methods can be computationally efficient, but can be prone to producing ringing artifacts.
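Transform-domain thresholding can be illustrated with the simplest possible case. The sketch below uses a single-level orthonormal Haar transform on one row of pixels, which is an assumption chosen for brevity; the cited methods use multi-level, translation-invariant, or 2-D transforms. The function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(row):
    """Single-level orthonormal Haar transform of an even-length row."""
    approx = [(row[i] + row[i + 1]) / SQRT2 for i in range(0, len(row), 2)]
    detail = [(row[i] - row[i + 1]) / SQRT2 for i in range(0, len(row), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    row = []
    for a, d in zip(approx, detail):
        row.append((a + d) / SQRT2)
        row.append((a - d) / SQRT2)
    return row


def hard_threshold(coeffs, t):
    """Zero every coefficient whose magnitude does not exceed t."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def soft_threshold(coeffs, t):
    """Shrink every coefficient toward zero by t."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def haar_denoise(row, threshold, mode="hard"):
    """Threshold the detail coefficients and reconstruct the row."""
    if len(row) % 2 != 0:
        raise ValueError("row length must be even")
    approx, detail = haar_forward(row)
    shrink = hard_threshold if mode == "hard" else soft_threshold
    return haar_inverse(approx, shrink(detail, threshold))
```

Small detail coefficients, which mostly reflect noise, are removed, while a large coefficient produced by a strong edge survives hard thresholding unchanged. Soft thresholding also shrinks the surviving coefficients, which slightly reduces edge contrast.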
Current denoising methods include hybrid methods (K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian (2007) Image denoising by sparse 3D transform-domain collaborative filtering, IEEE Trans. Image Process., 16, pp. 2080-2095; L. Zhang, W. Dong, D. Zhang & G. Shi (2010) Two-stage image denoising by principal component analysis with local pixel grouping, Pattern Recognition, 43, 1531-1549). For example, the BM3D method combines non-local averaging, cooperative linear transform thresholding, and Wiener filtering (Dabov et al., 2007).
In summary, most existing methods either assume Gaussian image and noise models or rely on principled heuristics based on qualitative properties of natural images. Further, the parameters of most denoising methods are estimated from the image being denoised. Thus, there is a need in the art for improved denoising methods.