Many imaging devices that acquire or process digital images introduce artifacts in the processing pipeline. These artifacts include additive noise, image blurring, compression artifacts, missing pixels, and geometric distortions, among others. Image restoration is an attempt to reduce such artifacts using post-processing operations. One important field within image restoration deals with image denoising.
A noisy image can be expressed mathematically as follows:
$$y = x + n, \qquad (1.1)$$
where y is the observed image, x is the unknown original image and n is contaminating noise (all in vector notation). The goal in image restoration is to reconstruct the original image x given the noisy measurement y. This problem is a typical instance of an inverse problem, and the conventional solutions typically consider prior knowledge regarding the distribution of x.
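As a minimal illustration of the observation model in equation (1.1), the sketch below builds a noisy vector y from a clean vector x. The zero-mean Gaussian noise, the parameter sigma, and the function name add_noise are assumptions made for this example only; the model itself does not fix a noise distribution.

```python
import random

def add_noise(x, sigma, seed=0):
    """Return (y, n) with y = x + n.

    Assumption: n is i.i.d. zero-mean Gaussian noise with standard
    deviation sigma. Equation (1.1) itself does not require this.
    """
    rng = random.Random(seed)
    n = [rng.gauss(0.0, sigma) for _ in x]
    y = [xi + ni for xi, ni in zip(x, n)]
    return y, n

# A tiny "image" flattened to a vector, matching the vector notation.
x = [10.0, 10.0, 200.0, 200.0]
y, n = add_noise(x, sigma=5.0)
```

In a denoising setting only y is observed; x and n are unknown and must be separated using prior knowledge about x.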
A common approach for modeling the statistical prior of natural images is to estimate their statistical distribution in a transform domain. This is usually implemented using some type of wavelet transform. The main motivation for this approach stems from the fact that the wavelet transform of natural images tends to de-correlate pixel values. Hence, it is possible to make a reasonable inference on the joint distribution of the wavelet coefficients from their marginal distributions. When dealing with image denoising, this leads to a family of classical techniques known as the wavelet shrinkage methods. These techniques amount to modifying the coefficients in the transform domain using a set of scalar mapping functions, $\{M_i\}$, called shrinkage functions (SFs).
The shrinkage approach first performs a wavelet transform:
$$y_w = W y$$
followed by a correction step in which the wavelet coefficients are rectified according to a set of shrinkage functions (SFs):
$$\hat{x}_w = \vec{M}_w\{y_w\}$$
where $\vec{M}_w = [M_{w1}, M_{w2}, \ldots]$ is a vector of scalar mapping functions. The denoised image is then obtained by applying the inverse transform to the modified coefficients:
$$\hat{x} = W^{-1} \hat{x}_w$$
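The three steps above (forward transform, scalar shrinkage, inverse transform) can be sketched on a one-dimensional signal. This is an illustrative sketch only: the one-level orthonormal Haar transform stands in for W, soft thresholding stands in for the shrinkage function, and the threshold t and all function names are assumptions made for this example, not choices prescribed by the text.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # normalization making the Haar transform unitary

def haar_forward(y):
    """One-level orthonormal Haar transform of an even-length signal."""
    approx = [_S * (y[i] + y[i + 1]) for i in range(0, len(y), 2)]
    detail = [_S * (y[i] - y[i + 1]) for i in range(0, len(y), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Inverse of haar_forward (the transpose, since the transform is unitary)."""
    x = []
    for a, d in zip(approx, detail):
        x.append(_S * (a + d))
        x.append(_S * (a - d))
    return x

def soft_threshold(c, t):
    """Example scalar SF: reduce |c| by t and zero out coefficients below t."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def shrinkage_denoise(y, t):
    """Transform, apply the scalar SF to each detail coefficient, invert."""
    approx, detail = haar_forward(y)                      # y_w = W y
    detail_hat = [soft_threshold(d, t) for d in detail]   # x_w = M_w{y_w}
    return haar_inverse(approx, detail_hat)               # x = W^-1 x_w

noisy = [1.0, 1.2, 5.0, 4.8]
denoised = shrinkage_denoise(noisy, t=1.0)
```

Here the approximation coefficients pass through unchanged and only the detail band is shrunk, which is a common but not mandatory design choice. Each coefficient is processed independently of its neighbors, which is what makes the scalar approach simple and efficient.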
Due to their simplicity and good results, shrinkage approaches have received a great deal of attention over the last decade. Hundreds of shrinkage methods have been proposed, differing mostly in the type of transform used and in the form in which the SFs are applied. The justification for applying a marginal (scalar) SF to each coefficient independently can be shown to emerge from the independence assumption of the wavelet coefficients. This assumption was postulated in the early studies in which SFs were applied to unitary transforms.
Various efforts have been made to improve the denoising results of shrinkage methods. Such efforts generally concentrate on two main approaches. The first approach attempts to improve the results by abandoning the unitary representation and working in over-complete transform domains. Such transforms include the un-decimated wavelets, steerable wavelets, ridgelets, contourlets, and curvelets. Although the independence assumption cannot be justified in the over-complete domain, most of the conventional methods naively borrow the traditional SFs from the unitary case.
The second approach toward improvement relaxes the independence assumption of the wavelet coefficients and concentrates on modeling the statistical dependencies between neighboring coefficients. This scheme can be seen as moving from scalar SFs to multivariate SFs, where transform coefficients are rectified according to a group of measured coefficients. Inter-coefficient dependencies are exploited using any of a range of techniques, such as joint sparsity assumptions, HMM and Bayesian models, context modeling, tree models representing parent-child dependencies, co-occurrence matrices, adaptive thresholding, and more. These types of techniques sometimes achieve good denoising performance. However, they generally lack the efficiency and simplicity of the classical shrinkage approaches.
Common to all the conventional techniques for generating SFs, regardless of the approach used, is that the SFs are derived in a descriptive manner. Namely, a statistical model is first constructed describing the statistical prior of the transform coefficients. Based on this prior, a set of SFs is derived (scalar or multivariate, parametric or non-parametric), designed to rectify the contaminated coefficients. Clearly, imprecise modeling of the statistical prior leads directly to a deterioration in the resulting performance. Because inter-coefficient dependencies are complicated to model, in particular in the over-complete case, the statistical models can be expected to be far from precise. In fact, due to the high dimensionality of the joint probability, ad-hoc assumptions have commonly been made in order to make the problem tractable. Such assumptions include, e.g., ignoring the inter-coefficient dependencies, modeling only bivariate or parent-child dependencies, and modeling the joint dependencies of a small group of neighboring coefficients while assuming simplified parametric models.