Images are typically denoised using noise models, or according to images classes. All those methods are based on certain assumptions about the noise model, or the image signal to remove noise. One of the most widely used assumptions is the sparsity of the signal in a transform domain.
An image is sparse in the transform domain when most magnitudes of transform domain coefficients are either zero, or negligible. In that case, the image can be well approximated as a linear combination of a small number of bases that correspond to pixel-wise consistent patterns. Denoised image can be obtained by keeping only transform coefficients larger than a first threshold, which are mainly due to the original signal, and discarding coefficients smaller than a second threshold, which are mainly due to noise.
The sparsity level of an image in the transform domain heavily depends on both the signal and the noise properties. The selection of a good sparsity inducing transform is an art, and is effectively a function of the underlying, signal to be denoised, and the noise. For example, multi-resolution transforms achieve good sparsity for spatially localized details, such as edges and singularities. Because most images are typically full of such details, transform domain methods have been successfully applied for image denoising.
Conventional transform representations using, e.g., a discrete cosine transform (DCT) or wavelets, are advantageous for their computational simplicity, and provide a sparse representation for signals that are smooth, or have localized singularities, respectively. Therefore, conventional orthogonal transforms can provide sparse representation only for a particular class of signals. For all other classes of signals, it is now known that representations learned for a specific class yields sparser representations. Over-completeness provides extra degree of freedom to represent the original signal, and further increase, the sparsity in transform domain.
Dictionary learning provides a way to learn sparse representations for a given class of signals. Non-local means (NLM) de-noising is based on non-local averaging, of all the pixels in an image. The amount of weighting for a pixel is based on a similarity of a small patch of pixels, and another patch of pixels centered on the pixel being dc-noised.
In terms of a peak signal-to-noise ratio, PSNR, block matching in 3D (BM3D) approaches optimal results for constant variance noise, but cannot be improved beyond 0.1 dB values, BM3D is a two-step process. The first step gives an early version of the denoised image by processing stacks of image blocks constructed by block matching. The second stage applies a statistical filter in a similar manner. For a reference block, pixel-wise similar blocks are searched and arranged in a 3D stack. Then, an orthogonal transform is applied to the stack, and the noise is reduced by thresholding the transform coefficients, followed by an inverse transform. Sparsity is enhanced due to similarity between the 2D blocks in the 3D stack. After an estimate of the denoised image is obtained, the second step finds the locations of the blocks similar to the processed block, and forms two groups, one from the noisy image and other from the estimate. Then, the orthogonal transform is applied again to both the groups and Wiener filtering is applied, on the noisy group using an energy spectrum of the estimate as the true energy spectrum.
Most methods for dictionary learning, and almost all methods for denoising including the non-local means and the BM3D, assume that the signal is corrupted by stationary noise. This is valid for most conventional imaging methods. However, for range, depth, radar, and synthetic aperture radar (SAR), this assumption is invalid. For example, when measuring depth directly with light-based range scanners, noise varies locally due to different reflection of scanner light pulses near transparent or reflective surfaces, or near boundaries. Similarly, the variance of speckle noise in radar imaging due to random fluctuations from an object that is smaller than a single pixel varies significantly from pixel to pixel.
U.S. patent application Ser. No. 13/330,795, “Image Filtering by Sparse Reconstruction on Affinity Net,” filed by Assignee, describes a method for reducing multiplicative and additive noise in image pixels by clustering similar patches of the pixels into clusters. The clusters form nodes in an affinity net of nodes and vertices. From each cluster, a dictionary is learned by a sparse combination of corresponding atoms in the dictionaries. The patches are aggregated collaboratively using the dictionaries to construct a denoised image.