Deep networks are used in many computer vision and image processing applications. A typical deep network with fully-connected layers or convolutional layers works well for a wide range of applications. However, that general architecture does not use problem domain knowledge, which could be very helpful in some applications.
For example, in the case of image denoising, conventional multilayer perceptrons (feedforward neural networks) are not very good at handling multiple levels of input noise. When a single multilayer perceptron is trained to handle multiple input noise levels, by providing the noise variance as an additional input to the network, it produced inferior results compared to the state-of-the-art Block-matching and 3D filtering (BM3D), see Dabov et al., “Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering,” IEEE Transactions on Image Processing, 16(8):2080-2095, 2007. In contrast to this, an Expected Patch Log Likelihood (EPLL) framework, which is a model-based approach, works well across a wide range of noise levels, see Zoran et al., “From Learning Models of Natural Image Patches to Whole Image Restoration,” In ICCV, 2011.
Gaussian Markov Random Fields (GMRFs) are often used in image inference tasks, such as denoising, inpainting, super-resolution, depth estimation. GMRFs model continuous quantities and can be efficiently solved using linear algebra. However, the performance of a GMRF model depends heavily on the choice of a prior probability distribution (prior). For example, in the case of image denoising, a homogeneous prior, i.e., an identical prior for each pixel, results in blurred edges and over-smoothing of the images. Hence, to successfully use the GMRF model, the prior should be selected according to the image being processed. A GMRF model that uses a data-dependent prior is referred to as Gaussian conditional random field (GCRF).
Using GCRF model for an image inference task involves two main steps:
1) a data-dependent prior generation step in which an appropriate image prior is selected based on the input image; and
2) an inference step in which a Maximum a Posteriori (MAP) inference is performed with the selected image prior.
Gaussian Conditional Random Fields:
The GCRFmodel described by Tappen et al., see “Learning Gaussian Conditional Random Fields for Low-Level Vision,” CVPR, 2007, models the parameters of a conditional distribution of the output image as a function of the input image. A precision matrix associated with each image patch, e.g., 3×3 pixels, is modeled as a linear combination of hand-selected derivative filter-based matrices. The combination weights are selected as a parametric function of the absolute responses of the input image to a set of predefined multi-scale oriented edge and bar filters, and the parameters are learned using discriminative training.
The GCRF model has been extended to Regression Tree Fields (RTFs), see Jancsary et al., “Loss-specific Training of Non-parametric Image Restoration Models: A New State of the Art,” ECCV, 2012, where regression trees are used for parameter selection. A full-image model is decomposed into several overlapping patch models, and the regression trees are constructed for selecting parameters of the Gaussian models defined over the patches. The regression trees use responses of input image to various hand-chosen filters for selecting an appropriate leaf node for each image patch. More recently, a cascade of RTFs has been used for non-blind image deblurring, see Schmidt et al., “Discriminative Non-blind Deblurring,” CVPR, 2013.
Denoising
Image denoising is a fundamental problem in image processing. There are many methods that can be used for denoising, including shrinkage, sparse coding with non-local image statistics, natural image priors, and graphical models.
Denoising with Neural Networks:
Various deep network based approaches are known for image denoising, such as stacked sparse denoising autoencoders (SSDA), and multilayer perceptrons, see Burger et al., “Image Denoising: Can Plain Neural Networks Compete with BM3D?” CVPR, 2012. However, none of those deep networks explicitly model the variance of the noise, and hence are not good at handling multiple noise levels. In all the above networks, a different network is used for each noise level, which complicates the design and process.