Camera shake, in which an unsteady camera causes blurry photographs, is a chronic problem for photographers. The explosion of consumer digital photography has made camera shake very prominent, particularly with the popularity of small, high-resolution cameras whose light weight can make them difficult to hold sufficiently steady. Many photographs capture ephemeral moments that cannot be recaptured under controlled conditions or repeated with different camera settings—if camera shake occurs for any reason, then that moment is “lost” instead of captured in the resulting image.
Shake can be mitigated by using faster exposures, but that can lead to other problems such as sensor noise or a smaller than desired depth of field. A tripod, or other specialized hardware, can eliminate camera shake, but these are bulky and most consumer photographs are taken with a conventional, handheld camera. Users may avoid the use of flash due to the unnatural tonescales that result. Many of the otherwise favorite photographs of amateur photographers are spoiled by camera shake. A method to remove that motion blur from a captured photograph would be an important asset for digital photography.
Camera shake can be modeled as a blur kernel, describing the camera motion during exposure, convolved with the image intensities. Thus, the task of deblurring an image is image deconvolution. Removing an unknown camera shake (where the blur kernel is not known) is a form of blind image deconvolution, a problem with a long history in the image and signal processing literature. For a survey of the literature in this area, see Kundur, D. and D. Hatzinakos, “Blind Image Deconvolution,” IEEE Signal Processing Magazine 13(3):43-64, May 1996. Existing blind deconvolution methods typically assume that the blur kernel has a simple parametric form, such as a Gaussian or low-frequency Fourier components. However, the blur kernels induced during camera shake typically do not have simple forms, and often contain very sharp edges. Similar low-frequency assumptions are typically made for the input image, e.g., applying a quadratic regularization. Such assumptions can prevent high frequencies (such as edges) from appearing in the reconstruction. Caron et al. in “Noniterative blind data restoration by use of an extracted filter function,” Applied Optics 41(32):6884-6889, November 2002, assume a power-law distribution on the image frequencies. Power-laws are a simple form of natural image statistics that do not preserve local structure. Some methods (Jalobeanu, A. et al., “Estimation of blur and noise parameters in remote sensing,” in Proc. of Int. Conf. on Acoustics, Speech and Signal Processing, 2002 and Neelamani, R. et al., “Forward: Fourier-wavelet regularized deconvolution for ill-conditioned systems,” IEEE Trans. on Signal Processing 52:418-433, February 2004) combine power-laws with wavelet domain constraints but do not work for more complex blur kernels. To address this shortcoming, some of these methods attempt to manipulate the image in the Fourier domain to obey power-law constraints before using the wavelet domain to impose image structure constraints.
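The convolution model stated at the start of the preceding paragraph can be illustrated with a minimal sketch. The function name `convolve2d`, the one-row test scene, and the three-tap kernel below are hypothetical choices made only for illustration; they are not taken from any of the cited methods, and a zero-valued border outside the scene is assumed.

```python
def convolve2d(image, kernel):
    """Full 2-D convolution of two nested lists of numbers.

    Every scene pixel is spread over its neighbours according to the
    kernel weights, which is how a blur kernel acts on image intensities.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    for y in range(ih):
        for x in range(iw):
            for j in range(kh):
                for i in range(kw):
                    out[y + j][x + i] += image[y][x] * kernel[j][i]
    return out


# Hypothetical sharp scene: a single row containing one step edge.
sharp = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0]]

# Hypothetical blur kernel: a short horizontal camera motion.
# The weights sum to 1, so overall brightness is preserved.
kernel = [[0.25, 0.5, 0.25]]

blurred = convolve2d(sharp, kernel)
print(blurred[0])
# [0.0, 0.0, 0.0, 0.25, 0.75, 1.0, 0.75, 0.25]
# The step edge is smeared into a ramp; the fall-off at the right end
# comes from the assumed zero border, not from the scene itself.
```

In this sketch both `sharp` and `kernel` are known, so only the forward model is shown. Blind deconvolution would have to recover both of them from `blurred` alone.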
Deconvolution methods have been developed for astronomical images (Gull, S., “Bayesian inductive inference and maximum entropy,” in Maximum Entropy and Bayesian Methods, G. J. Erikson and C. R. Smith (eds.), Kluwer Academic Publishers, 53-74, 1998; Richardson, W., “Bayesian-based iterative method of image restoration,” Journal of the Optical Society of America, A 62(1):55-59, 1972; Tsumuraya, F. et al., “Iterative blind deconvolution method using Lucy's algorithm,” Astronomy and Astrophysics 282(2):699-708, February 1994; Zarowin, C., “Robust, noniterative, and computationally efficient modification of van Cittert deconvolution optical figuring,” Journal of the Optical Society of America A 11(10):2571-83, October 1994), which have statistics quite different from the natural scenes desired in digital photography. Performing blind deconvolution in the astronomy domain is usually straightforward, as the blurry image of an isolated star reveals the point-spread-function.
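The remark about isolated stars can be checked with a small sketch: when the scene is a single bright point, the blurred observation is simply a shifted copy of the point-spread function. The helper `blur`, the 4x5 sky, and the 2x2 point-spread function below are hypothetical and chosen only for illustration.

```python
def blur(image, kernel):
    """Full 2-D convolution on nested lists (stand-in for the blur model)."""
    rows = len(image) + len(kernel) - 1
    cols = len(image[0]) + len(kernel[0]) - 1
    out = [[0.0] * cols for _ in range(rows)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            for j, kernel_row in enumerate(kernel):
                for i, weight in enumerate(kernel_row):
                    out[y + j][x + i] += value * weight
    return out


# Hypothetical point-spread function tracing a short bent path.
psf = [[0.5, 0.0],
       [0.25, 0.25]]

# Dark sky with one isolated star at row 1, column 2.
sky = [[0.0] * 5 for _ in range(4)]
sky[1][2] = 1.0

observed = blur(sky, psf)

# Crop the observation at the star's position: the patch is the PSF.
patch = [row[2:4] for row in observed[1:3]]
print(patch == psf)
```

A natural photograph rarely contains such an isolated point source, which is one reason the astronomical case does not carry over directly to consumer photographs.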
Another approach is to assume that there are multiple images available of the same scene (Bascle, B. et al., “Motion Deblurring and Superresolution from an Image Sequence,” in European Conference on Computer Vision (2):573-582 (1996); and Rav-Acha, A. and S. Peleg, “Two motion-blurred images are better than one,” Pattern Recognition Letters, pp. 311-317, 2005). Hardware approaches include: optically stabilized lenses (Canon, Inc., “What is optical image stabilizer?” at www.canon.com/bctv/faq/optis.html 2006), specially designed CMOS sensors (Liu, X. and A. Gamal, “Simultaneous image formation and motion blur restoration via multiple capture,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing, Vol. 3, 1841-1844, 2001) and hybrid imaging systems (Ben-Ezra, M. and S. K. Nayar, “Motion-Based Motion Deblurring,” IEEE Trans. on Pattern Analysis and Machine Intelligence 26(6):689-698, 2004). However, it is desirable to have a solution that works with existing cameras and imagery in as many situations as possible, and that therefore does not assume any such hardware or extra imagery is available.
Recent work in computer vision has shown the usefulness of heavy-tailed natural image priors in a variety of applications, including denoising (Roth, S. and M. J. Black, “Fields of Experts: A Framework for Learning Image Priors,” in CVPR (Computer Vision and Pattern Recognition), Vol. 2, pp. 860-867, 2005); superresolution (Tappen, M. F. et al., “Exploiting the sparse derivative prior for super-resolution and image demosaicing,” 3rd Intl. Workshop on Statistical and Computational Theories of Vision (associated with Intl. Conf. on Computer Vision), 2003); intrinsic images (Weiss, Y., “Deriving intrinsic images from image sequences,” in ICCV (International Conference on Computer Vision), pp. 68-75, 2001); video matting (Apostoloff, N. and A. Fitzgibbon, “Bayesian image matting using learnt image priors,” in Proceedings of Conference on Computer Vision and Pattern Recognition, pp. 407-414, 2005); inpainting (Levin, A. et al., “Learning How to Inpaint from Global Image Statistics,” in ICCV, pp. 305-312, 2003); and separating reflections (Levin, A. and Y. Weiss, “User Assisted Separation of Reflections from a Single Image Using a Sparsity Prior,” in ICCV, Vol. 1, pp. 602-613, 2004). Each of these methods is effectively “non-blind” in that the image formation process (e.g., the blur kernel in superresolution) is assumed to be known in advance.
Accordingly, the camera shake problem is underconstrained: there are simply more unknowns (the original image and the blur kernel) than measurements (the observed image). Hence, all practical solutions must make strong prior assumptions about the blur kernel, about the image to be recovered, or both. Traditional signal processing formulations of the problem usually make only very general assumptions in the form of frequency-domain power laws; the resulting algorithms can typically handle only very small blurs and not the complicated blur kernels often associated with camera shake. Furthermore, algorithms exploiting image priors specified in the frequency domain may not preserve important spatial-domain structures such as edges.
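The counting argument behind "underconstrained" can be written out explicitly. The symbols here are notation assumed for illustration only: B is the observed image, L the original sharp image, K the blur kernel with m × m entries, and P the number of pixels, under the simplifying assumption that B and L have the same size and boundary effects are ignored.

```latex
\[
  B = K \otimes L,
  \qquad
  \underbrace{P}_{\text{measurements in } B}
  \;<\;
  \underbrace{P}_{\text{unknowns in } L}
  \;+\;
  \underbrace{m^{2}}_{\text{unknowns in } K}
\]
```

Since the right-hand side exceeds the left for any kernel with at least one entry, many pairs (K, L) can reproduce the same B, which is why prior assumptions on the kernel, the image, or both are needed to single one out.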