1. Technical Field
A “Blur Remover” constructs one or more deblurred images, including panoramic or mosaic images, from a sequence of motion-blurred images such as a video sequence of a scene obtained using conventional digital video capture devices. In particular, the Blur Remover provides various techniques for using joint global motion estimation and multi-frame deblurring, with optional automatic video duty cycle estimation, to construct deblurred images for use in a variety of applications.
2. Background
Image deblurring is a widely studied problem in computer vision, with a broad range of real-world applications. Removing blur is, in general, an ill-posed problem due to the loss of information caused by blurring. The most common types of blur are motion blur and defocus blur. Defocus blur is usually isotropic and its point spread function (PSF) is often modeled approximately as a Gaussian. Motion blur, on the other hand, is usually more complex due to the dynamics of the scenes and camera motion.
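To make the distinction between the two blur types concrete, the following is a minimal sketch, not taken from any of the techniques discussed here, of how the two kinds of PSF are commonly built. The function names, grid sizes, and the straight-line motion model are illustrative assumptions; real motion blur is generally more complex than a straight line.

```python
import numpy as np

def gaussian_psf(size: int, sigma: float) -> np.ndarray:
    """Isotropic 2-D Gaussian PSF on a size x size grid, normalized to sum to 1.

    This is the usual approximation for defocus blur.
    """
    half = (size - 1) / 2.0
    coords = np.arange(size) - half
    xx, yy = np.meshgrid(coords, coords)
    psf = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return psf / psf.sum()

def linear_motion_psf(size: int, dx: float, dy: float, samples: int = 200) -> np.ndarray:
    """PSF for straight-line, constant-velocity motion along the vector (dx, dy).

    Assumption for illustration only: the camera moves along one line at
    constant speed during the exposure, so the kernel is a rasterized segment
    centered on the grid.
    """
    psf = np.zeros((size, size))
    center = (size - 1) / 2.0
    for t in np.linspace(-0.5, 0.5, samples):
        col = int(round(center + t * dx))
        row = int(round(center + t * dy))
        if 0 <= row < size and 0 <= col < size:
            psf[row, col] += 1.0
    return psf / psf.sum()

defocus = gaussian_psf(9, sigma=1.5)          # same profile in every direction
motion = linear_motion_psf(9, dx=6.0, dy=0.0)  # energy confined to one row
```

The Gaussian kernel is unchanged by transposition or rotation by 180 degrees, whereas the motion kernel concentrates all of its energy along the direction of motion, which is why the two are typically modeled separately.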
For example, one type of conventional image deblurring assumes that blur is entirely due to motion, i.e., there is no defocus blur caused by an out-of-focus lens, and that a “blur kernel” can be parameterized by a 1-D motion vector. One such technique derives a local motion blur constraint and extends it to the α channel. Since the method is based on local constraints, it can be used to estimate non-parametric motion blur as well as rigid and spatially invariant motion blur. However, there are several limitations. First, the method requires that the values of the α channel can be determined accurately, which is a challenging task even for sharp images. Second, the derivation of the α-motion blur constraint assumes that the image is constant within any neighborhood of the same size as the blur kernel. This assumption is reasonable as long as the image is smooth and the spatial extent of the blur kernel is small. However, if the image is highly textured or the blur kernel is large, the constraint becomes invalid. Finally, this particular technique does not make use of multiple images.
A related technique addresses the problem of simultaneous tracking and deblurring in motion-blurred image sequences. This technique is based on the observation that successive motion blur operations are commutative. Hence, for two sequential blurred images, blurring the first image using the blur kernel of the second image should produce the same result as blurring the second image using the blur kernel of the first image, up to a transform due to motion. One advantage of this approach is that it allows one to solve for motion without performing deblurring, hence making tracking easier. Unfortunately, if the objective is to deblur the images, the blur kernels estimated in this way are only relative. In particular, they satisfy a blur constraint but are not necessarily the actual kernels that produced the input images. Therefore, this technique requires the introduction of additional regularization and some image prior. Further, this technique seems to assume that the blur kernel can be modeled as a 1-dimensional Gaussian, which may be overly restrictive.
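The commutativity constraint, and the reason the resulting kernels are only relative, can be checked numerically. The sketch below is an illustration under simplifying assumptions: a 1-D signal stands in for the image, the inter-frame motion transform is ignored, and the helper names `gaussian_kernel_1d` and `blur_constraint_residual` are hypothetical rather than part of the technique described.

```python
import numpy as np

def gaussian_kernel_1d(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1-D Gaussian blur kernel with 2*radius + 1 taps."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur_constraint_residual(b1, b2, k1, k2) -> float:
    """Largest absolute difference between (b1 blurred by k2) and (b2 blurred by k1).

    A residual near zero means the kernel pair satisfies the blur constraint.
    """
    lhs = np.convolve(b1, k2)
    rhs = np.convolve(b2, k1)
    return float(np.max(np.abs(lhs - rhs)))

rng = np.random.default_rng(0)
sharp = rng.random(64)                       # hypothetical sharp signal

k1 = gaussian_kernel_1d(sigma=1.0, radius=4)  # blur of frame 1
k2 = gaussian_kernel_1d(sigma=2.5, radius=8)  # blur of frame 2
b1 = np.convolve(sharp, k1)
b2 = np.convolve(sharp, k2)

# The true kernels satisfy the constraint because convolution commutes.
true_residual = blur_constraint_residual(b1, b2, k1, k2)

# So does any pair that shares a common extra blur, which is why the
# estimated kernels are only relative.
extra = gaussian_kernel_1d(sigma=1.5, radius=5)
alt_residual = blur_constraint_residual(
    b1, b2, np.convolve(k1, extra), np.convolve(k2, extra)
)
```

Both residuals are at the level of floating-point round-off, even though the second kernel pair is not the one that produced the blurred signals. The constraint alone therefore cannot pin down the actual kernels, which is consistent with the need for additional regularization noted above.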
A related technique, considered an extension to the technique summarized above, uses a similar blur constraint and again assumes the blur kernel is a 1-dimensional Gaussian. However, unlike the above-described technique that focuses on a region with a single parametric motion, this related technique uses segmentation masks to simultaneously estimate multiple (parametric) motion blurs over the whole image as well as handling occlusions. However, while capable of estimating multiple motions/blurs, it shares most of the assumptions of the above-described technique, and hence its limitations regarding the need to introduce additional regularization and some image prior.
A different type of conventional deblurring involves the consideration of single image deblurring problems where the blur is entirely due to camera shake. For example, one such technique assumes the blur kernel is not necessarily parametric (e.g., by a 1-D motion vector), but also assumes that it is constant throughout the image. This technique casts the joint estimation of blur kernel and deblurred image as a maximum a posteriori (MAP) estimation problem in a probabilistic framework, though the likelihood term does not in fact fit into a probabilistic model. As is common with such processes, alternating optimization techniques are used to compute blur and the deblurred image. Although this technique is capable of producing good deblurring results, it unfortunately requires that a large number of parameters be set manually, with different optimal values being used for different inputs. Consequently, the applicability of such techniques is limited by the requirement to provide substantial user input and the lack of automated solutions for the various parameters.
Another type of conventional deblurring technique addresses both motion and defocus blur by modeling a blurred image as the result of a generative process, where an “ideal” image is blurred by motion and defocusing respectively at successive stages of the generation process. Region tracking is then used to determine motion, which in turn determines the motion blur. A latent image at higher resolution is then estimated by minimizing the difference between the observed image and the image generated from the latent image under the model (subject to a second-order smoothing constraint). Therefore, in addition to handling both motion and defocus blur, this technique also performs super resolution (i.e., upscaling relative to the original resolution). This technique assumes that the 1-D motion producing the blur is the same as the motion between images, which is reasonable if the motion does not change much between successive images. However, if the motion between images varies significantly, the estimated blur kernel is inaccurate. Further, in order to estimate motion, this technique requires reliable inter-image tracking, which can be challenging in the presence of blur. This can make it difficult to deblur images successfully, especially if the blur kernel varies over time, since such variation changes the appearance of objects.
Yet another conventional technique addresses the problem of zeros in the frequency response of blur kernels by jointly deblurring a sequence of multiple images that are taken with different exposure times. This technique demonstrates that a joint deconvolution filter does not suffer from the divide-by-zero problem that affects single image deconvolution due to nulls of the PSF in the frequency domain, unless the nulls of all the PSFs coincide somewhere (which is unlikely in practice due to varying exposure times). However, the method has various restrictions, including that the scene must consist of a single moving object at constant velocity and a static background. In addition, the exposure times need to be known, and must form a periodic sequence in order to estimate motion. While these requirements are not uncommon in various conventional deblurring techniques, they are not typically met in most real-world videos unless those videos are captured under carefully controlled conditions.
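The divide-by-zero argument can be illustrated with a small noise-free example. The sketch below is not the cited method itself: it assumes 1-D signals, circular convolution, and box-shaped kernels whose lengths stand in for two different exposure times at constant velocity, and it uses a generic least-squares joint filter. All function names are hypothetical.

```python
import numpy as np

N = 24  # signal length, chosen so both box kernels have exact spectral nulls

def box_kernel(length: int, n: int = N) -> np.ndarray:
    """Box blur of the given length, zero-padded to n samples."""
    k = np.zeros(n)
    k[:length] = 1.0 / length
    return k

def circular_blur(signal: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Circular convolution computed in the frequency domain."""
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)))

def joint_deconvolve(blurred, kernels) -> np.ndarray:
    """Least-squares joint filter: sum(conj(K_i) * B_i) / sum(|K_i|^2).

    The denominator vanishes only where every kernel spectrum is zero at
    the same frequency.
    """
    num = sum(np.conj(np.fft.fft(k)) * np.fft.fft(b) for b, k in zip(blurred, kernels))
    den = sum(np.abs(np.fft.fft(k)) ** 2 for k in kernels)
    return np.real(np.fft.ifft(num / den))

rng = np.random.default_rng(0)
sharp = rng.random(N)

k_short = box_kernel(3)  # nulls at frequency bins 8 and 16
k_long = box_kernel(4)   # nulls at frequency bins 6, 12 and 18
blurred = [circular_blur(sharp, k_short), circular_blur(sharp, k_long)]

# Each spectrum alone has (numerically) zero entries, so dividing by it fails.
min_single = min(np.abs(np.fft.fft(k_short)).min(), np.abs(np.fft.fft(k_long)).min())

# The summed power spectrum has no zeros, so the joint filter is well defined.
joint_power = np.abs(np.fft.fft(k_short)) ** 2 + np.abs(np.fft.fft(k_long)) ** 2
restored = joint_deconvolve(blurred, [k_short, k_long])
```

In this noise-free setting the joint filter recovers the sharp signal to within round-off because the nulls of the two kernels fall at different frequency bins. If two kernels with coinciding nulls were used instead (for example, box lengths 4 and 6 with N = 24, which share a null at bin 12), the denominator would again contain a zero.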
A number of other conventional deblurring methods either use specialized hardware or assume a highly controlled environment that is not typical of most real-world videos. For example, one such technique uses simultaneously captured high-resolution video at a low frame rate and low-resolution video captured at a high frame rate. Another such technique uses a blurred long-exposure image and a noisy short-exposure image of the same scene. Yet another such technique addresses the problem of motion blur by taking many short-exposure (and hence high-noise) images so that they are relatively free from blur. However, this technique does not actually model blur and hence is not applicable to inputs that do contain a substantial amount of blur. While techniques such as these that make use of extra information are generally successful in improving image restoration quality, such information is usually not available in images and videos taken by ordinary people (e.g., a home video of a day at the beach taken on a typical video recording device).