Camera flashes can produce intrusive bursts of light that disturb or dazzle. Provided herein is an exemplary camera and flash that can use infra-red and ultra-violet light outside the visible range to capture and/or obtain pictures and/or images in relatively low-light conditions. This “dark” flash can be, e.g., at least two orders of magnitude dimmer than conventional flashes for a comparable exposure. Building on ideas from flash/no-flash photography, a pair of images can be captured and/or obtained, one using dark flash, the other using dim ambient illumination alone. The relationships and/or correlations between images recorded at different wavelengths can be used to denoise the ambient image and restore fine details to give a high quality result, even in very weak illumination. The processing techniques can also be used to denoise images captured with conventional cameras.
The heavy-tailed distribution of gradients in natural scenes has proven to be an effective prior for problems such as denoising, deblurring and super-resolution. These distributions can be well modeled by a hyper-Laplacian, $p(x) \propto e^{-k|x|^{\alpha}}$, typically with $0.5 \le \alpha \le 0.8$. However, the use of such sparse distributions can make the problem non-convex and impractically slow to solve for multi-megapixel images.
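To make the hyper-Laplacian prior concrete, the following minimal sketch evaluates the unnormalized negative log-prior k·Σ|x|^α over the horizontal and vertical first-difference gradients of an image. It assumes NumPy, first-difference gradient filters and a hypothetical helper name `hyper_laplacian_neg_log_prior`; the default α = 0.66 is an illustrative choice within the 0.5-0.8 range quoted above, not a value taken from the disclosure.

```python
import numpy as np

def hyper_laplacian_neg_log_prior(image, k=1.0, alpha=0.66):
    """Unnormalized -log p for p(x) ∝ exp(-k|x|^alpha), summed over the
    horizontal and vertical first-difference gradients of a 2-D image."""
    img = np.asarray(image, dtype=float)
    gx = np.diff(img, axis=1)  # horizontal gradients
    gy = np.diff(img, axis=0)  # vertical gradients
    return k * (np.sum(np.abs(gx) ** alpha) + np.sum(np.abs(gy) ** alpha))

step = np.array([[0.0, 0.0, 1.0, 1.0]])       # one sharp edge
ramp = np.array([[0.0, 1 / 3, 2 / 3, 1.0]])   # same change spread over three pixels
print(hyper_laplacian_neg_log_prior(step), hyper_laplacian_neg_log_prior(ramp))
```

With α < 1 the single sharp step costs less than the same intensity change spread over a gentle ramp, which is why such priors favor sparse, edge-like gradients; with α = 2 (a Gaussian prior) the ordering flips and the ramp becomes cheaper.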
The introduction of digital camera sensors has transformed photography, permitting new levels of control and flexibility over the imaging process. Coupled with less expensive computation power, various photographic techniques have been described, collectively known as Computational Photography. Modern camera sensors, whether in a cell phone or a high-end DSLR, typically use either a CCD or CMOS sensor based on silicon. The raw sensor material can respond to light over a wide range of wavelengths, typically, e.g., approximately 350-1200 nanometers (nm). Colored dyes can be deposited onto the sensor pixels in a Bayer pattern, resulting in 3 groups of pixels (e.g., red, green and blue). Each group responds to a limited range of wavelengths, approximating, for example, the sensitivities of the three types of cone cells in the human retina. However, silicon is highly sensitive to infra-red (IR) wavelengths, and it can therefore be difficult to manufacture dyes that have sufficient attenuation in this region; thus an extra filter is typically placed on top of most sensors to, e.g., block IR light. This yields a sensor that can record only over the range of approximately 400-700 nm. While this matches typical human color perception, it is generally a considerable restriction of the intrinsic range of the device.
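The Bayer arrangement described above can be illustrated with a short NumPy sketch that separates a raw mosaic into its three pixel groups. The RGGB tile layout and the function name `split_bayer_rggb` are assumptions for illustration only; actual sensors differ in layout, and real raw data would also need black-level and white-balance handling that is not shown here.

```python
import numpy as np

def split_bayer_rggb(raw):
    """Split a raw Bayer mosaic into its red, green and blue pixel groups.

    Assumes an RGGB 2x2 tile (R at even row/even column, B at odd/odd);
    sensors using GRBG, GBRG or BGGR only differ in the offsets below.
    Returns quarter-resolution red and blue planes and a (2, H/2, W/2)
    stack holding the two green sub-planes.
    """
    raw = np.asarray(raw)
    if raw.shape[0] % 2 or raw.shape[1] % 2:
        raise ValueError("mosaic dimensions must be even")
    red = raw[0::2, 0::2]
    # Green sites on the red rows and on the blue rows, respectively
    green = np.stack([raw[0::2, 1::2], raw[1::2, 0::2]])
    blue = raw[1::2, 1::2]
    return red, green, blue

r, g, b = split_bayer_rggb(np.arange(16).reshape(4, 4))
```

The green group ends up with twice as many samples as red or blue, reflecting the 2:1:1 ratio of the Bayer tile.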
One solution to capturing photographs in low light conditions is to use a flash unit to add light to the scene. Although such a solution provides the light needed to capture otherwise unrecordable scenes, the flash makes the photographic process intrusive. The sudden burst of light not only alters the illumination but typically disturbs people present, making them aware that a photo has just been taken and possibly dazzling them if they happen to be looking toward the camera. For example, a group photo in a dark restaurant using a bright camera flash can leave the subjects unable to see clearly for some moments afterward.
Dark flash camera/flash systems can be based around off-the-shelf consumer equipment, with a number of minor modifications. First, the camera can be a standard DSLR with the IR-block filter removed, thus restoring much of the original spectral range of the sensor. Second, a modified flash can be used that emits light over a wider spectral range than normal, which can be filtered to remove visible wavelengths. This dark flash can allow for the addition of light to the scene in such a way that it can be recorded by the camera, but not by a human's visual system. Using the dark flash, it is possible to illuminate a dimly lit scene without dazzling people present or disturbing other people in close proximity. Furthermore, it can allow for a fast shutter speed to be used, thus avoiding camera shake. People typically want images with colors that substantially match their own visual experience. However, this is generally not the case for images captured using heretofore available flash technologies.
Exemplary embodiments in accordance with the present disclosure can be regarded as a multi-spectral variation of the flash/no-flash technique introduced by Agrawal et al., Removing photography artifacts using gradient projection and flash-exposure sampling, ACM Transactions on Graphics (Proc. SIGGRAPH), 24, 828-835 (2005), Petschnigg et al., Digital photography with flash and no-flash image pairs, ACM Transactions on Graphics (Proc. SIGGRAPH) 23, 3, 664-672 (2004), and Eisemann et al., Flash photography enhancement via intrinsic relighting, ACM Transactions on Graphics (Proc. SIGGRAPH) 23, 673-678 (2004). Agrawal et al. 2005 focused on the removal of flash artifacts but did not apply their method to ambient images containing significant noise, unlike the approaches described in Petschnigg et al., supra, and Eisemann et al., supra. The approaches described in the two latter publications are similar to one another in that they use a cross-bilateral (also known as joint-bilateral) filter and detail transfer. However, Petschnigg et al., supra, attempt to denoise the ambient image by adding detail from the flash image, while Eisemann et al., supra, alter the flash image using ambient tones.
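A brute-force, single-channel sketch of the cross-bilateral (joint-bilateral) filter used in the approaches discussed above follows. The noisy ambient image is averaged over a local window, with spatial weights from a Gaussian on pixel distance and range weights computed from the flash image, so that edges present in the flash image are not blurred across. It assumes NumPy, equal-sized grayscale float images in [0, 1] and illustrative parameter values (`radius`, `sigma_d`, `sigma_r`); it omits the detail-transfer and artifact-handling steps of the cited methods and is not the mechanism of the present disclosure.

```python
import numpy as np

def cross_bilateral(ambient, flash, radius=2, sigma_d=1.5, sigma_r=0.1):
    """Denoise `ambient` with range weights taken from `flash`."""
    ambient = np.asarray(ambient, dtype=float)
    flash = np.asarray(flash, dtype=float)
    h, w = ambient.shape
    size = 2 * radius + 1
    # Spatial Gaussian over the (2r+1) x (2r+1) window, computed once
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_d ** 2))
    pad_a = np.pad(ambient, radius, mode="edge")
    pad_f = np.pad(flash, radius, mode="edge")
    out = np.empty_like(ambient)
    for i in range(h):
        for j in range(w):
            win_a = pad_a[i:i + size, j:j + size]
            win_f = pad_f[i:i + size, j:j + size]
            # Range weights compare flash intensities, not the noisy ambient ones
            rng_w = np.exp(-(win_f - flash[i, j]) ** 2 / (2.0 * sigma_r ** 2))
            weights = spatial * rng_w
            out[i, j] = np.sum(weights * win_a) / np.sum(weights)
    return out
```

Because the range term is driven by the comparatively clean flash image, pixels on opposite sides of a flash-visible edge barely influence each other, while noise within uniform regions is averaged away.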
Bennett et al., Multispectral bilateral video fusion, IEEE Trans. Image Processing 16, 5, 1185-1194 (2007), describes how video captured in low-light conditions can be denoised using continuous IR illumination. However, they make use of temporal smoothing to achieve high quality results, something that is generally not possible in a photography setting. Wang, O., et al., Video relighting using infrared illumination, Computer Graphics Forum 27 (2008), describes, e.g., how IR illumination can be used to relight faces in well-lit scenes. Both of these approaches differ significantly from exemplary embodiments in accordance with the present disclosure in a number of ways: (i) they use complex optical-bench setups with twin cameras and beam-splitters, as opposed to a single portable DSLR camera that captures the images in sequence (temporal multiplexing); (ii) both use IR alone, rather than both near-UV and IR, which can be used to achieve high quality reconstructions; (iii) both rely on cross-bilateral filtering to combine the IR and visible signals, an approach which can have significant shortcomings. In contrast, disclosed herein is a principled mechanism for propagating information between spectral bands. This can be integrated into a unified cost function that combines the denoising and detail transfer mechanisms, which are treated separately in cross-bilateral filtering and related methods, such as is described in Farbman et al., Edge-preserving decompositions for multi-scale tone and detail manipulation, ACM Transactions on Graphics (Proc. SIGGRAPH) 27, 671-680 (2008).
Infra-red imaging has a history in areas such as astronomy and night-vision. In consumer photography the most prominent use can be considered to have been the Sony Nightshot where the IR-block filter can be switched out to use the near-IR part of the spectrum. The images are monochrome (with a greenish tint) and generally no attempt is made to restore natural colors to them. Other imaging approaches use Far-IR wavelengths to record the thermal signature of people or vehicles. However, this can require specialized optics and sensors and thus has limited relevance to consumer photography. Ultra-violet (UV) photography generally has received little attention, other than from flower photography enthusiasts (see, e.g., Rorslett, B., Flowers in Ultraviolet, available at http://www.naturfotograf.com/UV_flowers_list.html (last accessed Jan. 7, 2010)). Many flowers that can look plain to humans can have vibrant patterns under UV light to attract insects sensitive to these wavelengths.
Multi-spectral recording using visible wavelengths has been explored by several authors. Park et al., Multispectral Imaging Using Multiplexed Illumination, ICCV 1-8 (2007), describes the use of multiplexed illumination via arrays of colored LEDs to recover spectral reflectance functions of the scene at video frame rates. Exemplary embodiments of the system, method and computer-accessible medium according to the present disclosure can be used in a similar manner for still scenes, being able to estimate the reflectance functions beyond the visible range. Mohan et al., Agile spectrum imaging: Programmable wavelength modulation for cameras and projectors, Computer Graphics Forum 27, 2, 709-717 (2008), describes use of a diffraction grating in conjunction with an LCD mask to give control over the color spectrum for applications including metamer detection and adaptive color primaries.
Processing of the flash/no-flash pair in accordance with the present disclosure exploits the relationships and/or correlations between nearby spectral bands. Most work on image priors can be considered as having focused on capturing spatial correlations within a band. For example, priors based on the heavy-tailed distributions of image gradients have proven effective in a wide range of problems such as denoising (see, e.g., Portilla et al., Image denoising using a scale mixture of Gaussians in the wavelet domain, IEEE Trans. Image Processing 12, 11, 1338-1351 (2003)), deblurring (see, e.g., Fergus et al., Removing camera shake from a single photograph, ACM Transactions on Graphics (Proc. SIGGRAPH) 25, 787-794 (2006)) and separating reflections (see, e.g., Levin and Weiss, User assisted separation of reflections from a single image using a sparsity prior, IEEE Trans. Pattern Analysis and Machine Intelligence 29, 9, 1647-1654 (2007)). However, models that exploit dependencies between color channels tend to be less common. The K-SVD denoising approach of Aharon et al., K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation, IEEE Trans. Signal Processing 54, 11, 4311-4322 (2006), can do so by vector quantizing color patches. The fields-of-experts approach of Roth et al., Fields of Experts: A Framework for Learning Image Priors, CVPR, 2, 860-867 (2005), has also been extended to model color images (see, e.g., McAuley et al., Learning high-order MRF priors of color images, ICML 06, 617-624 (2006)) and uses color marginal filters. However, neither of these approaches explicitly models the inter-channel correlations, unlike the exemplary system, method and computer-accessible medium according to the present disclosure.
Explicit spectral models are used in color constancy problems and joint spatial-spectral models have been proposed (see, e.g., Singh et al., Exploiting spatial and spectral image regularities for color constancy, Workshop on Statistical and Computational Theories of Vision (2003), and Chakrabarti et al., Color constancy beyond bags of pixels, CVPR, 1-6 (2008)) for this task, but these generally assume a noise-free image. Morris et al., Statistics of infrared images, CVPR, 1-7 (2007), describes measuring the spatial gradients of far IR images gathered with a specialized camera, demonstrating their similarity to those of visible light images.
Flash-based methods are generally not the only solution to taking pictures in low-light levels. Wide aperture lenses gather more light but are heavy and expensive, making them impractical for most photographers. Anti-shake hardware can be used to capture blur-free images at slow shutter speeds. These techniques can be combined with an exemplary approach in accordance with the present disclosure to extend performance to even lower light levels. Software-based deblurring techniques (see, e.g., Fergus et al., supra, and Jia, J., Single image motion deblurring using transparency, CVPR, 1-8 (2007)) can only cope with modest levels of blur and typically have artifacts in their output. Denoising techniques, such as described in, e.g., Tomasi et al., Bilateral filtering for gray and color images, ICCV, 839-846 (1998), and Portilla et al., supra, can have similar performance issues and cannot cope with the noise levels that can be addressed by certain exemplary embodiments disclosed herein. Joint denoising/deblurring techniques, such as that described in, e.g., Yuan et al., Image deblurring with blurred/noisy image pairs, ACM Transactions on Graphics (Proc. SIGGRAPH) 26, 1-10 (2007), may provide better performance but still require a problematic deconvolution operation, which can introduce artifacts. Methods that register and combine a stack of noisy images, such as described in, e.g., Telleen et al., Synthetic shutter speed imaging, Computer Graphics Forum 26, 3, 591-598 (2007), have the inconvenience of needing to capture far more than two images. A visible flash can be made non-dazzling by using a diffuser and aiming it at the ceiling. Such methods can work well but are limited to indoor settings with a relatively low ceiling of neutral color.
Natural image statistics are a powerful tool in image processing, computer vision and computational photography. Denoising (see, e.g., Portilla et al., supra), deblurring (see, e.g., Fergus et al., supra), transparency separation (see, e.g., Levin and Weiss, supra) and super-resolution (see, e.g., Tappen, M. F. et al., Exploiting the sparse derivative prior for super-resolution and image demosaicing, SCTV (2003)) are all tasks that can be inherently ill-posed. Priors based on natural image statistics can regularize these problems to yield quality results. However, digital cameras now have sensors that record images with tens of megapixels (MP), e.g., the latest Canon DSLRs have over 20 MP. Solving the above tasks for such images in a reasonable time frame (e.g., a few minutes or less) poses a significant challenge to existing algorithms. Exemplary embodiments of the present disclosure can address one such problem, e.g., non-blind deconvolution, and can handle very large images while still yielding high quality results.
Various deconvolution approaches exist, varying substantially in their speed and sophistication. Simple filtering operations are fast but typically yield poor results. Most of the better-performing approaches solve globally for the corrected image, encouraging the marginal statistics of a set of filter outputs to match those of uncorrupted images, which can act as a prior to regularize the problem. For these methods, a trade-off can exist between accurately modeling the image statistics and being able to solve the ensuing optimization problem efficiently. If the marginal distributions are assumed to be Gaussian, a closed-form solution exists in the frequency domain and FFTs can be used to recover the image very quickly. However, real-world images typically have marginals that are non-Gaussian, and thus the output can often be of mediocre quality. One approach is to assume the marginals have a Laplacian distribution. This allows a number of fast l1 and related TV-norm methods, such as described in, e.g., Rudin, L. et al., Nonlinear total variation based noise removal algorithms, Physica D 60, 259-268 (1992), and Wang, Y. et al., A new alternating minimization algorithm for total variation image reconstruction, SIAM J. Imaging Sciences 1, 3, 248-272 (2008), to be deployed, which can give good results in a reasonable time.
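The closed-form Gaussian-prior case can be sketched as follows. Assuming circular boundary conditions, a blur kernel normalized to sum to one, and horizontal and vertical first-difference filters as the prior's filter set, minimizing ‖k ⊗ x − y‖² + reg·(‖f_x ⊗ x‖² + ‖f_y ⊗ x‖²) gives X = conj(K)·Y / (|K|² + reg·(|F_x|² + |F_y|²)) in the Fourier domain, so the whole image is recovered with a handful of FFTs. The helper `psf2otf` borrows the name of a MATLAB function but is defined here; both function names and the `reg` weight are illustrative assumptions.

```python
import numpy as np

def psf2otf(kernel, shape):
    """Zero-pad a small kernel to `shape` and circularly shift its centre to
    (0, 0), so its FFT is the transfer function of circular convolution."""
    pad = np.zeros(shape)
    kh, kw = kernel.shape
    pad[:kh, :kw] = kernel
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(pad)

def gaussian_prior_deconv(blurred, kernel, reg=1e-2):
    """Non-blind deconvolution with a Gaussian gradient prior, solved in closed form."""
    blurred = np.asarray(blurred, dtype=float)
    shape = blurred.shape
    K = psf2otf(kernel, shape)
    Fx = psf2otf(np.array([[1.0, -1.0]]), shape)    # horizontal gradient filter
    Fy = psf2otf(np.array([[1.0], [-1.0]]), shape)  # vertical gradient filter
    numerator = np.conj(K) * np.fft.fft2(blurred)
    denominator = np.abs(K) ** 2 + reg * (np.abs(Fx) ** 2 + np.abs(Fy) ** 2)
    return np.real(np.fft.ifft2(numerator / denominator))
```

The cost is dominated by a few FFTs regardless of `reg`; the price, as noted above, is that the quadratic prior penalizes large gradients heavily and so tends to smear or ring around edges in real images.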
However, studies of real-world images have shown that the marginal distributions have significantly heavier tails than a Laplacian and are better modeled by a hyper-Laplacian (see, e.g., Field, D., What is the goal of sensory coding?, Neural Computation 6, 559-601 (1994), Levin, Fergus, Durand and Freeman, Image and depth from a conventional camera with a coded aperture, ACM TOG (Proc. SIGGRAPH) 26, 3, 70 (2007) and Simoncelli et al., Noise removal via Bayesian wavelet coring, ICIP 379-382 (1996)). Although such priors can give high quality results, they are typically slower than methods that use either Gaussian or Laplacian priors. This can be a consequence of the problem becoming non-convex for hyper-Laplacians with α<1, meaning that many of the fast l1 or l2 tricks are no longer applicable. Instead, standard optimization methods such as conjugate gradient (CG) can be used. One variant that can work in practice is iteratively reweighted least squares (IRLS), as described in, e.g., Stewart, C. V., Robust parameter estimation in computer vision, SIAM Review 41, 3, 513-537 (1999), which solves a series of weighted least squares problems with CG, each one an l2 approximation to the non-convex problem at the current point. In both cases, hundreds of CG iterations are typically used, each of which can involve an expensive convolution of the blur kernel with the current image estimate.
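A minimal IRLS sketch for the hyper-Laplacian case follows. To keep it short it denoises a 1-D signal rather than deconvolving an image, i.e., it minimizes (λ/2)‖x − y‖² + Σᵢ |xᵢ₊₁ − xᵢ|^α, and it solves each weighted least-squares system with a dense direct solve where a large-scale implementation would use CG. Each iteration replaces |g|^α by the quadratic (α/2)|g₀|^(α−2)·g² that touches it at the current gradient g₀; the floor `eps` (an assumed value) keeps the weights finite where gradients vanish. Function and parameter names are hypothetical.

```python
import numpy as np

def irls_denoise_1d(y, lam=20.0, alpha=0.66, iters=30, eps=1e-3):
    """Approximately minimise (lam/2)||x - y||^2 + sum_i |x[i+1] - x[i]|^alpha by IRLS."""
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), axis=0)  # (n-1) x n forward-difference operator
    x = y.copy()
    for _ in range(iters):
        g = D @ x  # gradients at the current estimate
        # Weights of the quadratic upper bound on |g|^alpha (factor 2 folded in)
        w = alpha * np.maximum(np.abs(g), eps) ** (alpha - 2.0)
        # Normal equations of the weighted least-squares sub-problem
        A = lam * np.eye(n) + D.T @ (w[:, None] * D)
        x = np.linalg.solve(A, lam * y)
    return x
```

Small gradients receive very large weights and are flattened, while a large gradient such as a genuine edge receives a small weight and survives; each outer iteration, however, requires a full linear solve, which is the cost the text attributes to IRLS.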
Hyper-Laplacian image priors have been used in a range of settings: super-resolution (see, e.g., Tappen et al., supra), transparency separation (see, e.g., Levin and Weiss, supra) and motion deblurring (see, e.g., Levin, A., Blind motion deblurring using image statistics, NIPS (2006)). Work that can be considered relevant to certain exemplary embodiments of the present disclosure, such as that described in, e.g., Levin, Fergus, Durand and Freeman, supra, and Joshi et al., Image deblurring and denoising using color priors, CVPR (2009), has applied such priors to non-blind deconvolution, using IRLS to solve for the deblurred image. Other types of sparse image priors include, e.g., Gaussian Scale Mixtures (GSM) (see, e.g., Wainwright et al., Scale mixtures of Gaussians and the statistics of natural images, NIPS 855-861 (1999)), which have been used for image deblurring (see, e.g., Fergus et al., supra) and denoising (see, e.g., Portilla et al., supra), and Student-t distributions, which have been used for denoising (see, e.g., Welling et al., Learning sparse topographic representations with products of student-t distributions, NIPS (2002) and Roth et al., supra). With the exception of Portilla et al., supra, these methods generally use CG and thus can be slow.
The alternating minimization procedure that can be adopted by certain exemplary embodiments in accordance with the present disclosure can be a technique known as half-quadratic splitting (see, e.g., Geman and Reynolds, Constrained restoration and recovery of discontinuities, PAMI 14, 3, 367-383 (1992) and Geman and Yang, Nonlinear image recovery with half-quadratic regularization, IEEE Trans. Image Processing 4, 7, 932-946 (1995)). Recently, Wang, Y. et al., supra, showed that it could be used with a total-variation (TV) norm to deconvolve images. Exemplary embodiments according to the present disclosure can be considered to be related to this work: e.g., certain exemplary embodiments can also use a half-quadratic minimization, but the per-pixel sub-problem is quite different. With the TV norm, the sub-problem can be solved with a straightforward shrinkage operation. With a sparse (hyper-Laplacian, α<1) prior, however, the sub-problem is non-convex. Accordingly, solving it efficiently is one of the objectives provided by exemplary embodiments according to the present disclosure.
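The contrast between the two per-pixel sub-problems can be illustrated with a small NumPy sketch. In half-quadratic splitting each gradient value v is paired with an auxiliary variable w that minimizes |w|^α + (β/2)(w − v)², where β is the splitting weight. For α = 1 (the TV case) the minimizer is the soft-thresholding (shrinkage) operator. For α < 1 no such simple shrinkage rule applies in general, so the second function searches a dense grid of candidates between 0 and v; this brute-force search is an illustrative stand-in chosen for clarity and is not the efficient solver of the present disclosure. Function names, the grid resolution and the example values of β and α are assumptions.

```python
import numpy as np

def shrink_tv(v, beta):
    """Exact minimiser of |w| + (beta/2)(w - v)^2 (the alpha = 1, TV case):
    soft-thresholding with threshold 1/beta."""
    v = np.asarray(v, dtype=float)
    return np.sign(v) * np.maximum(np.abs(v) - 1.0 / beta, 0.0)


def solve_w_grid(v, beta, alpha, samples=20001):
    """Approximate minimiser of |w|^alpha + (beta/2)(w - v)^2 for any alpha > 0,
    found by exhaustive search. The optimum has the sign of v and |w| <= |v|,
    so only candidates between 0 and |v| are examined."""
    v = np.atleast_1d(np.asarray(v, dtype=float))
    mag = np.abs(v)[:, None]                      # shape (n, 1)
    cand = mag * np.linspace(0.0, 1.0, samples)   # candidate |w|, shape (n, samples)
    cost = cand ** alpha + 0.5 * beta * (cand - mag) ** 2
    best = cand[np.arange(v.size), np.argmin(cost, axis=1)]
    return np.sign(v) * best
```

With α = 0.5 and β = 2, for instance, the grid solver maps v = 0.5 exactly to 0 but shrinks v = 5 only to about 4.89, a hard-threshold-like behavior that soft-thresholding, which always subtracts 1/β = 0.5, does not show.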
Chartrand, R., Fast algorithms for nonconvex compressive sensing: MRI reconstruction from very few data, IEEE International Symposium on Biomedical Imaging (2009), and Chartrand and Staneva, Restricted isometry properties and nonconvex compressive sensing, Inverse Problems 24, 1-14 (2008), for example, describe a non-convex compressive sensing procedure in which the usual l1 norm on the signal to be recovered is replaced with an lp quasi-norm, where p<1. A splitting scheme can be used, resulting in a non-convex per-pixel sub-problem. To solve this, a Huber approximation to the quasi-norm (see, e.g., Chartrand, R., supra) can be used, which allows for the derivation of a generalized shrinkage operator to solve the sub-problem. However, this solves only an approximation of the original sub-problem, unlike exemplary embodiments in accordance with the present disclosure.
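For comparison, the generalized shrinkage operator associated with the cited Chartrand work can be written in closed form. As recalled here (and worth checking against the original papers), it maps t to sign(t)·max(|t| − λ^(2−p)·|t|^(p−1), 0), which reduces to ordinary soft-thresholding when p = 1. Because it is derived from a Huber-type approximation of the quasi-norm, it does not in general return the exact minimizer of the original lp sub-problem. The function name `p_shrink` is hypothetical.

```python
import numpy as np

def p_shrink(x, lam, p):
    """Generalized (p-)shrinkage: sign(x) * max(|x| - lam**(2-p) * |x|**(p-1), 0).
    Zero inputs map to zero (the |x|**(p-1) term is unbounded there for p < 1)."""
    x = np.asarray(x, dtype=float)
    mag = np.abs(x)
    out = np.zeros_like(x)
    nz = mag > 0
    shrunk = mag[nz] - lam ** (2.0 - p) * mag[nz] ** (p - 1.0)
    out[nz] = np.sign(x[nz]) * np.maximum(shrunk, 0.0)
    return out
```

At p = 0.5 and λ = 1 a large input such as 4 is reduced only to 3.5, whereas soft-thresholding (p = 1) with the same λ would reduce it to 3, illustrating the smaller bias on large coefficients.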