There can be a number of situations in which images or video can be captured through a window. A person can be inside a car, train or building, and may wish to photograph a scene outside. Indoor situations can include exhibits in museums or zoos, which can typically be protected by glass. In addition, many cameras can now be mounted outside, for example, on buildings for surveillance, or on vehicles to prevent collisions. These cameras can be protected from the elements by an enclosure with a transparent window. Such images, however, can be affected by many factors including, for example, reflections and attenuation. As shown in FIG. 1A, these artifacts can significantly degrade the quality of the captured image.
A conventional approach to removing occluders from images can be, for example, to defocus them to the point of invisibility. This can be done by placing the camera right up against the glass, and using a large aperture to produce a small depth-of-field. However, this has to be done at the time of capture, and in practice, it can be hard to get the camera sufficiently close to the occluders due to multiple layers of glass, or some difficulty approaching the window. Furthermore, such an approach assumes that the camera has a fast lens and control of the aperture. This can be a particular issue for smartphone cameras, where the user can have little control over the exposure. The problem can be exacerbated by the small sensor size, which can increase the depth-of-field. Correspondingly, shots taken with smartphone cameras through dirty or rainy glass can still have significant artifacts, even if the camera is placed close to the window, as shown in FIG. 9A.
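As a non-limiting illustration of why defocusing can be less effective with a small sensor, the following sketch estimates how widely a nearby occluder can be blurred. It assumes an idealized thin lens focused at infinity, for which a point at distance s can spread into a disc of diameter f²/(N·s) on the sensor, where f is the focal length and N is the f-number. The function name and all camera parameters below are hypothetical values chosen only for illustration, and are not taken from any particular device.

```python
def blur_fraction(focal_mm, f_number, sensor_width_mm, occluder_dist_mm):
    """Blur-disc diameter of a near occluder, as a fraction of sensor width.

    Assumes an idealized thin lens focused at infinity, where a point at
    distance s is spread into a disc of diameter f^2 / (N * s).
    """
    blur_mm = focal_mm ** 2 / (f_number * occluder_dist_mm)
    return blur_mm / sensor_width_mm


# Hypothetical cameras with the same f-number, both 100 mm from the glass.
large_sensor = blur_fraction(focal_mm=50.0, f_number=1.8,
                             sensor_width_mm=36.0, occluder_dist_mm=100.0)
small_sensor = blur_fraction(focal_mm=4.0, f_number=1.8,
                             sensor_width_mm=5.6, occluder_dist_mm=100.0)

print(f"large sensor: blur spans {large_sensor:.1%} of the image width")
print(f"small sensor: blur spans {small_sensor:.1%} of the image width")
```

Under these assumed values, the occluder can be smeared over a large part of the frame on the large-sensor camera, but can remain a compact, visible blob on the small-sensor camera, even at the same f-number and distance.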
The use of machine learning for image denoising can be widespread. An early approach (see e.g., Reference 26) learns an energy function on the output of linear filters applied to the image. Closely related methods explore different bases and energy functions, for example, sparse over-complete filters (see e.g., Reference 15), wavelet decomposition (see e.g., Reference 17) and a Field-of-Experts model (see e.g., Reference 20). Other approaches (see e.g., Reference 27) use a large Gaussian mixture model (“GMM”) to directly model the distribution of natural image patches. These approaches (i) only consider additive white Gaussian noise (“AWGN”), which can be simpler than structured noise, and (ii) build generative models of clean image patches.
Neural networks have previously been explored for denoising natural images, mostly in the context of AWGN (see e.g., References 11, 14 and 24). Although the corruption addressed in some of this work can be more challenging than AWGN, it can still be significantly easier to remove than highly variable dirt and rain drops.
Removing localized corruption can be considered a form of blind inpainting, where the position of the corrupted regions may not be given, unlike traditional inpainting. (See e.g., Reference 6). The removal of salt-and-pepper noise has been shown (see e.g., Reference 5), although such an approach does not extend to multi-pixel corruption. Recently, other work has indicated how an unsupervised neural network can perform blind inpainting, demonstrating the removal of text synthetically placed in an image. (See e.g., Reference 23). However, the noiseless text has different statistics from natural images. Thus, it can be easier to remove than rain or dirt, which can vary greatly in appearance, and can resemble legitimate image structures.
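As a non-limiting illustration of the difference between AWGN and localized, multi-pixel corruption, the following sketch applies both to a synthetic image. The random stand-in image, the noise level, the circular blob and the simple alpha blend are all assumptions made only for this example, and are not a model taken from any of the cited References.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a clean grayscale image with values in [0.2, 0.8].
clean = rng.uniform(0.2, 0.8, size=(64, 64))

# AWGN: independent zero-mean Gaussian noise added to every pixel.
sigma = 0.05  # assumed noise standard deviation
awgn = clean + rng.normal(0.0, sigma, size=clean.shape)

# Localized corruption: a dark blob blended over one small region.
# In blind inpainting, this alpha mask is NOT available to the restorer.
yy, xx = np.mgrid[0:64, 0:64]
alpha = np.where((yy - 20) ** 2 + (xx - 40) ** 2 <= 6 ** 2, 0.9, 0.0)
dirt_value = 0.1  # assumed brightness of the occluder
occluded = (1.0 - alpha) * clean + alpha * dirt_value

# Fraction of pixels changed, and the worst-case error, for each corruption.
awgn_changed = np.mean(np.abs(awgn - clean) > 1e-6)
occluded_changed = np.mean(alpha > 0)
awgn_max_err = np.max(np.abs(awgn - clean))
occluded_max_err = np.max(np.abs(occluded - clean))

print(f"AWGN changes {awgn_changed:.1%} of pixels, max error {awgn_max_err:.2f}")
print(f"Blob changes {occluded_changed:.1%} of pixels, max error {occluded_max_err:.2f}")
```

In this sketch, the AWGN perturbs essentially every pixel by a small amount, whereas the blob leaves most pixels untouched but can replace nearly all of the image content inside a contiguous region whose location is not known in advance.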
Several methods explore the removal of rain from images, which include addressing airborne rain (see e.g., References 1 and 8), rather than droplets on glass. For example, one approach uses defocus, while the other approach uses frequency-domain filtering. However, both benefit from video sequences rather than a single image. Other approaches illustrate methods for detecting raindrops in a single image. (See e.g., References 18 and 19). However, these methods do not demonstrate removal.
It has been previously illustrated how lens dust and nearby occluders can be removed, but this method requires extensive calibration, or a video sequence, as opposed to a single frame (see e.g., Reference 10). Other work has shown how dirt and dust can be removed. (See e.g., References 22 and 25). One approach removes defocused dust for a Mars Rover camera, while the other approach removes sensor dust using multiple images and a physics model. However, there does not currently exist a method for removing dirty water and debris from an image taken through a window.
Thus, it may be beneficial to provide an exemplary system, method and computer-accessible medium that can remove dirty water and debris from an image taken through a window, and which can overcome at least some of the deficiencies described herein above.