In an image formation process, image sensor measurements are subject to degradations. Raw sensor readings suffer from photon shot noise, optical aberrations, read-out noise, spatial subsampling in the color filter array (CFA), spectral cross-talk on the CFA, motion blur, and other imperfections. An image signal processor (ISP), which may be a hardware entity, addresses such degradations by processing the raw measurement in a sequential pipeline of steps, each targeting one degradation type in isolation, before the resulting output image is displayed or saved. The ISP performs an extensive set of operations, such as demosaicing, denoising, and deblurring. Current image processing algorithms are designed to minimize an explicit or implicit image reconstruction loss relevant to human perception of image quality.
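For illustration only, three of the degradations listed above (CFA subsampling, photon shot noise, and read-out noise) can be sketched as a toy capture model. The function name `simulate_raw_capture`, the RGGB layout, and the parameters `photons_at_white` and `read_noise_std` are assumptions introduced for this sketch and do not describe any particular sensor.

```python
import numpy as np


def simulate_raw_capture(rgb, photons_at_white=50.0, read_noise_std=2.0, seed=0):
    """Toy raw-capture model (illustrative assumption, not a sensor spec).

    rgb: float array of shape (H, W, 3) with values in [0, 1].
    Returns an (H, W) single-channel mosaic in electron-like units.
    """
    rng = np.random.default_rng(seed)
    h, w, _ = rgb.shape

    # CFA subsampling: each pixel keeps one color channel (assumed RGGB layout).
    channel = np.ones((h, w), dtype=int)   # green everywhere by default
    channel[0::2, 0::2] = 0                # red at even rows, even columns
    channel[1::2, 1::2] = 2                # blue at odd rows, odd columns
    mosaic = np.take_along_axis(rgb, channel[..., None], axis=2)[..., 0]

    # Photon shot noise: Poisson-distributed counts around the expected signal.
    expected_photons = mosaic * photons_at_white
    shot = rng.poisson(expected_photons).astype(np.float64)

    # Read-out noise: additive zero-mean Gaussian noise.
    read = rng.normal(0.0, read_noise_std, size=(h, w))
    return shot + read
```

Lowering `photons_at_white` in this sketch mimics a low-light capture, in which the shot noise becomes large relative to the signal.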
Progress in imaging and graphics has enabled many applications, including autonomous driving, automated design tools, robotics, and surveillance, where images are consumed directly by a higher-level analysis module without ever being viewed by humans. This gives rise to the question of whether signal processing is necessary, i.e., whether a learning machine is better trained directly on raw sensor data. ISPs map data from diverse camera systems into relatively clean images. However, recovering a latent image is difficult in low-light captures that are heavily degraded by photon shot noise. Low light is, in effect, a failure mode for conventional computer vision systems, which combine existing ISPs with existing classification networks.
The performance of conventional imaging and perception networks degrades under noise, optical aberrations, and other imperfections present in raw sensor data. An image-processing pipeline may be interposed between an image source and an image renderer to reconstruct an image that has been degraded. An image pipeline may be implemented using a general-purpose computer, a Field-Programmable Gate Array (FPGA), or an Application-Specific Integrated Circuit (ASIC). Conventional image-processing pipelines (ISPs) are optimized for human viewing, not for machine vision.
A demosaicing process, which is also called color-filter-array interpolation (CFA interpolation), reconstructs a full color image from incomplete color samples output from an image sensor overlaid with a CFA.
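As a minimal sketch of CFA interpolation, the following example applies bilinear interpolation to a mosaic, assuming an RGGB Bayer layout. Bilinear interpolation is only one simple member of the demosaicing family; the function names `demosaic_bilinear` and `_conv3x3` are hypothetical and introduced here for illustration.

```python
import numpy as np


def _conv3x3(img, kernel):
    """3x3 weighted neighborhood sum with reflect padding (keeps Bayer parity)."""
    padded = np.pad(img, 1, mode="reflect")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def demosaic_bilinear(raw):
    """Bilinear demosaicing of an (H, W) mosaic, assuming an RGGB layout."""
    h, w = raw.shape
    masks = np.zeros((3, h, w), dtype=np.float64)
    masks[0, 0::2, 0::2] = 1.0   # red sample sites
    masks[1, 0::2, 1::2] = 1.0   # green sample sites on red rows
    masks[1, 1::2, 0::2] = 1.0   # green sample sites on blue rows
    masks[2, 1::2, 1::2] = 1.0   # blue sample sites

    # Red/blue use the full 3x3 neighborhood; green uses the 4-neighborhood,
    # so measured samples pass through unchanged at their own sites.
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=np.float64)
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]], dtype=np.float64)
    kernels = (k_rb, k_g, k_rb)

    rgb = np.empty((h, w, 3), dtype=np.float64)
    for c in range(3):
        # Normalized convolution: weighted average over available samples only.
        weighted_sum = _conv3x3(raw * masks[c], kernels[c])
        weight_total = _conv3x3(masks[c], kernels[c])
        rgb[..., c] = weighted_sum / weight_total
    return rgb
```

In this sketch each missing color value is the average of the nearest measured samples of that color, which is why a mosaic of a uniformly colored scene is reconstructed exactly.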
An image denoising process estimates the original image by suppressing noise from a noise-contaminated image. Several algorithms for image denoising are known in the art.
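By way of example only, one of the simplest denoising approaches is a local mean (box) filter, sketched below. It is chosen here for brevity, not because it is representative of the algorithms an ISP would use; the function name `denoise_box` and the `radius` parameter are assumptions of this sketch.

```python
import numpy as np


def denoise_box(noisy, radius=1):
    """Suppress noise by averaging each pixel with its (2*radius+1)^2 neighborhood.

    noisy: 2-D float array. Borders are handled with reflect padding.
    """
    size = 2 * radius + 1
    padded = np.pad(noisy, radius, mode="reflect")
    h, w = noisy.shape
    acc = np.zeros((h, w), dtype=np.float64)
    # Sum the shifted copies of the image that cover the neighborhood.
    for dy in range(size):
        for dx in range(size):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / (size * size)
```

Averaging reduces the variance of independent noise but also smooths genuine image detail, which is one reason more elaborate denoising algorithms exist.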
An image deblurring process attempts to remove blurring artifacts from images, such as blur caused by defocus aberration or motion blur.
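As one hedged illustration of deblurring, the sketch below applies Wiener deconvolution in the frequency domain. It assumes that the blur kernel (point spread function) is known and that the blur is a circular convolution; neither assumption generally holds for real captures. The names `psf_to_otf`, `wiener_deblur`, and `noise_to_signal` are introduced for this example only.

```python
import numpy as np


def psf_to_otf(psf, shape):
    """Zero-pad a small blur kernel to `shape`, center it at the origin, and FFT it."""
    kh, kw = psf.shape
    padded = np.zeros(shape, dtype=np.float64)
    padded[:kh, :kw] = psf
    # Shift the kernel center to index (0, 0) so the blur introduces no offset.
    padded = np.roll(padded, shift=(-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def wiener_deblur(blurred, psf, noise_to_signal=1e-3):
    """Wiener deconvolution of a 2-D image with a known blur kernel.

    noise_to_signal is an assumed constant regularizer; larger values
    suppress noise amplification at the cost of a less sharp result.
    """
    otf = psf_to_otf(psf, blurred.shape)
    wiener = np.conj(otf) / (np.abs(otf) ** 2 + noise_to_signal)
    return np.real(np.fft.ifft2(wiener * np.fft.fft2(blurred)))
```

For instance, a horizontal motion blur could be modeled in this sketch by a kernel such as `np.full((1, 5), 0.2)`, with `noise_to_signal` raised when the blurred image is noisy.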
It is observed that conventional perception networks, which use state-of-the-art ISPs and classifiers trained on a standard JPEG dataset, perform poorly in low light.
There is a need, therefore, to explore improved perception networks that perform well under adverse illumination conditions.