1. Field of the Invention
The present invention relates to noise reduction filter circuitry, and to a method of operation of such filter circuitry. In particular, the present invention relates to filter circuitry for reducing noise in an input stream of image signals forming a video stream, and is suitable for use prior to input to a video encoder used to compress the video stream.
2. Description of the Prior Art
Image noise is a problem dating back to the early days of photography. The problem has since extended into the video domain, and is today a major limiting factor in digital video, both in terms of visual quality and compression performance.
The most obvious cause of image noise is the random arrival of incoming photons. This noise cannot be eliminated, but it can be reduced to acceptable levels by increasing the number of photons per pixel, effectively reducing the variance around the true pixel value. The traditional way of increasing the number of photons is to increase exposure time and aperture size.
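By way of illustration only, the relationship between photon count and noise can be sketched as follows. The sketch assumes that the photon count at a pixel follows a Poisson distribution, an assumption not stated above, and the function name is hypothetical.

```python
import math


def relative_shot_noise(mean_photons: float) -> float:
    """Return standard deviation / mean for a photon count.

    Assumption: the count is Poisson distributed, so its variance equals
    its mean and the relative noise is 1 / sqrt(mean).
    """
    return 1.0 / math.sqrt(mean_photons)


# Quadrupling the number of photons per pixel halves the relative noise.
for photons in (100, 400, 1600):
    print(photons, relative_shot_noise(photons))
```

Under this assumption, a longer exposure or a larger aperture lowers the relative noise only in proportion to the square root of the number of photons collected.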
Recent breakthroughs in handheld devices have enabled consumers to use their cell phones as imaging devices, often replacing the use of digital cameras completely. However, tougher constraints on size and weight, in combination with demand for video recording, limit the practical range of exposure time and aperture size.
In modern handheld devices, image sensors are used to create digital images from optical sources, and are typically constructed as arrays of photodetector elements. However, digital image sensors are colour blind, in that they only count the number of incoming photons at a specific location. In order to acquire colour images, a colour filter array (CFA), such as a Bayer filter, is placed in front of the sensor array, passing wavelength-limited photons to each pixel sensor.
Images captured in such a manner suffer from noise associated with a variety of sources, the most common type of which is additive white Gaussian noise (AWGN). Whilst at the time of image capture by the digital image sensor, the noise in each pixel may be assumed to be independent and normally distributed, such digitally captured images are typically subjected to a number of image signal processing techniques prior to forming an input stream of image signals used in downstream video processing circuits such as a video encoder. Firstly, colour interpolation from neighbouring pixels is used to compute the most likely RGB colour tuple in each pixel position. Such a technique, also commonly known as demosaicing, distorts the noise distribution and presents a significant problem when dealing with noise detection. In particular, the noise in adjacent pixels is no longer independent. After such colour interpolation, the resultant image data is typically subjected to colour correction and gamma correction, these techniques being used to compensate for flaws in the image sensor and to adapt the image to the human visual system. Both colour correction and gamma correction act as digital amplification of noise.
Following such colour and gamma correction, it is typical to convert the RGB signals into YUV colour space composed of one luma channel and two chroma channels. The luma channel Y is a weighted sum of the gamma-corrected RGB colours, whilst the chroma channels U and V (also referred to as Cb and Cr) are defined as the blue and red differences from the luma channel. Many applications, ranging from compression to transmission of images and video, operate in the YUV colour space because of the benefits of separating luma and chroma channels. In particular, the eye is more sensitive to luma changes and this is often exploited by using a lower resolution for the chroma channels, known as chroma sub-sampling.
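The conversion and sub-sampling described above may be sketched as follows. The luma weights are those of ITU-R BT.601, chosen here as an assumption since no particular standard is named above. The function names and the 2x2 averaging used for sub-sampling are likewise illustrative only.

```python
# Assumed BT.601 luma weights; the text above does not name a standard.
KR, KG, KB = 0.299, 0.587, 0.114


def rgb_to_ycbcr(r, g, b):
    """Convert gamma-corrected RGB in [0, 1] to (Y, Cb, Cr).

    Y is a weighted sum of R, G and B. Cb and Cr are the blue and red
    differences from Y, scaled so that each lies in [-0.5, 0.5].
    """
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))
    cr = (r - y) / (2.0 * (1.0 - KR))
    return y, cb, cr


def subsample_2x2(plane):
    """Halve a chroma plane in both directions by averaging 2x2 blocks.

    Assumes the plane is a list of equal-length rows with even dimensions.
    """
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4.0
            for j in range(0, len(plane[0]), 2)
        ]
        for i in range(0, len(plane), 2)
    ]
```

In this sketch a neutral grey input yields zero chroma, and only the chroma planes would be passed to `subsample_2x2`, leaving the luma plane at full resolution.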
From the above discussion, it will be appreciated that by the time an input stream of image signals is produced that is ready to be used by an application such as a video encoding application, the original noise distribution at the time of capture by the image sensor has been distorted and amplified. It is important to try to reduce this noise, not only because of its effect on visual quality, but also because of its adverse effects on the compression performance of such an encoder. In particular, such noise can reduce the quality of the signal output by an encoder at a given bit rate, or increase the bit rate needed to achieve a given quality requirement, thereby reducing the amount of compression achieved.
Image denoising is a popular research area. Video denoising is by its nature an even broader topic, of which image denoising is a subset known as spatial filtering. In contrast, filters that use information from several images (also known as frames) simultaneously are called temporal filters. Combinations of the two types are known as spatiotemporal filters. Temporal filters generally allow for richer denoising possibilities but are more expensive to implement in terms of memory bandwidth, which is an important issue in embedded systems.
Video sequences can be considered to consist of two separate components, namely the film itself formed by a series of images, and the unwanted noise component. The true video sequence is assumed to be correlated in time while noise typically has no inter-frame correlation. This is true for an uncompressed input stream and enables temporal filters to be used for video denoising.
The optimal method for reducing image noise is to increase the exposure time. However, in video acquisition the exposure time is limited by the frame rate. It is possible to compensate for this by using temporal filters to average consecutive frames. Unconstrained averaging, on the other hand, introduces motion blur that may be even more visually disturbing than the original noise.
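The trade-off described above can be illustrated with a minimal sketch. It assumes one-dimensional frames, independent Gaussian noise and an unweighted average over four frames. None of these choices is taken from the text above, and the names are hypothetical.

```python
import random
import statistics


def average_frames(frames):
    """Unweighted temporal average of co-located pixels across frames."""
    count = len(frames)
    return [sum(pixels) / count for pixels in zip(*frames)]


# Static scene: true value 100 everywhere, independent noise in each frame.
rng = random.Random(0)
static_frames = [
    [100.0 + rng.gauss(0.0, 10.0) for _ in range(5000)] for _ in range(4)
]
noise_single = statistics.pstdev(static_frames[0])
noise_averaged = statistics.pstdev(average_frames(static_frames))
# Averaging four frames should roughly halve the noise standard deviation.
print(noise_single, noise_averaged)

# Moving scene: a noise-free step edge that advances one pixel per frame.
moving_frames = [
    [1.0 if i >= 2 + k else 0.0 for i in range(8)] for k in range(4)
]
blurred_edge = average_frames(moving_frames)
# The sharp edge is smeared into a ramp, which is the motion blur.
print(blurred_edge)
```

The same averaging that suppresses noise in the static scene turns the moving edge into a gradual ramp, which is why the averaging must be constrained in areas of motion.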
Norell et al, in the article “Spatio-Temporal Noise Reduction ASIC for Real-Time Video Processing”, Department of Information Technology, ITE, Mid Sweden University, http://www.es.isy.liu.se/norsig2000/publ/page021_id103.pdf, 2000, propose a multi-frame filter with variable depth. The algorithm proposed by Norell et al focuses on removing impulse noise, such as salt and pepper noise, and lost frames by applying a median filter (a type of spatial filter) to pixel values in consecutive frames. The depth of the filter (i.e. the number of frames that are used by the temporal filter) is local within a frame and is decided by comparing luma values in a fixed spatial area with those in past and future frames. Hence, in accordance with such a technique, some initial spatial filtering is performed within each frame in order to produce an average pixel intensity in the fixed spatial area, which is then compared between frames in order to decide on the number of frames used by the temporal filter. The output produced is then the result of the temporal filtering performed using the determined number of frames.
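A variable depth temporal median of the general kind described above may be sketched as follows. This is a loose illustration and not the algorithm of Norell et al. The symmetric window, the threshold test, the default parameter values and all names are assumptions introduced here.

```python
import statistics


def block_mean(block):
    """Average luma of a fixed spatial area, given as a flat list."""
    return sum(block) / len(block)


def choose_depth(blocks, t, max_depth, threshold):
    """Grow the temporal window while neighbouring frames resemble frame t.

    `blocks[k]` holds the same spatial area taken from frame k. The depth
    stops growing at the first past or future frame whose average luma
    differs from that of frame t by more than `threshold` (hypothetical).
    """
    reference = block_mean(blocks[t])
    depth = 0
    for d in range(1, max_depth + 1):
        if t - d < 0 or t + d >= len(blocks):
            break
        past_diff = abs(block_mean(blocks[t - d]) - reference)
        future_diff = abs(block_mean(blocks[t + d]) - reference)
        if past_diff > threshold or future_diff > threshold:
            break
        depth = d
    return depth


def temporal_median(blocks, t, max_depth=2, threshold=8.0):
    """Median of each pixel over the frames selected by choose_depth."""
    depth = choose_depth(blocks, t, max_depth, threshold)
    window = blocks[t - depth:t + depth + 1]
    return [statistics.median(pixels) for pixels in zip(*window)]


# A static area with a single impulse in the middle frame.
static = [[10] * 4, [10] * 4, [10, 30, 10, 10], [10] * 4, [10] * 4]
print(temporal_median(static, 2))

# A scene change before the middle frame: the depth collapses to zero.
changed = [[10] * 4, [10] * 4, [100] * 4, [100] * 4, [100] * 4]
print(temporal_median(changed, 2))
```

In the static case the impulse is removed by the five-frame median, whereas in the changed case the current frame is passed through unfiltered so that no content from the earlier scene leaks into the output.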
Whilst the above article by Norell et al describes a variable temporal depth form of temporal filter, another type of temporal filter is known as a motion adaptive temporal recursive filter. Such a filter is described in the article “A New Video Noise Reduction Algorithm Using Spatial Sub-Bands” by A Amer et al, International Conference on Electronics, Circuits, and Systems, Pages 45 to 48, IEEE, 1996. Temporal recursive filters keep an accumulated frame in memory that is combined with the new frame to form the filter output. The article by Amer et al describes use of a motion adaptive control unit to control the averaging process locally in the spatial domain. In particular, the motion adaptive control unit calculates the weighting factor between the accumulated frame and the current frame.
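A single step of a motion adaptive temporal recursive filter of this general type may be sketched as follows. This is not the filter of Amer et al. The weighting function, its parameters `k_min` and `motion_threshold`, and all names are hypothetical, and frames are represented as flat lists of pixel values.

```python
def motion_weight(diff, k_min=0.25, motion_threshold=32.0):
    """Weight given to the current frame, between k_min and 1.

    Assumption: a small difference from the accumulated frame is treated
    as noise (weight near k_min), and a large difference is treated as
    motion (weight 1, so the accumulated value is discarded).
    """
    return min(1.0, max(k_min, abs(diff) / motion_threshold))


def recursive_filter_step(accumulated, current,
                          k_min=0.25, motion_threshold=32.0):
    """Blend the current frame into the accumulated frame, pixel by pixel.

    The returned list is both the filter output and the accumulated frame
    to be kept in memory for the next step.
    """
    output = []
    for acc, cur in zip(accumulated, current):
        k = motion_weight(cur - acc, k_min, motion_threshold)
        output.append(acc + k * (cur - acc))
    return output


# Small change: mostly retained history. Large change: follow the new frame.
print(recursive_filter_step([100.0, 100.0], [104.0, 200.0]))
```

Only one stored frame is read and written per step in this sketch, whatever the effective length of the averaging.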
As mentioned earlier, temporal filters are relatively expensive to implement in terms of memory bandwidth. Whilst recursive filters of the type described by A Amer et al can offer benefits in this regard when compared with variable temporal depth filters, due to the need to access only a single previous frame (that representing the accumulated image) rather than multiple image frames, such recursive filters have traditionally had a significant disadvantage. Recursive filters require a trade-off between motion blur and noise. In particular, motion blur and noise cannot be simultaneously avoided in high movement areas even with a carefully designed motion adaptive control unit.
The article “Computationally Fast Techniques to Reduce AWGN and Speckle in Videos” by D Sen et al, IET Image Process, 2007, 1, (4), Pages 319 to 334, describes a scheme that uses a change detection technique to measure the interframe motion and carry out estimations in both the spatial and temporal directions of the video. In particular, a spatial estimation is first performed, and then the output from the spatial estimation is used as an input for the temporal estimation, such that the temporal estimation is then carried out using the spatial estimates. The output from the temporal estimation process is hence a spatiotemporal estimate. Thereafter, the spatiotemporal estimate output by the temporal estimation block and the original spatial estimation output by the spatial estimation block are subjected to a combining process in order to produce the final filtered output. Whilst this enables some variation between spatial filtering and temporal filtering, both of the inputs to the combination circuitry have been subjected to the spatial estimation process.
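The data flow described above, in which a spatial estimate feeds a temporal estimate and the two are then combined, may be sketched as follows. This shows only the structure and is not the estimators of Sen et al. The three-tap mean, the two-frame average, the threshold-based change detection and all names are assumptions introduced for illustration.

```python
def spatial_estimate(row):
    """Three-tap mean along a row, replicating the edge pixels."""
    last = len(row) - 1
    return [
        (row[max(i - 1, 0)] + row[i] + row[min(i + 1, last)]) / 3.0
        for i in range(len(row))
    ]


def temporal_estimate(spatial_now, spatial_prev):
    """Average of the spatial estimates of two consecutive frames."""
    return [(a + b) / 2.0 for a, b in zip(spatial_now, spatial_prev)]


def combine(spatiotemporal, spatial_now, spatial_prev, change_threshold=10.0):
    """Select the spatial estimate where a change is detected.

    Assumption: a change is flagged where the two spatial estimates
    differ by more than `change_threshold` (hypothetical).
    """
    return [
        s_now if abs(s_now - s_prev) > change_threshold else st
        for st, s_now, s_prev in zip(spatiotemporal, spatial_now, spatial_prev)
    ]


def filter_row(row_now, row_prev):
    """Spatial estimate, then temporal estimate, then combination."""
    s_now = spatial_estimate(row_now)
    s_prev = spatial_estimate(row_prev)
    st = temporal_estimate(s_now, s_prev)
    return combine(st, s_now, s_prev)


# A static row with fine detail: the detail is smoothed in either branch.
print(filter_row([0.0, 90.0, 0.0], [0.0, 90.0, 0.0]))
```

In this sketch the static detail at the centre pixel is flattened even though no motion is present, because both inputs to `combine` have already passed through `spatial_estimate`.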
It would be desirable to provide an improved technique for reducing noise in an input stream of image signals forming a video stream.