1. Field of the Invention
The present invention relates generally to video and image processing. In particular, the present invention relates to the reduction of additive Gaussian noise in video sequences by adaptive weighted averaging of pixels over time, using motion detection, motion compensation, and estimation of local noise characteristics to determine the weights.
2. Description of the Related Technology
In the past decade applications of digital video have increased dramatically. These applications range from cinematographic archiving and medical imaging to video storage and playback on DVDs. In addition, digital video forms the basis for more efficient transmission of television via cable, over the air, and over the Internet.
The last application is especially important. Compression algorithms based on digital video achieve higher compression ratios than are achievable through analog techniques, thereby reducing the bandwidth required for transmission of video. Where formerly a cable channel's bandwidth supported the transmission of a single analog video channel, with digital compression cable operators can operate at various points on the resolution/bandwidth trade-off curve, allowing 12 video channels of average quality, or 7-8 channels of superior quality, to be transmitted in a bandwidth that formerly carried one analog channel of video. Video compression has also made HDTV possible: without it, the bandwidth required for transmission could not be supported within the present bandwidth allocations. Digital video is also fundamental to the transmission of video using the Internet's packetized techniques. It allows the use of buffers to eliminate variations in a packet's time of arrival, and the application of even more powerful compression algorithms that further reduce the video signal's usage of the channel's capacity (which in the Internet is shared by other users).
The pervasive use of digital video has spawned increased interest in and demand for noise filtering algorithms. Noise reduction can be critical to overall system operation, since the presence of noise in video not only degrades its visual quality but affects subsequent signal processing tasks as well. Noise is especially deleterious to digital video that will be compressed and decompressed. The effect is inherent in compression algorithms. These algorithms are designed to recreate a sequence of images that will be perceived by the eye as virtually identical to the images created from the uncompressed data. Since they do not reject noise, the algorithms treat it as signal, and attempt to create data that represents the components of the noise that will be most visible to the eye. Worse yet, in most instances the output of the video compression unit is limited in data rate to match it to the rated capacity of the channel through which the data is transmitted. When noise captures some of the bits output by the video compressor, fewer bits are left to represent the real signal. Therefore noise reduction, the elimination as far as possible of the noise contaminating the video, is a desirable adjunct to video compression.
Noise is a catch-all term for an unwanted signal that interferes with the desired signal. It is noticeably present in television receivers situated in areas having marginal signal conditions for receiving a conventional amplitude modulated vestigial sideband television signal. This noise is commonly modelled as additive, white and Gaussian. In the case of analog video delivered by satellite, the video signal is frequency modulated onto a carrier. The signal out of the ground receiver is accompanied by noise that is additive and Gaussian when the receiver is operating above threshold (i.e., the vector representing the noise in signal space is usually much smaller than the vector representing the modulated signal). When the system is close to threshold, the character of the noise becomes impulsive, leading, for example, to the clicks that are heard on an automobile radio as the FM station being received goes out of range. For video transmitted by satellite, the impulses appear in the picture as short white or dark streaks. A satellite or terrestrial television receiver may also be affected by man-made noise, such as impulsive noise originating from motor vehicles.
Applying noise reduction to video is the process of identifying the desired video signal and using that information to discriminate against the noise. Best performance is achieved by utilizing one of a broad range of processing options that is available only through the use of digital techniques. The input video is sampled into numerical pixel values indexed by horizontal and vertical spatial coordinates and a time coordinate that indicates frame number. A filtering operation is modelled as a sequence of arithmetic operations performed on the input samples to form an output pixel.
The present approaches to noise reduction filtering can be categorized into three types: spatial noise reduction filters, temporal noise reduction filters and 3D noise reduction filters. A spatial noise reduction filter filters the input image in the spatial domain only, ignoring information in the temporal direction. Temporal noise reduction filtering operates only on pixels in the temporal direction, i.e., pixels having different positions on the time axis, and can further be divided into motion adaptive methods and motion compensated methods. The motion adaptive methods process the pixels at the same location in width and height from frame to frame, basing the filter parameters on the degree to which relative motion between the frames is detected at the pixel location. The motion compensated methods filter pixels along a motion trajectory that is based on evidence taken from motion estimation results. Existing three-dimensional noise reduction filters combine temporal filters with spatial filters to obtain the benefits of each.
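The spatial and motion adaptive temporal categories above can be sketched in Python; the function names and the simple threshold rule below are illustrative assumptions for exposition, not the disclosed method:

```python
def spatial_mean_3(row, x):
    """Spatial filtering: average a pixel with its horizontal neighbours
    in the same frame (one line of a 3-tap box filter)."""
    return (row[x - 1] + row[x] + row[x + 1]) / 3.0

def motion_adaptive_temporal(curr, prev, threshold):
    """Motion adaptive temporal filtering: average the co-sited pixels of
    two successive frames only when their difference is small enough to
    suggest no motion at that location (illustrative threshold rule)."""
    if abs(curr - prev) < threshold:
        return 0.5 * (curr + prev)
    return curr  # likely motion: leave the current pixel unfiltered

print(spatial_mean_3([0.2, 0.5, 0.8], 1))                    # approx. 0.5
print(motion_adaptive_temporal(0.51, 0.49, threshold=0.05))  # approx. 0.5
print(motion_adaptive_temporal(0.90, 0.30, threshold=0.05))  # 0.9, unfiltered
```

A 3D filter in this taxonomy would apply both kinds of averaging, e.g., a spatial pass on the output of the temporal pass.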
Noise reduction inherently implies averaging together elements of the signal that are almost identical. Suppose a given pixel has a noise-free value of 0.5, meaning its brightness is half-way between peak white and black. The pixel is contaminated by noise n1, so the pixel value that is actually available is P1=0.5+n1. With additional knowledge, a second pixel may be found in another position with value P2=0.5+n2, where n1 and n2 are both noise values and are uncorrelated. The weighted average 0.5 P1+0.5 P2 is found to be equal to 0.5+½(n1+n2). The power in ½(n1+n2) is one-half the power in n1 or n2. Thus, averaging together the values of the two pixels improves the signal/noise ratio of the estimated pixel value by a factor of 2. However, if P2=0.3+n2, meaning that the brightness of the second pixel is closer to black, then 0.5 P1+0.5 P2=0.4+½(n1+n2). The net effect of weighting P1 and P2 equally before averaging in the second case is to introduce an error into the estimate of the brightness the weighted average is supposed to represent. This example illustrates the basic principle of this invention: to reduce the noise level associated with a particular pixel, weight average its value with a second pixel value whose noise-free brightness is close to that of the pixel in question. When the confidence level in the equality of the noise-free brightness levels is high, the weights assigned to the two pixel values should be approximately equal; if the confidence level is low, the second pixel value is effectively disregarded by making its weight close to zero, with the first pixel value weighted by (1−weight used for the second pixel).
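The factor-of-2 improvement can be verified numerically. The following Python sketch (noise level, sample count and seed are arbitrary choices for the simulation) compares the noise power of a single noisy pixel against the residual noise power after equal-weight averaging of two pixels carrying uncorrelated Gaussian noise:

```python
import random

random.seed(0)
TRUE_VALUE = 0.5   # noise-free pixel brightness
SIGMA = 0.05       # standard deviation of the additive Gaussian noise
N = 100_000        # number of simulated pixel pairs

single_power = 0.0
avg_power = 0.0
for _ in range(N):
    n1 = random.gauss(0.0, SIGMA)
    n2 = random.gauss(0.0, SIGMA)   # uncorrelated with n1
    p1 = TRUE_VALUE + n1
    p2 = TRUE_VALUE + n2
    est = 0.5 * p1 + 0.5 * p2       # equal-weight average of the two pixels
    single_power += n1 ** 2
    avg_power += (est - TRUE_VALUE) ** 2

single_power /= N
avg_power /= N
print(single_power / avg_power)     # close to 2.0: noise power is halved
```

Running the same simulation with P2 = 0.3 + n2 instead would show the bias error described above dominating the residual, which is why equal weighting is only appropriate when the noise-free brightnesses agree.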
One advantage of temporal noise reduction filtering is that it is more probable that a second pixel can be found in the previous frame that has a noise-free brightness level similar to that of a given pixel in the current frame, because often only small changes occur in video from frame to frame. Another advantage is that the pixel trajectories along which processing takes place are one-dimensional, extending from a pixel in one frame to another pixel in the next. (The trajectories become discontinuous when there is a scene change.) Thus the processing for temporal noise reduction only looks backwards in time for pixel values to use for weight averaging with the pixel in the current frame. To utilize the simple structure of temporal filtering, a method is needed that uses measurements taken from the input data itself, either unprocessed or filtered, to adaptively sense which pixels should be averaged together and what weights should be placed on each in averaging.
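One plausible way to derive such weights from the data is sketched below; the Gaussian confidence rule and its parameters are assumptions made for illustration only, not the method claimed by the invention:

```python
import math

def temporal_filter_pixel(curr, prev_filtered, noise_sigma):
    """Weight-average the current pixel with the previously filtered pixel
    at the same location. The weight on the past shrinks as the frame
    difference grows relative to the noise level (hypothetical rule)."""
    diff = abs(curr - prev_filtered)
    # High confidence (difference comparable to the noise) -> weight near 0.5,
    # the equal-weight case; large difference (likely motion or a scene
    # change) -> weight near 0, effectively disregarding the past pixel.
    w = 0.5 * math.exp(-(diff ** 2) / (2.0 * (2.0 * noise_sigma) ** 2))
    return (1.0 - w) * curr + w * prev_filtered

# Static pixel: small difference, the two values are averaged almost equally.
print(temporal_filter_pixel(0.52, 0.50, noise_sigma=0.02))
# Moving edge: large difference, output stays close to the current pixel.
print(temporal_filter_pixel(0.90, 0.30, noise_sigma=0.02))
```

Applied recursively frame after frame, this structure needs only one stored (filtered) frame, matching the backwards-looking, one-dimensional trajectory described above.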