The NTSC, PAL, and SECAM analog television standards and the ITU-R BT.601 and ITU-R BT.709 digital video television standards are in widespread use throughout the world today. All of these standards make use of interlacing video signals in order to maximize the vertical refresh rate, which reduces wide area flicker, while minimizing the bandwidth required for transmission. With an interlaced video format, half of the lines that make up a picture are displayed during one vertical period (i.e. the even video field), while the other half are displayed during the next vertical period (i.e. the odd video field) and are positioned halfway between the lines displayed during the first period. While this technique has the benefits described above, the use of interlacing can also lead to the appearance of visual artifacts such as line flicker for stationary objects and line crawling for moving objects.
The visual artifacts can be minimized and the appearance of an interlaced image can be improved by converting the interlaced video signal to a non-interlaced (progressive) format and displaying it as such. In fact, many newer display technologies, such as Liquid Crystal Displays (LCD), Plasma Display Panels (PDP), and Digital Micro-mirror Devices (DMD), are designed to display progressively scanned, i.e., non-interlaced, video images.
A conventional progressive video signal display system, e.g., a television (TV) or a projector, is illustrated in FIG. 1. As shown, the display system 20 includes a signal receiving unit 22 that is coupled to a tuner box 24, and a video decoder 28. Signals, such as television signals, are captured by the signal receiving unit 22 and transmitted to the tuner box 24. The tuner box 24 includes a converter 25 and a demodulation unit 26 that transforms the incoming signal into an analog signal. The analog signal 27 is received by the video decoder 28, which outputs an interlaced video signal 29. As stated above, if the interlaced video signal 29 is displayed directly, undesirable visual artifacts, such as line flicker and line crawling, can appear. Accordingly, a de-interlacer 30 is used to convert, i.e., de-interlace, the interlaced video signal 29 to generate a progressive video signal 32. The progressive video signal 32 is then displayed via a display 34, such as an LCD, a PDP, or a DMD.
Numerous methods have been proposed for de-interlacing an interlaced video signal to generate a progressive video signal. For instance, some methods perform a simple spatial-temporal de-interlacing technique, such as line repetition or field insertion. These methods, however, do not necessarily take into consideration motion between video fields. For instance, it is well known that while line repetition may be acceptable for image regions having motion, line repetition is not suitable for stationary (still) image regions due to loss of vertical spatial resolution. By the same token, field insertion is a satisfactory de-interlacing method for stationary image regions, but inadequate for moving image regions due to objectionable motion artifacts. Therefore, utilizing either method alone presents a tradeoff between vertical spatial resolution and motion artifacts.
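By way of illustration only, the two simple techniques named above can be sketched as follows. The sketch assumes a field is represented as a list of rows of pixel values and that parity 0 denotes the field holding frame lines 0, 2, 4, and so on; the function names `line_repetition` and `field_insertion` are hypothetical and are not taken from any particular system.

```python
def line_repetition(field):
    """Build a full frame from a single field by emitting each field line twice.

    Uses one field only, so there are no motion artifacts, but the vertical
    resolution of the result is halved.
    """
    frame = []
    for row in field:
        frame.append(list(row))
        frame.append(list(row))  # repeated line fills the missing position
    return frame


def field_insertion(current, opposite, current_parity):
    """Build a full frame by interleaving two fields of opposite parity.

    current_parity is 0 when `current` holds frame lines 0, 2, 4, ...
    and 1 when it holds frame lines 1, 3, 5, ... (an assumed convention).
    Full vertical resolution is kept, but moving objects show combing
    because the two fields were sampled at different times.
    """
    frame = []
    for cur_row, opp_row in zip(current, opposite):
        if current_parity == 0:
            pair = (cur_row, opp_row)
        else:
            pair = (opp_row, cur_row)
        frame.extend(list(r) for r in pair)
    return frame
```

Line repetition needs only the current field, whereas field insertion needs a buffered neighboring field, which is one practical reason the two methods differ in cost as well as in image quality.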
To address this issue, some de-interlacing methods are motion adaptive, i.e., they take into consideration the motion from video field to video field and/or from pixel to pixel in adjacent video fields. Motion adaptive de-interlacing methods can dynamically switch or mix between different de-interlacing methods, such as between line repetition and field insertion. Per-field motion adaptive de-interlacing methods select a de-interlacing technique on a field-by-field basis. Thus, per-field motion adaptive de-interlacing methods do not maintain the overall quality throughout an image when the image contains both stationary and moving regions. In contrast, per-pixel motion adaptive de-interlacing methods select a de-interlacing technique on a pixel-by-pixel basis, and thus provide a much better overall quality throughout an image.
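A minimal sketch of per-pixel mixing is given below for illustration. It is an assumption-laden example, not a description of any specific method: the motion measure (the absolute difference between co-located pixels in the previous and next fields), the linear blend, and the `threshold` parameter are all hypothetical choices.

```python
def motion_adaptive_pixel(above, prev_pixel, next_pixel, threshold=32):
    """Interpolate one missing pixel by mixing two simple estimates.

    above      : pixel directly above the missing one in the current field
                 (the value line repetition would use)
    prev_pixel : co-located pixel in the previous field
                 (the value field insertion would use)
    next_pixel : co-located pixel in the next field
    threshold  : hypothetical motion level at which the spatial estimate
                 is used exclusively
    """
    # Assumed motion measure: change at this position across the two
    # neighboring fields of the same parity.
    motion = abs(next_pixel - prev_pixel)
    # Mixing factor: 0.0 for a still pixel, 1.0 for strong motion.
    alpha = min(max(motion / threshold, 0.0), 1.0)
    return alpha * above + (1.0 - alpha) * prev_pixel
```

With this sketch a still pixel takes the field insertion value, a strongly moving pixel takes the line repetition value, and intermediate motion yields a blend of the two.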
Still other de-interlacing methods are based on identifying the type of the source material from which the interlaced video signal was generated. For example, motion picture film or computer graphics signals are inherently progressive, i.e., non-interlaced. When the signals are transmitted for broadcasting, the signals are converted into interlaced video signals according to analog TV standards such as NTSC, PAL, and SECAM, or digital video standards such as ITU-R BT.601 and ITU-R BT.709 interlaced formats. Well known techniques such as 3:2 pull-down or 2:2 pull-down are used to break the original progressive frames into interlaced video fields while maintaining the correct frame rate. De-interlacing signals that originate from such non-interlaced (progressive) sources can be achieved with high quality if the original progressive frame sequences can be identified and reconstructed correctly. Thus, by recognizing that a video sequence originates from a progressive source, the original progressive frames can be reconstructed exactly by merging the appropriate video fields.
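For illustration, 3:2 pull-down and its inversion can be sketched as follows. The sketch assumes a frame is a list of rows with an even number of lines, that the cadence begins with a frame contributing three fields, and, for the inverse, that the cadence phase is already known; in practice the phase must be detected from the video itself. The names `pulldown_32` and `inverse_pulldown_32` are hypothetical.

```python
def pulldown_32(frames):
    """Convert progressive frames to fields using a 3, 2, 3, 2, ... cadence.

    Returns a list of (parity, rows) tuples, where parity 0 means the field
    holds frame lines 0, 2, 4, ... and parity 1 means lines 1, 3, 5, ...
    Four input frames yield ten fields, e.g. 24 frames/s -> 60 fields/s.
    """
    fields = []
    parity = 0
    for i, frame in enumerate(frames):
        repeat = 3 if i % 2 == 0 else 2
        for _ in range(repeat):
            fields.append((parity, frame[parity::2]))
            parity ^= 1  # field parity always alternates
    return fields


def inverse_pulldown_32(fields):
    """Rebuild the progressive frames, assuming the cadence phase is known."""
    frames = []
    pos = 0
    i = 0
    while pos + 1 < len(fields):
        repeat = 3 if i % 2 == 0 else 2
        group = fields[pos:pos + repeat]
        # Every group holds at least one field of each parity from one frame.
        top = next(rows for p, rows in group if p == 0)
        bottom = next(rows for p, rows in group if p == 1)
        frames.append([row for pair in zip(top, bottom) for row in pair])
        pos += repeat
        i += 1
    return frames
```

Because each group of fields comes from a single progressive frame, merging one field of each parity from the group recovers that frame exactly, which is the property the preceding paragraph relies on.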
Typically, the source of the interlaced video signal can be determined by examining the motion between successive fields of an input video sequence. In a co-pending patent application entitled “METHOD AND SYSTEM FOR DETECTING MOTION BETWEEN VIDEO FIELD OF SAME AND OPPOSITE PARITY FROM AN INTERLACED VIDEO SOURCE,” (Ser. No. 11/001,826), filed on Dec. 2, 2004, and herein incorporated in its entirety by reference, a same and opposite-field motion detection system is described. The motion detection system measures the signal values of one set of vertically adjacent pixels from a video field of one parity and two other sets of vertically adjacent pixels from the two neighboring video fields of the opposite parity such that when taken together, these pixels represent relevant samples of an image near the vertical and temporal positions. The motion detection system then calculates three sets of motion values, where one set is between the subject video field and its previous video field, a second set is between the subject video field and its subsequent video field, and the third set is between the previous video field and the subsequent video field.
FIG. 2 is a schematic representation of eight pixels used by the motion detection system to calculate motion values according to one embodiment. Three consecutive fields (10a, 10b, 10c) are shown. A subject field 10b is preceded by a preceding field 10a and followed by a subsequent field 10c. Three pixels 14 in each of the preceding 10a and subsequent 10c fields and two pixels 14 in the subject field 10b are used to calculate same and opposite-field motion measurements pertaining to a target pixel 16, the value of which is to be interpolated. All eight pixels 14 are vertically adjacent in their respective fields and they form a butterfly profile in the temporal-vertical plane as shown in FIG. 2. The three sets of motion values calculated from the pixels 14 can be used to determine an appropriate de-interlacing technique to be used to calculate the interpolated value of the target pixel 16.
While the motion detection system described in the aforementioned co-pending patent application performs well for its intended purpose, those skilled in the art readily appreciate that the motion values derived from the pixels 14 can be distorted by noise in the video signal itself. In the NTSC, PAL, or SECAM analog video system, noise can be created or inadvertently added to the video signal through the capture, duplication, editing, transmission/reception, modulation/demodulation, and encoding/decoding processes. Moreover, a digital video signal can also contain noise, either from noise present in the original analog content or introduced as a result of digital compression/decompression processes.
In general, noise in the video signal distorts the visual appearance of the image and is particularly objectionable to the human eye when the image contains large areas of solid colors, and especially when the luminance levels are low (e.g., in shades of saturated colors). Thus, reducing or eliminating noise from the video signal is desirable in high quality display components, such as televisions, computer monitors, DVD players, digital cameras, and the like.
Typically, noise reduction of a video signal is based on the difference in the statistical properties between correlated pixel values conveying an image and random pixel values due to noise. Noise reduction is typically implemented through some form of linear or nonlinear operation on the input pixel data. The operation typically involves linear filtering (e.g., weighted averaging) for additive white Gaussian noise (AWGN) or order-statistical filtering (e.g., maximum, minimum, or median) for impulsive noise. The correlation between pixel data values of a video signal is typically based on the temporal or spatial proximity of the pixels. The pixels inside a temporal or spatial neighborhood for performing the linear or nonlinear operation are collectively referred to as a noise filter support. These pixels are usually selected based on criteria, such as “K-nearest neighbors”, i.e., the K neighboring pixels whose values are nearest to the target pixel value, and/or “sigma nearest neighbors”, i.e., those neighboring pixels whose difference from the target pixel is less than a predetermined parameter. The selection process is generally based on the difference between the target pixel and its temporal or spatial neighbors.
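The two selection criteria can be illustrated with the following minimal sketch, which assumes scalar pixel values and a plain unweighted average as the linear operation. The function names and the inclusion of the target pixel in the average are assumptions made for the example.

```python
def k_nearest_support(target, neighbors, k):
    """Select the k neighbors whose values are closest to the target value."""
    return sorted(neighbors, key=lambda v: abs(v - target))[:k]


def sigma_support(target, neighbors, sigma):
    """Select the neighbors whose difference from the target is below sigma."""
    return [v for v in neighbors if abs(v - target) < sigma]


def mean_filter(target, support):
    """Average the target with its selected support (a simple linear filter)."""
    values = [target] + list(support)
    return sum(values) / len(values)
```

For a target value of 100 with neighbors 98, 103, 200, 101, and 40, both criteria (with K = 3 or a parameter of 5) exclude the outliers 200 and 40, so the average is taken only over values likely to belong to the same image region.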
Conventional video signal noise reduction systems can perform noise reduction operations in either the spatial or temporal domains to reduce visual artifacts due to noise. Spatial noise reduction systems perform noise reduction within a frame or field and can be rather effective in reducing AWGN. Temporal noise reduction systems perform noise reduction between frames/fields and can be more effective in reducing low-frequency noise (LFN) such as clamp noise in an analog video signal. Thus, ideally, a noise reduction system should perform both spatial and temporal noise reduction operations to reduce both AWGN and LFN.
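The distinction between the two domains can be sketched as follows. This is an illustrative example only: the 1-2-1 horizontal kernel, the recursive temporal average, and the `weight` parameter are assumed choices rather than features of any particular conventional system.

```python
def spatial_filter(row):
    """Smooth one line of a field with a 1-2-1 weighted average.

    Operates within a single field; edge pixels reuse their own value
    for the missing neighbor.
    """
    n = len(row)
    out = []
    for x in range(n):
        left = row[max(x - 1, 0)]
        right = row[min(x + 1, n - 1)]
        out.append((left + 2 * row[x] + right) / 4.0)
    return out


def temporal_filter(prev_filtered, current, weight=0.5):
    """Blend each pixel with the co-located pixel of the previous output.

    prev_filtered must be kept from the preceding frame/field, which is
    why temporal filtering needs a frame/field buffer.
    """
    return [weight * p + (1.0 - weight) * c
            for p, c in zip(prev_filtered, current)]
```

The spatial filter needs only pixels from the current line, while the temporal filter needs stored output from an earlier frame or field.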
Nonetheless, a noise reduction system that performs both spatial and temporal noise reduction operations requires pixel and/or line buffers as well as frame/field buffers. Frame/field buffers are particularly costly and also increase the complexity of the system. Moreover, reading and writing video field pixel data from and to the frame/field buffers consume system resources, such as memory bandwidth and power consumption. Thus, while desirable, such a system is less economically feasible.