1. Field of the Invention
This invention is generally related to display devices such as television monitors or liquid crystal displays, and more particularly to the field of video scan rate conversion from interlaced to progressive format.
2. Description of the Related Art
Recent advances in digital televisions and flat-panel displays have greatly impacted the use of progressively scanned video formats. Video data are conventionally produced in the 2:1 interlaced scan rate for transmission and broadcast applications, whereas the computer community uses the 1:1 progressive scan rate. The advantage of interlaced (I) video signals is that only half the samples in a frame are updated at a time, and therefore half the bandwidth of the source is sufficient for data processing. On the other hand, progressive (P) scans display less flicker, are visually more pleasing, and are less stressful on the human eye. Finally, progressive video materials are more amenable to the video compression systems adopted in today's digital television (DTV) broadcasts.
Currently, interlaced and progressive formats co-exist in the DTV industry. A domestic video display can be based on progressive scan technology. Therefore, video receivers often include format converters to change the nature of the signal from interlaced to progressive. A display can be an advanced flat-panel device capable of showing progressive video content up to the High Definition Television (HDTV) Signal or an intermediate display showing a progressive Standard Definition Television (SDTV) Signal. Video Graphics Arrays (VGAs) and Personal Computers (PCs) are also used for progressive display of high resolution video, e.g., HDTV and SDTV, or low resolution video, e.g., one-quarter or one-eighth of an SDTV signal.
Typically, a multimedia device (also called a set-top box or “STB”) is used to decode the audiovisual information and display the output. A set-top box typically includes a scan-rate converter (also called a “de-interlacer”) to convert data received in an interlaced format to a progressive format. In the absence of de-interlacing techniques, various degrees of interlacing artifacts are perceived when presenting interlaced material on a progressive monitor. For example, a PC monitor will demonstrate aggregated object contours when fed with interlaced broadcast material over the Internet, i.e., web-casting. In comparison, a high-end monitor will use an internal circuit in an attempt at de-interlacing. As discussed further in the paragraph below, neither conventional PC monitors nor television monitors using an internal circuit provide a suitably clear image.
Traditional approaches to de-interlacing have been based on spatial filtering, temporal filtering, vertical-temporal filtering, and median filtering. Traditional approaches can also include some form of edge enhancement. However, the major challenge in developing a robust de-interlacer is that interlaced sources are non-stationary in nature, and therefore the choice of interpolation method for producing progressive outputs depends on the video content. For example, static areas will benefit from temporal filtering while moving areas will look better with some form of spatial filtering. If the interlaced video is truly static, then the temporal displacement from frame to frame is almost zero, and the two fields of the same frame are perceived as if they had been sampled at the same time. In this scenario, the samples of one field can be repeated in time (a process called field insertion) to fill the new lines of the progressive output frame. Temporally repeating samples of one field to fill new lines of the progressive output frame is a process defined as temporal filtering.
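The field insertion described above can be illustrated with a minimal sketch. It assumes each field is a list of lines, each line a list of pixel values, and that the top field supplies the even output lines; the function name `weave` and this data layout are illustrative assumptions and are not taken from any particular de-interlacer.

```python
def weave(top_field, bottom_field):
    """Field insertion: interleave two fields into one progressive frame.

    Assumption (illustrative only): the top field carries the even output
    lines and the bottom field carries the odd output lines.
    """
    if len(top_field) != len(bottom_field):
        raise ValueError("fields must have the same number of lines")
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(list(top_line))     # even output line, copied from the top field
        frame.append(list(bottom_line))  # odd output line, copied from the bottom field
    return frame


# Two 2-line fields produce one 4-line progressive frame.
top = [[10, 10], [30, 30]]
bottom = [[20, 20], [40, 40]]
frame = weave(top, bottom)
```

For truly static content the result is a full-resolution frame; if the two fields were sampled at different times and the scene moved, the same operation produces comb-like artifacts along moving edges.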
However, if the interlaced video contains rapid motion, then the field-to-field temporal correlation within a single frame is weak and (intra-field) spatial filtering is more suitable. An actual video sequence rarely, if ever, includes only static data or only rapid motion. Therefore, due to the nature of actual video sequences, opportunities for improvement are available over both temporal and spatial filtering.
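Intra-field spatial filtering can be sketched in its simplest form as vertical line averaging, in which each missing line is the average of the field lines above and below it. The function name `line_average`, the integer averaging, and the repetition of the last line at the bottom border are assumptions made for this illustration.

```python
def line_average(field):
    """Intra-field spatial interpolation by vertical averaging.

    Each source line is copied, and each missing line is the integer
    average of its two vertical neighbours in the same field. The last
    missing line has no lower neighbour, so the last source line is
    repeated (an assumption made for this sketch).
    """
    frame = []
    for index, line in enumerate(field):
        frame.append(list(line))
        if index + 1 < len(field):
            below = field[index + 1]
            frame.append([(a + b) // 2 for a, b in zip(line, below)])
        else:
            frame.append(list(line))
    return frame


field = [[10, 20], [30, 40]]
frame = line_average(field)
```

Because only one field is used, the output is free of motion artifacts, at the cost of reduced vertical resolution.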
Real-world video sequences typically comprise many dissimilar objects displaced at different velocities. An obvious solution for de-interlacing image samples with different grades of motion is to construct a filter that dynamically switches its behavior between spatial filtering and temporal filtering. A simple way to accomplish this is to apply a multi-tap median filter to the closest spatial and temporal neighboring samples of the interpolated samples. The usefulness of the median filter is motivated by the fact that if motion samples are fed to the filter, then the output will be one of the spatially positioned source pixels. This is because the self-similarity of the spatial pixels is greater than the similarity between spatial and temporal pixels. On the other hand, if the input samples are static, then all temporal and spatial pixels are self-similar and the output should be a good representative of either sampling group. However, if the region of interest happens to be static with many horizontal edges, then it is likely that the median filter will destroy the edge pixels and the output frame will lack image detail. It is widely known that median filtering works well on fast motion areas but removes the vertical details of the interlaced source material.
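The behavior of the median filter in the three situations described above can be illustrated with a three-tap sketch. The taps assumed here are the pixel above and the pixel below in the current field and the co-located pixel in the previous field; the tap choice and the function names are assumptions for illustration only.

```python
def median3(a, b, c):
    """Return the median of three values."""
    return sorted((a, b, c))[1]


def median_interpolate(above, below, temporal):
    """Interpolate one missing line with a three-tap median filter.

    above, below: neighbouring lines from the current field (spatial taps)
    temporal:     co-located line from the previous field (temporal tap)
    """
    return [median3(a, b, t) for a, b, t in zip(above, below, temporal)]


# Motion: the temporal tap disagrees, so a spatial sample is selected.
moving = median_interpolate([100], [104], [20])
# Static, smooth area: all taps are similar, and the temporal tap is selected.
static = median_interpolate([100], [104], [102])
# Static, one-line-thin horizontal detail present only in the other field:
# the spatial taps outvote it and the detail is lost.
thin_line = median_interpolate([0], [0], [255])
```

The last case shows why vertical detail is removed: a sample that is legitimately different from both vertical neighbours can never be the median.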
Another classical filtering scheme for de-interlacing is the vertical-temporal filtering technique. In this approach, a mask is adopted that typically extends over two or three fields in the temporal direction and has several taps in the vertical direction. This mask is designed such that the filtering coefficients of the dominant field (the field to be interpolated) are of a low-pass filtering type and the filtering coefficients of the adjacent field(s) are of a high-pass filtering type. This design concept ensures that the dominant image structures of the interlaced source field are present in the output progressive frame, and further, that any vertical image detail that may be present in the neighboring fields is preserved in the output. Vertical-temporal filtering has certain disadvantages. For example, objects with non-vertical edges produce jerkiness under sudden accelerations.
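A vertical-temporal mask of the kind described above might be sketched as follows for a single output sample. The coefficient values are assumptions chosen only so that the current-field taps form a low-pass filter summing to one and the previous-field taps form a high-pass filter summing to zero; they are not taken from any specific prior-art design, and the function name is hypothetical.

```python
def vt_filter_sample(cur_up, cur_down, prev_up2, prev_center, prev_down2):
    """Interpolate one missing sample with a two-field vertical-temporal mask.

    cur_up, cur_down: current-field samples one line above and below
    prev_up2, prev_center, prev_down2: previous-field samples two lines
        above, at, and two lines below the missing position

    Illustrative coefficients (assumed, not from a specific design):
      current field:  [1/2, 1/2]         low-pass, sums to 1
      previous field: [-1/4, 1/2, -1/4]  high-pass, sums to 0
    """
    low = 0.5 * cur_up + 0.5 * cur_down
    high = -0.25 * prev_up2 + 0.5 * prev_center - 0.25 * prev_down2
    value = low + high
    return max(0, min(255, int(round(value))))  # clamp to the 8-bit range


flat = vt_filter_sample(100, 100, 100, 100, 100)
detail = vt_filter_sample(100, 100, 100, 140, 100)
```

In a flat area the high-pass branch contributes nothing and the output equals the vertical average; where the previous field carries vertical detail, part of that detail is added back to the interpolated sample.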
One way to improve the above de-interlacing schemes is to apply the filter coefficients in the direction where inter-pixel correlation is strongest. This way, any image artifacts caused by the interpolation process have the least disruptive effect on the quality of the output video. This extension, used as an edge enhancement strategy, is described in prior art dealing with the problem of de-interlacing.
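Interpolating along the direction of strongest correlation can be sketched with a simple three-direction search between the field lines above and below the missing pixel. This is one common form of edge-directed interpolation and is offered only as an illustration; the choice of three candidate directions, the tie-break in favor of the vertical direction, and the function name `edge_directed_sample` are assumptions.

```python
def edge_directed_sample(above, below, x):
    """Interpolate the missing pixel at column x between two field lines.

    Three candidate directions are tested (vertical and the two
    diagonals). The pair of pixels with the smallest absolute difference
    is taken as the direction of strongest correlation and is averaged.
    Ties favour the vertical direction because it is tested first.
    """
    best_diff = None
    best_value = None
    for d in (0, -1, 1):
        xa, xb = x + d, x - d  # mirrored positions in the two lines
        if 0 <= xa < len(above) and 0 <= xb < len(below):
            diff = abs(above[xa] - below[xb])
            if best_diff is None or diff < best_diff:
                best_diff = diff
                best_value = (above[xa] + below[xb]) // 2
    return best_value


# A diagonal edge: plain vertical averaging would give (10 + 200) // 2 = 105,
# while the directional search follows the edge.
above = [10, 10, 10, 200]
below = [10, 200, 200, 200]
value = edge_directed_sample(above, below, 2)
```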
More advanced schemes have used motion estimation (ME) methods in combination with classical filtering techniques to offer more efficient solutions. These ideas are motivated by the fact that the amount of motion in the interlaced video sequence needs to be identified before proper filtering techniques are applied. Motion estimation can be performed on an interlaced video sequence to determine the best prediction and the amount of motion for a pre-defined area of a moving frame. For ease of hardware realization, motion is estimated for blocks of image samples. ME techniques have been adopted in many different ways in de-interlacing schemes to predict the missing samples of the output progressive material. One approach is to fill the missing lines with the best predicted samples from the past or future. A de-interlacer which fills in missing lines with predicted samples is dependent on the efficiency of the ME technique. Artifacts in the output pictures result when the ME technique provides other than the correct prediction. In an attempt to reduce or eliminate these artifacts, a protective device in the form of a median filter is applied to a region of interest composed of predicted and source samples. A median filter improves the ME-based de-interlacer significantly at times but will destroy the vertical image details. Another way to take advantage of the ME technique is to measure the amount of motion and incorporate a switch in the de-interlacer that toggles between a temporal interpolator and a spatial interpolator.
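Block-based motion estimation of the kind referred to above can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) between a block in the current picture and candidate blocks in a reference picture. The SAD criterion, the full-search strategy, and all function names here are assumptions made for illustration; practical designs often use faster search patterns.

```python
def sad(ref, cur, ry, rx, cy, cx, size):
    """Sum of absolute differences between two size-by-size blocks."""
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def best_motion_vector(ref, cur, cy, cx, size, search):
    """Full search for the displacement (dy, dx) into ref that best
    predicts the block of cur whose top-left corner is (cy, cx)."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(ref, cur, cy, cx, cy, cx, size)  # zero-motion candidate
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = cy + dy, cx + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # candidate block falls outside the picture
            cost = sad(ref, cur, ry, rx, cy, cx, size)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost


def make_picture(block_y, block_x):
    """Hypothetical 6x6 test picture with one bright 2x2 block."""
    picture = [[0] * 6 for _ in range(6)]
    for j in range(2):
        for i in range(2):
            picture[block_y + j][block_x + i] = 200
    return picture


reference = make_picture(1, 1)
current = make_picture(2, 3)
vector, cost = best_motion_vector(reference, current, 2, 3, 2, 2)
```

The residual cost gives a rough indication of how trustworthy the prediction is, which is the kind of measure a de-interlacer could use when deciding between a temporal and a spatial interpolator.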
An ideal ME solution requires storage of many frames of data. The added memory increases the cost of the overall hardware device, i.e., STB-chip or a stand-alone single-chip de-interlacer, and the delay associated with representing the output frames. Further, a large number of block-based ME tasks have to be performed for a block of image samples and a fixed block size is often not optimal for video objects of different sizes. Therefore, the existing approaches provide less-than-optimum results. The large memory and associated delay also make the design and development of an efficient de-interlacing architecture impractical, especially when large HD frames are processed.
What is needed is a means to de-interlace video data from an interlaced format to a progressive format. A means of de-interlacing which is economical and efficient would be of additional benefit. A means of de-interlacing which can be incorporated into a single integrated circuit chip for integration into a set-top box or into a flat-panel (plasma) display would be of further benefit.