Progressive display devices display all lines of an image on every refresh. In contrast, interlaced display devices, such as NTSC and PAL television displays, typically display images using even and odd line interlacing. To display interlaced video on a progressive display, video rendering systems have to generate pixel data for scan lines that are not received in time for the next frame update. This process is called de-interlacing. When interlaced signals are received for display on a progressive computer display, picture quality problems can arise, especially when motion is occurring in the picture and inferior methods of de-interlacing are used.
The problem exists particularly for personal computers having multimedia capabilities, since interlaced video information received from conventional video tapes, cable television broadcasters (CATV), digital video disks (DVDs) and direct broadcast satellite (DBS) systems must be de-interlaced for suitable display on a progressive (non-interlaced) display device.
A current video compression standard, known as MPEG-2, specifies the compression format and decoding format for interlaced and non-interlaced video picture information. MPEG-2 video streams have picture data divided into blocks of data. These blocks of data are referred to as macroblocks in the MPEG-2 standard. Generally, a macroblock of data is a collection of Y, Cr, Cb (color space) blocks which have common motion parameters. Therefore, a macroblock of data contains a section of the luminance component and the spatially corresponding chrominance components. A macroblock of data can refer either to source or decoded data, or to the corresponding coded data elements. Typically, a macroblock of data consists of a block of 16 pixels by 16 pixels of Y data and blocks of 8 by 8, or 16 by 16, pixels of Cr and Cb data in one field or frame of picture data.
Generally, in MPEG-2 systems, two fields of a frame may be coded separately to form two field pictures. Alternatively, the two fields can be coded together as a frame. This is known generally as a frame picture. Both frame pictures and field pictures may be used in a single video sequence. A picture consists of a luminance matrix Y, and two chrominance matrices (Cb and Cr).
MPEG-2 video streams also include data known as motion vector data that is solely used by a decoder to efficiently decompress the encoded macroblock of data. A motion vector, referred to herein as a decoding motion vector, is a two-dimensional vector used for motion compensation that provides an offset from a coordinate position in a current picture to the coordinates in a reference picture. The decoder uses the decoding motion vector data stream to reference pixel data from frames already decoded so that more compact difference data can be sent instead of absolute data for those referenced pixels or macroblocks. In other words, the motion vector data is used to decompress the picture data in the video stream. Also, zero decoding motion vectors may indicate that there was no change in pixel data from a previously decoded picture.
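The role of a decoding motion vector can be illustrated with a minimal sketch. The function name reconstruct_block, its parameters, and the use of whole-pixel offsets are assumptions made for illustration only; an actual MPEG-2 decoder also handles half-pixel vectors, field and frame prediction modes, and picture boundaries, none of which are shown here.

```python
def reconstruct_block(reference, residual, top, left, motion_vector):
    """Rebuild one block as predicted pixels plus decoded difference data.

    reference     -- previously decoded picture, as a list of rows (assumed layout)
    residual      -- decoded difference data for the block, as a list of rows
    top, left     -- position of the block in the current picture
    motion_vector -- (dy, dx) whole-pixel offset into the reference picture
    """
    dy, dx = motion_vector
    block = []
    for r, residual_row in enumerate(residual):
        row = []
        for c, diff in enumerate(residual_row):
            # The vector offsets the current position into the reference picture.
            predicted = reference[top + dy + r][left + dx + c]
            # Add the difference data and clamp to the 8-bit sample range.
            row.append(max(0, min(255, predicted + diff)))
        block.append(row)
    return block
```

With a zero vector and all-zero difference data, the sketch simply copies the co-located pixels of the reference picture, which corresponds to the no-change case described above.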
In MPEG-2 video streams, decoding motion vectors are typically assigned to a high percentage of macroblocks. Macroblocks can be in either field pictures or frame pictures. A macroblock in a field picture is field predicted. A macroblock in a frame picture can be either field predicted or frame predicted.
A macroblock of data defined in the MPEG-2 standard includes, among other things, macroblock mode data, decoding motion vector data and coded block pattern data. Macroblock mode data are bits that are analyzed for de-interlacing purposes. For example, macroblock mode data can include bits indicating whether the data is intracoded. Coded block pattern data are bits indicating which blocks are coded.
Intracoded macroblocks are blocks of data that are not temporally predicted from a previously reconstructed picture. Non-intracoded macroblocks have one or more decoding motion vectors and are temporally predicted from a previously reconstructed picture.
Several basic ways of de-interlacing interlaced video information include a “weave” method and a “bob” method. With the “weave”, or merge, method, successive even and odd fields are merged. Each frame to be displayed is constructed by interleaving the scan lines of a pair of fields. This “weave” method is generally most effective with areas of a picture that do not have motion over successive frames because it provides more pixel data detail for non-moving objects. However, when motion does occur, artifacts appear in the form of double images of a moving object. An artifact called “Comb Tearing” or “Feathering” appears around the periphery of a horizontally moving object, causing poor image quality. Images with vertical motion also have artifacts.
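The interleaving performed by the “weave” method can be sketched as follows. The function name weave and the representation of a field as a list of scan lines are illustrative assumptions; the sketch further assumes that the even field carries frame lines 0, 2, 4 and so on, and that both fields contain the same number of lines.

```python
def weave(even_field, odd_field):
    """Merge an even field and an odd field into one progressive frame.

    Each field is a list of scan lines; each scan line is a list of pixel values.
    """
    frame = []
    for even_line, odd_line in zip(even_field, odd_field):
        frame.append(list(even_line))  # frame lines 0, 2, 4, ...
        frame.append(list(odd_line))   # frame lines 1, 3, 5, ...
    return frame
```

Because the two fields are captured at different times, any object that moves between them lands at different horizontal positions on alternating lines of the woven frame, which is the source of the combing artifact described above.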
In contrast to the “weave” method, the “bob” method displays single fields as frames. The missing scan lines are interpolated from the available lines in the field, making the frame rate the same as the original field rate. The most often used methods are line repetition, line averaging and edge-adaptive spatial interpolation. Like “weave”, this de-interlacing method is not typically used with any form of motion detection, so non-moving images can appear blurry from loss of image detail. This can result from inaccurate interpolation of pixel data. The “bob” technique also introduces flicker that is noticeable in video sequences with no motion. This occurs because even when the scene is static, two different frames are created: one based on the even field and one based on the odd field. These frames are generally different. Where they are different, flicker occurs as the display alternates between the two frames.
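A minimal sketch of the “bob” method using line averaging is given below. The function name bob, the is_even_field flag, and the fallback to line repetition at the top or bottom edge of the frame are assumptions chosen for illustration; edge-adaptive spatial interpolation is not shown.

```python
def bob(field, is_even_field):
    """Expand a single field into a full frame by vertical interpolation.

    field         -- list of scan lines, each a list of integer pixel values
    is_even_field -- True if the field carries frame lines 0, 2, 4, ...
    """
    height = 2 * len(field)
    offset = 0 if is_even_field else 1
    # Place the received lines at their positions in the frame.
    known = {2 * i + offset: list(line) for i, line in enumerate(field)}
    frame = []
    for y in range(height):
        if y in known:
            frame.append(known[y])
            continue
        above = known.get(y - 1)
        below = known.get(y + 1)
        if above is not None and below is not None:
            # Line averaging between the two neighboring field lines.
            frame.append([(a + b) // 2 for a, b in zip(above, below)])
        else:
            # Only one neighbor exists at the frame edge: line repetition.
            frame.append(list(above if above is not None else below))
    return frame
```

Applying the sketch to the even field and then to the odd field of a static scene generally produces two slightly different frames, which is consistent with the flicker described above.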
There are a number of techniques categorized as motion adaptive de-interlacing. These use different de-interlacing strategies in picture areas with and without motion. Generally, “bob” is used in picture areas with motion and “weave” is used in picture areas without motion. Additional discussion on video processing techniques can be found in a book entitled “Digital Video Processing,” written by A. Murat Tekalp and published by Prentice Hall. Often, separate de-interlacers and/or separate motion detection hardware are used to carry out the above methods. However, separate de-interlacers and motion detection hardware can add cost to a graphics processor.
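One possible form of motion adaptive de-interlacing is sketched below. The motion test used here, a per-pixel threshold on the difference between two successive even fields, is an assumption made for illustration and is not drawn from any particular technique discussed above. The function name, parameter names, and default threshold are likewise hypothetical.

```python
def motion_adaptive_deinterlace(curr_even, prev_even, odd_field, threshold=10):
    """Build a frame from the current even field, choosing bob or weave per pixel.

    curr_even -- current even field (frame lines 0, 2, 4, ...)
    prev_even -- previous even field, used only to detect motion (assumed test)
    odd_field -- most recent odd field (frame lines 1, 3, 5, ...)
    """
    frame = []
    count = len(curr_even)
    for i in range(count):
        j = i + 1 if i + 1 < count else i  # repeat the last line at the bottom edge
        above, below = curr_even[i], curr_even[j]
        prev_above, prev_below = prev_even[i], prev_even[j]
        missing = []
        for x in range(len(above)):
            moving = (abs(above[x] - prev_above[x]) > threshold or
                      abs(below[x] - prev_below[x]) > threshold)
            if moving:
                # Area with motion: "bob" (average the lines above and below).
                missing.append((above[x] + below[x]) // 2)
            else:
                # Area without motion: "weave" (take the pixel from the odd field).
                missing.append(odd_field[i][x])
        frame.append(list(above))
        frame.append(missing)
    return frame
```

The sketch decides per pixel; a hardware or software implementation might instead decide per block or per region to reduce the cost of the motion test.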
Graphics processors are known to include 2D/3D engines that fetch data from a frame buffer and blend pixels together to render an image and place the blended data back in the frame buffer. The frame buffer is memory accessible by the graphics processor. Such graphics processors are also known to include display engines which obtain rendered images from the frame buffer and may subsequently perform simple deinterlacing operations (such as “bob” and “weave”) but do not typically rewrite the deinterlaced information back to the frame buffer. As known in the art, the specifics of operations supported by 2D/3D engines vary. Also, it is not uncommon among 2D/3D engines for the same operation to use a different number of passes on different chips. Lighting and multi-texture effects are examples of features where different implementations partition the signal processing steps differently to achieve a tradeoff between die area, complexity, memory bandwidth, and performance. The feature sets of 2D/3D engines evolve rapidly to make them more and more efficient at the tasks for which they are most frequently programmed.
The amount of signal processing (and thus the sophistication) of a deinterlacing algorithm that is implemented into a display engine will most likely lag behind an “off line” deinterlacing algorithm.
Display engine based deinterlacing solutions have less time in which to perform the needed signal processing. The deinterlaced pixels typically have to be produced at a time coincident with a display device's timing. If the deinterlaced image is displayed in a window, the deinterlaced image has to be produced during the time the portion of the display containing the window is refreshed. This means that the data fetches and signal processing operations have to occur in a shorter period of time than they would otherwise have to. For example, if the display engine's timing requirements could be ignored, a deinterlaced NTSC image could be produced every time a new field was received, or in 1/60th of a second. However, with the timing requirements of a 100 Hz refresh rate CRT, the image has to be produced in 1/100th of a second. If the video window on the display were half the height of the display, then the time available is 1/200th of a second.
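The arithmetic in this example can be reproduced with a short calculation. The function name deinterlace_budget_seconds is hypothetical, and the sketch assumes that blanking intervals are ignored and that the available time scales linearly with the fraction of the display height occupied by the video window.

```python
def deinterlace_budget_seconds(rate_hz, window_height_fraction=1.0):
    """Time available to produce one deinterlaced image.

    rate_hz                -- field rate or display refresh rate, in Hz
    window_height_fraction -- fraction of the display height covered by the window
    """
    return window_height_fraction / rate_hz


# The three cases from the example above, expressed in milliseconds.
budgets_ms = {
    "NTSC field rate, display timing ignored": 1000 * deinterlace_budget_seconds(60),
    "100 Hz CRT, full-height window": 1000 * deinterlace_budget_seconds(100),
    "100 Hz CRT, half-height window": 1000 * deinterlace_budget_seconds(100, 0.5),
}
```

Under these assumptions the three budgets are roughly 16.7 ms, 10 ms and 5 ms, so the half-height window on a 100 Hz display leaves less than a third of the time that would be available if the image only had to keep pace with the incoming fields.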
Display engine based deinterlacing solutions must be replicated if the display engine is required to asynchronously drive more than one display device at a time. Because there is no synchronicity (e.g., one display may be running at 85 Hz while the other is running at 100 Hz), the deinterlacing signal processing cannot be shared between the displays.
Also, display engine clocks typically run very fast (350 MHz today) compared to the clocks used in processing digital NTSC and PAL data (28-35 MHz). It is technically challenging to add complex signal processing hardware in a high speed digital environment.
The order in which display pixels are obtained is typically controlled by the display device, which influences the way in which pixels are fetched. For a display engine to perform advanced deinterlacing that requires the inspection of more source pixels, additional data needs to be fetched above and beyond the data needed for a simple “bob” or “weave” deinterlacing and display. As memory bandwidth is precious to a high performance graphics chip, additional on-chip memories are often used to reduce the amount of refetching required, thereby increasing the cost of the device instead.
For all these reasons, the amount of signal processing (and thus the sophistication) of a deinterlacing algorithm that is implemented into a display engine will most likely lag behind an “off line” deinterlacing algorithm. Accordingly, there is a need for a graphics processor that performs non-display engine based deinterlacing.
As noted above, additional hardware that performs deinterlacing off-line, and thus at a more leisurely pace, can have a reduced signal processing requirement. However, such hardware requires expensive supporting hardware just to give it access to the graphics chip's main memory. This supporting hardware includes an additional arbitration channel in the memory controller. It requires additional buffers to buffer data while waiting for an opportunity to write, and to quickly burst the data when that opportunity comes. It requires additional buffers to receive high speed bursts of data for reads, and to store up data for use until the next read can occur. It requires logic to generate addresses and logic to cross clock boundaries. It may even require logic to maintain cache coherency with other hardware accessing the same memory. Therefore, while there certainly are solutions to the signal processing problem using dedicated off-line hardware, there are also significant overhead costs to adding such dedicated hardware. These solutions will take time to develop and add to graphics chips. There is a need for a solution that can be incorporated in existing as well as future graphics processing chips.