The invention relates generally to deinterlacing of video data, and in particular, to an improved deinterlacing technique and apparatus.
Video is generally represented as sequences of frames in accordance with either the interlaced or the progressively-scanned format (non-interlaced). Each frame includes a matrix of pixels that vary in color and intensity according to the image displayed.
In the interlaced scan format, a frame, which is a raster array of image bytes representing an image, includes a pair of fields. A field is a raster array of image bytes representing every other row of a frame, and the two fields of a pair are derived from two different instants. The primary field of the pair, for example, is the input field associated with the time instant for which the output frame is to be constructed; it includes pixels that are located only on alternate rows (either odd or even rows) of the frame matrix, called horizontal lines. The secondary field includes pixels that are located on the remaining horizontal lines of the frame matrix, that is, the pixels missing from the primary field. The pixels in the secondary field therefore represent the portions of the image not represented in the primary field. The primary and secondary fields of a frame are scanned consecutively, for example, on a video display monitor at a rate of 60 fields/sec for purposes of reconstructing the entire image on the display at the industry interlaced scan standard display rate of 30 frames/sec.
In the progressively scanned format, an image is represented in its entirety using only a single field that includes pixels in all horizontal lines of the frame matrix. Therefore, such frames can be progressively scanned on a display at the standardized progressive display rate of 60 frames/sec.
Conventional television systems receive frames of video signals in an interlaced format. For example, the National Television System Committee (NTSC) standard is used to send and receive frames of television signals at a rate of 30 frames/second. Each frame contains 525 video scan lines, which are divided into two interlaced fields. The interlaced fields are transmitted at a rate of 60 fields/second, which corresponds to 30 frames/second. The receiver scans the two interlaced fields of each frame, one by one, to display the interlaced video signals as television pictures.
Several video applications also use interlaced scanning during image origination or capture as well as during signal transmission from the encoder, which codes the video signal, to the receiver. For example, digital video compression methods, such as the ISO (International Organization for Standardization) MPEG (Moving Picture Experts Group) video compression standard, may be used to reduce the data rate to a level suitable for transmission over an available digital channel.
However, the display for these digitally compressed video image sequences at the decoders may not use interlaced scanning, or it may be desirable to use non-interlaced displays. For example, in large screen television displays, multimedia displays, or computer displays that support many text-oriented or graphics-oriented applications, a non-interlaced format is often preferred over an interlaced format for a variety of reasons, such as the reduced motion artifacts and flicker associated with the progressively scanned format.
Thus, there is a need to convert an interlaced format to a non-interlaced format. The process of converting an interlaced format to a non-interlaced format is generally referred to as deinterlacing (or line-doubling). Deinterlacing can be used to convert interlaced still pictures to non-interlaced still pictures, or to provide display of interlaced video on a progressive display (non-interlaced) computer monitor.
Also, many deinterlacing systems have separate hardware for performing the deinterlacing operation. This can result in unnecessary additional costs when implemented, for example, with three-dimensional graphics processing engines or other circuitry that perform horizontal and vertical scaling.
Two existing low cost deinterlacing techniques are commonly referred to as "bob" and "weave". For example, where a source format and final deinterlaced image are 480 lines high, split into two 240 line even and odd fields for transmission, storage, or other processing common in the video industry, these fields are sampled 1/60th of a second apart for NTSC style video.
Weave simply combines an even and odd field together to make one frame. Assuming 240 lines in each field, interleaving (or "weaving") the even field lines into the even numbered lines of the result frame, and interleaving the odd field lines into the odd numbered lines of the result frame, produces a deinterlaced frame of 480 lines.
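The interleaving described above can be sketched in a few lines of Python. This is only a minimal illustration, not an implementation taken from any particular system; the function name `weave` and the representation of a field as a list of rows (each row a list of pixel values) are assumptions made for the example.

```python
def weave(even_field, odd_field):
    """Interleave two fields into one frame (illustrative sketch).

    Assumption: even_field holds frame rows 0, 2, 4, ... and
    odd_field holds frame rows 1, 3, 5, ...
    """
    if len(even_field) != len(odd_field):
        raise ValueError("fields must have the same number of lines")
    frame = []
    for even_row, odd_row in zip(even_field, odd_field):
        frame.append(list(even_row))  # goes to an even-numbered frame line
        frame.append(list(odd_row))   # goes to the odd-numbered line below it
    return frame


# Tiny 2-line fields standing in for the 240-line fields of the text.
even = [[10, 10], [30, 30]]
odd = [[20, 20], [40, 40]]
frame = weave(even, odd)
```

With 240-line fields the same function returns a 480-line frame, and no pixel value is altered, which is why weave preserves full vertical resolution on still material.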
Weave produces a perfect visual result for still images, and maintains full resolution for vertical source image detail. However, since even and odd fields were sampled sequentially with a time difference between them (1/60th second for NTSC video), weave produces motion artifacts on moving picture elements, or on camera zooms and pans, i.e. any relative motion between the camera and any source image element.
When fields containing motion are simply woven together, there is an unnatural "comb" or "weave" artifact on edges. These effects make weave unsuitable for quality TV viewing, although it is inexpensive to implement.
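The comb artifact can be reproduced with a toy example. The sketch below assumes a vertical edge that moves horizontally between the instant the even field is sampled and the instant the odd field is sampled; the helper names `edge_row` and `weave_moving_edge` and the 0/255 pixel values are hypothetical choices for illustration.

```python
def edge_row(width, edge_x):
    """One scan line: bright (255) left of edge_x, dark (0) from edge_x on."""
    return [255 if x < edge_x else 0 for x in range(width)]


def weave_moving_edge(height, width, edge_at_t0, edge_at_t1):
    """Weave an even field sampled at t0 with an odd field sampled at t1.

    Even frame lines see the edge where it was at t0, odd frame lines
    see it where it was at t1.
    """
    frame = []
    for y in range(height):
        edge_x = edge_at_t0 if y % 2 == 0 else edge_at_t1
        frame.append(edge_row(width, edge_x))
    return frame


still = weave_moving_edge(4, 8, 3, 3)   # edge did not move between fields
moving = weave_moving_edge(4, 8, 3, 5)  # edge moved 2 pixels between fields

for row in moving:
    print("".join("#" if p else "." for p in row))
# ###.....
# #####...
# ###.....
# #####...
```

In the still case every line agrees and the woven frame is exact; in the moving case alternate lines disagree about where the edge is, which is the tooth-like pattern the text calls a comb.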
Bob does not combine even and odd fields together; each is displayed independently. Assuming 240 lines in each field, bob displays each field by itself for its correct 1/60th second interval, then displays the next field for its correct 1/60th second interval. Because each even or odd field contains only 240 even or odd lines and a full resolution picture contains 480 lines, each field must be upscaled vertically by a factor of two during display. A ½ line offset is also required to position the even and odd lines correctly. This produces a deinterlaced frame of 480 lines.
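A minimal sketch of bob follows, under stated assumptions: the function name `bob`, the `parity` argument, and the choice of simple two-line averaging for the vertical upscale are illustrative and are not taken from the source. Placing the field lines on their own parity in the output frame is one way to account for the ½ line offset between even and odd fields.

```python
def bob(field, parity):
    """Upscale a single field to full frame height (illustrative sketch).

    parity 0: field lines are frame rows 0, 2, 4, ...
    parity 1: field lines are frame rows 1, 3, 5, ...
    Missing rows are the average of the field lines above and below;
    at the frame edge, where only one neighbour exists, it is copied.
    """
    height = 2 * len(field)
    frame = [None] * height
    # Put the real field lines at their correct vertical positions.
    for i, row in enumerate(field):
        frame[2 * i + parity] = list(row)
    # Fill the rows of the opposite parity by interpolation.
    for y in range(1 - parity, height, 2):
        above = frame[y - 1] if y - 1 >= 0 else None
        below = frame[y + 1] if y + 1 < height else None
        if above is None:
            frame[y] = list(below)
        elif below is None:
            frame[y] = list(above)
        else:
            frame[y] = [(a + b) / 2 for a, b in zip(above, below)]
    return frame


even = [[10, 10], [30, 30]]
odd = [[20, 20], [40, 40]]
frame_from_even = bob(even, 0)
frame_from_odd = bob(odd, 1)
```

Each call uses only one field, so the result is temporally consistent, but half of the output lines are estimates rather than sampled data.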
Bob produces an image which is temporally correct, since each field is displayed only at its correct time. However, it suffers from vertical spatial aliasing. This is because the original resolution of the source image frame was 480 lines vertically. To produce a 240 line field, the original image must be vertically downsampled, discarding every second line. This is not too bad for natural imagery (streams and fields), because such imagery does not contain a lot of vertical detail; in fact, NTSC television works this way. For modern images, such as sports or newscasts with overlaid text or graphics, this downsampling produces two unacceptable artifacts.
The first is that in each field, half the information has been dropped through downsampling, causing spatial aliasing. Bob fills in the missing lines through interpolation of the lines above and below, but this is only a guess; the real information is not there. The result is that hard horizontal edges, such as the bottom of a square graphic image, or the crossbars on the letters "T" or "e", will appear and disappear from displayed field to displayed field. They also appear slightly shifted field to field due to the ½ line offset mentioned above. This on/off behaviour combined with the ½ line shift produces a jumpiness in the image, and this gives rise to the name "bob". This effect is closely related to the vertical downsampling and aliasing mentioned above. It can be alleviated somewhat by better downsampling techniques at the source (typically a television studio), but only by introducing extra fuzziness in the picture or greatly lowering its overall vertical resolution, which is not an acceptable tradeoff.
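The appear/disappear behaviour of a thin horizontal edge can be illustrated on a single column of pixels. The sketch below is a simplified model under assumptions of our own: one intensity value per scan line, a one-line-high bright bar, and neighbour averaging as the interpolation; the names `split_fields` and `bob_column` are hypothetical.

```python
def split_fields(column):
    """Split a column of line intensities into (even_lines, odd_lines)."""
    return column[0::2], column[1::2]


def bob_column(field, parity):
    """Rebuild a full-height column from one field by averaging neighbours."""
    height = 2 * len(field)
    out = [None] * height
    out[parity::2] = field  # real samples keep their line positions
    for y in range(1 - parity, height, 2):
        # Use whichever of the lines above/below exist inside the frame.
        neighbours = [out[k] for k in (y - 1, y + 1) if 0 <= k < height]
        out[y] = sum(neighbours) / len(neighbours)
    return out


# A one-line-high bright bar on frame line 2 (an even line).
source = [0, 0, 255, 0, 0, 0]
even, odd = split_fields(source)

shown_first = bob_column(even, 0)   # bar present, smeared onto lines 1 and 3
shown_second = bob_column(odd, 1)   # bar entirely absent
```

The bar exists only in the even field, so it is visible (and blurred) while the even field is displayed and missing while the odd field is displayed. Alternating between the two at the field rate gives the flicker described above.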
The second effect of the missing lines is that at any one display time only half the vertical resolution is shown, and the rest is interpolated with a lowpass filter. This makes the image fuzzier than the original, again causing an unacceptable quality loss.
Also, there are many deinterlacing techniques based on motion estimation and compensation which work very well, but they are too expensive to implement in low cost equipment, such as cost-reduced televisions, set-top boxes, or personal computer displays.
Other deinterlacing techniques, such as those described in the co-pending application entitled "Improved Deinterlacing Technique," as referenced above, provide an improved deinterlacing technique by utilizing, for example, three localized input pixel values to produce an output pixel value that minimizes spatial artifacts. For example, such methods and devices typically avoid the use of fixed numerical thresholds or spatio-temporal interpolation techniques. One advantage can be a computationally efficient deinterlacing technique. Such a technique stays in weave mode whenever possible. This gives the best possible vertical resolution with no fuzziness or other vertical aliasing artifacts. When a particular set of conditions is detected in a set of three vertical pixels (from a set of even, odd, even lines, or odd, even, odd lines), it automatically and gradually transitions from weave to bob as the conditions making weave untenable increase. However, such techniques may cause a "bob" effect too readily. Accordingly, it would be desirable to have a method and apparatus that is computationally efficient but provides additional certainty as to whether vertical spatial high frequencies are present due to temporal motion or due to actual picture detail. In addition, it would be desirable to reduce the amount of hardware or software necessary to implement such a deinterlacer.