1. Field of the Invention
The present invention relates to digital video processing, and, more particularly, to enhanced digital video processing using a dynamically varying number of video fields such as single and dual video field processing.
2. Description of the Related Art
Most video sources such as National Television System Committee (NTSC) or Phase Alternating Line (PAL) cameras supply interlaced video including two fields per video frame. Such interlaced video cameras capture one half of the vertical resolution in a first field, and the other half of the vertical resolution in a second field. The second field is temporally displaced from (e.g., later in time than) the first field. Thus, the two fields together provide full vertical frame resolution, but the fields are temporally separated by one-half of the frame period.
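The relationship between a frame and its two fields can be illustrated with a minimal sketch. The function name `split_fields`, the representation of a frame as a list of scan lines, and the choice of which field holds the even-numbered lines are assumptions made for illustration only.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into two interlaced fields.

    Assumption: the first field holds lines 0, 2, 4, ... and the second
    field holds lines 1, 3, 5, ...; actual field order depends on the source.
    """
    first_field = frame[0::2]
    second_field = frame[1::2]
    return first_field, second_field


# A toy 6-line frame with 4 pixels per line; each pixel encodes its position.
frame = [[row * 10 + col for col in range(4)] for row in range(6)]
first, second = split_fields(frame)
```

Each field in this sketch carries half of the frame's lines, which corresponds to the half vertical resolution per field described above.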
The video produced by video sources compliant with the NTSC or PAL standards also has a particular size. For example, the CCIR601 standard defines NTSC video having a size of 720 pixels by 480 lines, and PAL video having a size of 720 pixels by 576 lines.
Although most consumer video is interlaced, computer systems typically use noninterlaced (progressive scan) displays. Moreover, several commonly used communications and/or compression standards require that the video signal be transmitted in a progressively scanned format as opposed to an interlaced format, and at a common intermediate format (CIF) size as opposed to the NTSC size or the PAL size. For example, the International Telecommunication Union (ITU) video compression standard ITU-T H.261 for video teleconferencing systems using ISDN lines, and the standard ITU-T H.263 for multimedia communications systems using conventional phone lines, both provide for progressive scan video transmission at a common intermediate format of 352 pixels by 288 lines. Other resolutions are supported by other standards.
Consequently, some form of size conversion and some form of interlaced-to-noninterlaced conversion are required. Size conversion is called scaling. Interlaced-to-noninterlaced conversion is called deinterlacing or progressive scan conversion. Common forms of deinterlacing include single field deinterlacing, such as scan line duplication and scan line interpolation, and dual field deinterlacing, such as field merging.
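The three deinterlacing forms named above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, fields are represented as lists of scan lines of integer luminance values, and repeating the last line at the bottom edge during interpolation is an assumption.

```python
def line_duplicate(field):
    """Single field: repeat every scan line to restore full height."""
    out = []
    for line in field:
        out.append(list(line))
        out.append(list(line))
    return out


def line_interpolate(field):
    """Single field: fill each missing line with the average of its neighbors."""
    out = []
    for i, line in enumerate(field):
        out.append(list(line))
        # Assumption: at the bottom edge, reuse the last available line.
        below = field[i + 1] if i + 1 < len(field) else line
        out.append([(a + b) // 2 for a, b in zip(line, below)])
    return out


def field_merge(first_field, second_field):
    """Dual field: interleave the lines of two fields into one frame."""
    out = []
    for a, b in zip(first_field, second_field):
        out.append(list(a))
        out.append(list(b))
    return out


field = [[0, 0], [10, 20]]
duplicated = line_duplicate(field)
interpolated = line_interpolate(field)
merged = field_merge([[1], [3]], [[2], [4]])
```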
Single field conversion is preferred for motion images, whereas dual field conversion is preferred for static images. Generally, scenes with high levels of motion content display fewer motion artifacts, and a higher quality motion image, when processed from one field, and scenes with low levels of motion content display higher vertical resolution when encoded from two fields.
Typically, a codec's encoder scales either one or both fields of a received video frame into a size appropriate for the video standard or proprietary format being implemented. The scaled frame is typically subdivided into arrays of pixels. Some video compression standards such as the aforementioned H.261 and H.263 define certain kinds of arrays as macroblocks. Such macroblocks are encoded from a progressive scan format and transmitted for decode on a receiving codec.
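As a rough illustration of scaling followed by subdivision into pixel arrays, the sketch below uses nearest-neighbor sampling, which is a stand-in chosen for brevity and not a method described here. The function names are hypothetical, and the 16 by 16 block size reflects the luminance macroblock size of H.261 and H.263.

```python
def scale_nearest(image, out_width, out_height):
    """Scale an image (a list of rows) by nearest-neighbor sampling."""
    in_height = len(image)
    in_width = len(image[0])
    return [
        [image[y * in_height // out_height][x * in_width // out_width]
         for x in range(out_width)]
        for y in range(out_height)
    ]


def macroblock_grid(width, height, size=16):
    """Return how many size-by-size blocks fit across and down the image."""
    return width // size, height // size


# Toy 4x4 image scaled down to 2x2.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
small = scale_nearest(image, 2, 2)

# A 352 by 288 picture divides evenly into 16 by 16 blocks.
blocks_across, blocks_down = macroblock_grid(352, 288)
```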
Many existing codecs are designed to scale exclusively from either one field or two fields. Each implementation has inherent advantages and disadvantages. For example, codecs that scale both fields of an input video frame produce images which include a higher degree of vertical resolution and manifest a high quality static image. However, scaling from both fields inherently causes field-related motion artifacts. Conversely, codecs that scale from a single field produce images which do not include field-related motion artifacts and therefore manifest a higher quality motion image. Unlike dual fields, a single field is temporally coherent and therefore does not have the field-related motion artifacts. However, images scaled from a single field have inherently lower static image quality than images scaled from both fields. Thus, a single-field-processing codec favors higher quality motion images, and a dual-field-processing codec favors higher quality low-motion images.
One proposal uses field merging for still areas of the picture and scan line interpolation for areas of movement. Such a solution is disclosed in Keith Jack, "Video Demystified," (2nd ed. 1996) (hereinafter, Jack), which is incorporated herein by reference. As disclosed in Jack, motion is detected on a pixel-by-pixel basis over the entire picture in real time. Motion is detected by comparing the luminance value of a pixel with the value two fields earlier. Since two fields are combined, and either or both may contain areas of motion, Jack discloses detecting motion between two odd fields and two even fields. Four field stores are therefore required.
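The per-pixel comparison described above can be sketched as an absolute luminance difference between a field and the same-parity field two fields earlier. The function name `motion_map` and the list-of-lines field representation are assumptions for illustration; this is not code taken from Jack.

```python
def motion_map(current_field, field_two_earlier):
    """Return per-pixel absolute luminance differences between two fields.

    Both arguments are same-parity fields (both odd or both even) given as
    lists of scan lines. A value of 0 means no detected change.
    """
    return [
        [abs(cur - prev) for cur, prev in zip(cur_line, prev_line)]
        for cur_line, prev_line in zip(current_field, field_two_earlier)
    ]


earlier = [[16, 16, 16], [16, 16, 16]]
current = [[16, 16, 235], [16, 20, 16]]
differences = motion_map(current, earlier)
```

Running this comparison for both the odd fields and the even fields requires keeping earlier fields of each parity available, which is consistent with the four field stores noted above.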
The pixel differences may have any value, from 0 (no movement and noise-free) to maximum (for example, a change from full intensity to black). A choice must be made whether to use a pixel from the previous field (which may be in the wrong location due to motion) or to interpolate a new pixel from adjacent scan lines in the current field. Jack teaches the use of crossfading (also called soft switching) between methods. At some magnitude of pixel difference, the loss of resolution due to a double image is equal to the loss of resolution due to interpolation. That amount of motion should result in the crossfader being at the 50/50 point. Less motion will result in a fade towards field merging and more motion in a fade towards the interpolated values. Such crossfading is performed on a pixel-by-pixel basis and is very computationally intensive. Consequently, crossfading is costly to implement, both in computation time and in computing resources, which are then unavailable to other applications.
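A minimal sketch of such a crossfade for one pixel follows. The linear ramp, the function name `crossfade_pixel`, and the default `midpoint` value of 16 are assumptions chosen only so that a difference equal to `midpoint` lands on the 50/50 point; the source does not specify the shape of the fade or the threshold.

```python
def crossfade_pixel(merged, interpolated, diff, midpoint=16):
    """Blend a field-merged pixel with an interpolated pixel.

    diff is the detected pixel difference (0 = no motion). The weight rises
    linearly (an assumed ramp) from 0 at diff == 0 to 1 at diff >= 2 * midpoint,
    so diff == midpoint gives an equal blend of the two methods.
    """
    weight = min(max(diff / (2.0 * midpoint), 0.0), 1.0)
    return (1.0 - weight) * merged + weight * interpolated


still = crossfade_pixel(100, 200, diff=0)      # fully field-merged
balanced = crossfade_pixel(100, 200, diff=16)  # 50/50 point
moving = crossfade_pixel(100, 200, diff=255)   # fully interpolated
```

Even in this simplified form, the blend requires a difference, a weight computation, and two multiplications for every output pixel, which illustrates why a per-pixel crossfade is computationally demanding.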