In an interlaced video, each frame of the video has two fields. One field includes all even pixel lines of the frame, and the other field includes all odd pixel lines. Interlacing the two fields together forms the video frame. The two fields are displayed alternately on an interlaced display for better motion continuity. The majority of consumer TV sets are interlaced display devices. Interlaced video is widely used in terrestrial video broadcasting, cable television (CATV), and direct broadcast satellite (DBS) systems. Current digital television broadcasting, particularly high definition television (HDTV), mainly uses interlaced video. Typical resolutions of digital interlaced video are relatively high, e.g., 720×480 for standard definition TV (SDTV) and 1920×1080 for HDTV.
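The relationship between a frame and its two fields can be sketched as follows. This is a minimal illustration, not part of any standard: it assumes a frame is a list of pixel rows indexed from 0, that the "top" field holds the even rows, and that the top field is displayed first. The function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame (a list of pixel rows) into its two fields."""
    top_field = frame[0::2]     # rows 0, 2, 4, ... (even lines)
    bottom_field = frame[1::2]  # rows 1, 3, 5, ... (odd lines)
    return top_field, bottom_field


def field_display_order(frames):
    """Yield fields in the order an interlaced display would show them.

    Assumes top-field-first ordering; real streams may signal the opposite.
    """
    for frame in frames:
        top_field, bottom_field = split_fields(frame)
        yield top_field
        yield bottom_field
```

Each frame therefore contributes two display instants, which is the source of the improved motion continuity mentioned above.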
Portable terminals, including personal digital assistants (PDAs) and cell phones, and computer monitors typically use progressive displays. In a progressive display, all pixel lines of a video frame are displayed sequentially from top to bottom. In addition, many progressive displays, those of PDAs and cell phones in particular, have limited display capability, e.g., 320×240 is currently the high-end display resolution for PDAs, and cell phone display resolution generally is even smaller.
MPEG-2 is a video coding standard currently used by the broadcasting industry. This standard is capable of efficiently representing high-resolution digital video, both interlaced and progressive.
MPEG-2 video is usually encoded using ‘frame pictures’, where the two fields are coded together. The MPEG-2 syntax also supports coding of ‘field pictures’, where the two fields are coded separately. The following description uses MPEG-2 frame pictures, but it also applies to field pictures.
The MPEG-2 video-coding process operates on video frames represented in the YCbCr color space. If images are stored in a 24-bit RGB format, then the images must first be converted to the YCbCr format. Each video frame is divided into non-overlapping macroblocks. Each macroblock covers 16×16 pixels. Each macroblock includes four 8×8 luma (Y) blocks, and two corresponding 8×8 chroma blocks (one Cb block and one Cr block). Macroblocks are the basic units for motion compensated prediction (MCP), and blocks are the basic units for applying the discrete cosine transform (DCT).
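For a sense of scale, the number of 16×16 macroblocks needed to cover a frame can be computed as in the sketch below. The function name is hypothetical, and rounding a partial macroblock up to a whole one is a simplifying assumption made here for illustration.

```python
def macroblock_grid(width, height, mb_size=16):
    """Return (columns, rows) of macroblocks needed to cover the frame.

    Partial macroblocks at the right or bottom edge are rounded up,
    which is an assumption of this sketch.
    """
    cols = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    return cols, rows


for w, h in [(720, 480), (1920, 1080)]:
    cols, rows = macroblock_grid(w, h)
    print(f"{w}x{h}: {cols} x {rows} = {cols * rows} macroblocks")
```

With the resolutions quoted earlier, a 720×480 frame is covered by 45 × 30 macroblocks, while a 1920×1080 frame needs 120 × 68 because 1080 is not a multiple of 16.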
There are three types of frames in MPEG-2 video: intra-frames (I-frames), predicted frames (P-frames), and bi-directional predicted frames (B-frames). An I-frame is coded independently without referring to other frames. A macroblock in an I-frame can use either frame-DCT or field-DCT. A P-frame is coded relative to a prior reference frame. A macroblock in a P-frame can be coded as an intra-macroblock or an inter-macroblock. An intra-macroblock is encoded like a macroblock in an I-frame.
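The difference between frame-DCT and field-DCT lies in how the 16 luma lines of a macroblock are grouped into the four 8×8 blocks before the transform. The sketch below illustrates that grouping only (no DCT is computed); the function name and the list-of-rows representation are assumptions made for illustration.

```python
def luma_blocks(macroblock, field_dct):
    """Split a 16x16 luma macroblock (list of 16 rows) into four 8x8 blocks.

    frame-DCT: the upper blocks take lines 0-7, the lower blocks lines 8-15.
    field-DCT: the upper blocks take the top-field lines (0, 2, ..., 14),
               the lower blocks take the bottom-field lines (1, 3, ..., 15).
    """
    if field_dct:
        rows = macroblock[0::2] + macroblock[1::2]
    else:
        rows = list(macroblock)
    upper, lower = rows[:8], rows[8:]
    return [
        [r[:8] for r in upper],  # upper-left block
        [r[8:] for r in upper],  # upper-right block
        [r[:8] for r in lower],  # lower-left block
        [r[8:] for r in lower],  # lower-right block
    ]
```

Field-DCT keeps lines captured at the same instant together, which tends to help when the two fields differ because of motion.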
An inter-macroblock can be frame-predicted or field-predicted. In frame-prediction, the macroblock is predicted from a block in the reference frame positioned by a motion vector. In field-prediction, the macroblock is divided into two 16×8 blocks: one block belongs to the top field, and the other block belongs to the bottom field. Each 16×8 block has a field selection bit, which specifies whether the top or the bottom field of the reference frame is used for prediction, and a motion vector, which points to the 16×8 pixel region in the selected field. A macroblock can be skipped when it has a zero motion vector and all-zero error terms.
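Field-prediction as described above can be sketched as follows. This is a simplified illustration under several assumptions that are not from the text: motion vectors are whole pixels expressed in field coordinates (MPEG-2 itself allows half-pixel vectors), the referenced region lies entirely inside the reference frame, and the function and parameter names are hypothetical.

```python
def field_predict_macroblock(ref_frame, mb_x, mb_y, top_pred, bottom_pred):
    """Build a 16x16 prediction from two 16x8 field predictions.

    ref_frame: reference frame as a list of pixel rows.
    mb_x, mb_y: top-left pixel of the macroblock (multiples of 16).
    top_pred, bottom_pred: (select_bottom_field, mv_x, mv_y) for the
        16x8 block of the top field and of the bottom field.
    """
    ref_fields = (ref_frame[0::2], ref_frame[1::2])  # (top, bottom)
    halves = []
    for select_bottom_field, mv_x, mv_y in (top_pred, bottom_pred):
        field = ref_fields[1 if select_bottom_field else 0]
        x0 = mb_x + mv_x
        y0 = mb_y // 2 + mv_y  # a field has half as many lines as a frame
        halves.append([field[y][x0:x0 + 16] for y in range(y0, y0 + 8)])
    # Re-interlace: top-field lines on even rows, bottom-field lines on odd rows.
    prediction = []
    for top_line, bottom_line in zip(halves[0], halves[1]):
        prediction.append(top_line)
        prediction.append(bottom_line)
    return prediction
```

With zero motion vectors and each 16×8 block predicted from the field of the same parity, the result is simply the co-located macroblock of the reference frame.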
A B-frame is coded relative to both a prior reference frame and a future reference frame. The encoding of a B-frame is similar to that of a P-frame, except that the motion vectors can also refer to areas in the future reference frame.
Typically, for display on progressive portable devices, MPEG-2 coded video needs to be transcoded to a format optimized for low-resolution progressive video such as MPEG-4 simple profile (SP).
Two problems arise when MPEG-2 coded interlaced video is transcoded to a low-resolution progressive video format such as MPEG-4 SP, or when it is to be displayed on a low-resolution progressive display. One problem is due to well-known interlacing artifacts, including aliasing, saw-tooth type edge distortion, and line flicker. The other problem is due to a resolution mismatch. De-interlacing and downsampling filtering are the conventional techniques for solving these two problems.
Basic de-interlacing methods include “weave,” “bob,” “discard,” and “adaptive,” as in U.S. Pat. Nos. 4,750,057, 4,800,436, 4,881,125, 5,748,250, and 6,661,464. The “weave” method simply interlaces the two fields of a frame together. The processed video retains full resolution but also retains the interlacing artifacts. The “bob” method displays every field as an individual frame. Thus, the frame rate doubles, but half of the vertical resolution is lost in every frame. The “discard” method discards every other field, and therefore the interlacing artifacts are completely eliminated, but half of the resolution is lost and motion does not appear as fluid. The “adaptive” method combines the “weave” and “bob” methods. It performs de-interlacing only where there are interlacing artifacts, and uses the “weave” method elsewhere.
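The three non-adaptive methods can be sketched in a few lines. This is an illustrative sketch, not the method of any of the cited patents: frames are lists of pixel rows, the top field is assumed to hold the even rows, and missing lines are filled by simple line repetition (an assumption; practical implementations usually interpolate). All function names are hypothetical.

```python
def line_double(field):
    """Restore frame height by repeating every field line once."""
    frame = []
    for line in field:
        frame.append(line)
        frame.append(line)
    return frame


def weave(top_field, bottom_field):
    """Interlace the two fields back into one full-resolution frame."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


def bob(top_field, bottom_field):
    """Turn each field into its own frame, doubling the frame rate."""
    return [line_double(top_field), line_double(bottom_field)]


def discard(top_field, bottom_field):
    """Keep only the top field and drop the bottom field."""
    return line_double(top_field)
```

An “adaptive” method would choose between the `weave` and `bob` results region by region, based on a motion measure that is not shown here.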
Typically, the interlacing artifacts are detected using motion information because only regions with motion need de-interlacing. Although the “adaptive” method can achieve better performance than “weave” or “bob,” the motion detection is usually computationally expensive and significantly increases the system cost. Advanced methods such as motion compensated de-interlacing can achieve better quality at the cost of even greater computational complexity; see U.S. Pat. Nos. 5,784,115 and 6,442,203.
To deal with the resolution mismatch, downsampling needs to be performed. Generic concatenated interpolation and decimation, as well as other more advanced methods, can be applied for this purpose; see U.S. Pat. Nos. 5,289,292, 5,335,295, 5,574,572, 6,175,659, and 6,563,964.
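As a minimal illustration of decimation, the sketch below reduces a picture by integer factors using block averaging as the low-pass filter. The box filter, the integer-only factors, and the function name are assumptions chosen for brevity; they are not the methods of the cited patents, and a non-integer ratio would additionally need an interpolation stage.

```python
def downsample_box(frame, factor_x, factor_y):
    """Downsample a picture (list of rows of integers) by integer factors.

    Each output pixel is the integer mean of a factor_x by factor_y block.
    Rows or columns that do not fill a complete block are dropped.
    """
    out_h = len(frame) // factor_y
    out_w = len(frame[0]) // factor_x
    out = []
    for j in range(out_h):
        row = []
        for i in range(out_w):
            total = 0
            for dy in range(factor_y):
                for dx in range(factor_x):
                    total += frame[j * factor_y + dy][i * factor_x + dx]
            row.append(total // (factor_x * factor_y))
        out.append(row)
    return out
```

The cost of this filter is proportional to the number of input pixels, so it grows with the resolution of the picture handed to it.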
FIG. 1 shows an example prior-art system 100. A video decoder 110 decodes a compressed interlaced video 101 and sends decoded interlaced pictures 102 to a de-interlacer 120. De-interlaced progressive pictures 103 are downsampled 130 by a downsampling filter. Finally, the de-interlaced and downsampled pictures 104 are passed on to an encoder 140, a progressive display device, or other processing. Because the downsampling 130 is performed on the full-resolution de-interlaced pictures, unnecessary additional computations can be introduced.
Consequently, there exists a need for jointly performing de-interlacing and downsampling for displaying high-resolution interlaced content on a low-resolution progressive display. There is also a need for an MPEG-2 de-interlacing and downsampling system that has a comparatively low computational complexity and can improve video quality cost-effectively.
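A rough sense of the overhead can be obtained from the resolutions quoted earlier (a 1920×1080 HDTV source and a 320×240 PDA display). The arithmetic below only compares pixel counts, which is a crude proxy for computation and an assumption of this illustration; the variable names are hypothetical.

```python
SOURCE_W, SOURCE_H = 1920, 1080   # HDTV frame produced by the de-interlacer
TARGET_W, TARGET_H = 320, 240     # high-end PDA display

deinterlaced_pixels = SOURCE_W * SOURCE_H
displayed_pixels = TARGET_W * TARGET_H
ratio = deinterlaced_pixels / displayed_pixels

print(deinterlaced_pixels, displayed_pixels, ratio)
# prints: 2073600 76800 27.0
```

Under this proxy, the cascaded system de-interlaces about 27 times as many pixels as are finally displayed.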