Digital video consumes large amounts of storage and transmission capacity. A typical raw digital video sequence includes 15 or 30 pictures per second. Each picture can include tens or hundreds of thousands of samples (sometimes grouped as pixels, or pels). Each pixel represents a tiny element of the picture. In raw form, a computer commonly represents a pixel with 24 bits or more. Thus, the number of bits per second, or bit rate, of a typical raw digital video sequence can be 5 million bits/second or more.
Most computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression can be lossless, in which case the quality of the video does not suffer but decreases in bit rate are limited by the complexity of the video. Or, compression can be lossy, in which case the quality of the video suffers but decreases in bit rate are more dramatic. Decompression reverses compression.
In general, video compression techniques include “intra” compression and “inter” or predictive compression. Intra compression techniques compress individual pictures. Inter compression techniques compress pictures with reference to preceding and/or following pictures.
A video frame contains lines of spatial information of a video signal. For progressive video, these lines contain samples starting from one time instant and continuing through successive lines to the bottom of the frame. A typical progressive video frame consists of one frame of content with non-alternating lines. A typical interlaced video frame consists of two fields scanned starting at different times. For example, referring to FIG. 1, an interlaced video frame 100 includes top field 110 and bottom field 120. In contrast to interlaced video, progressive video does not divide video frames into separate fields, and an entire frame is scanned left to right, top to bottom starting at a single time.
In a typical interlaced video frame, the even-numbered lines (top field) are scanned starting at one time (e.g., time t) and the odd-numbered lines (bottom field) are scanned starting at a different (typically later) time (e.g., time t+1). Because the two fields are scanned starting at different times, jagged tooth-like features can appear in regions of an interlaced video frame where motion is present. For this reason, interlaced video frames can be rearranged according to a field structure, with the odd lines grouped together in one field, and the even lines grouped together in another field. This arrangement, known as field coding, is useful in high-motion video for reduction of such jagged edge artifacts. On the other hand, in stationary regions, image detail in the interlaced video frame may be more efficiently preserved without such a rearrangement. Accordingly, frame coding is often used for stationary or low-motion interlaced video frames, in which the original alternating field line arrangement is preserved. When the decision is made to use frame coding for an interlaced video frame, some encoders allow individual macroblocks to be adaptively coded using either frame coding or field coding.
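The field rearrangement described above can be illustrated with a short sketch. This is not code from the source; the representation of a frame as a list of scan lines is an assumption made for illustration.

```python
def split_fields(frame):
    """Split an interlaced frame into its two fields.

    `frame` is a list of scan lines in row-major order. Even-numbered
    lines (0, 2, 4, ...) form the top field, scanned starting at time t;
    odd-numbered lines (1, 3, 5, ...) form the bottom field, scanned
    starting at time t+1.
    """
    top_field = frame[0::2]     # even-numbered lines
    bottom_field = frame[1::2]  # odd-numbered lines
    return top_field, bottom_field

# Example with a toy 4-line frame:
frame = ["line0", "line1", "line2", "line3"]
top, bottom = split_fields(frame)
# top == ["line0", "line2"], bottom == ["line1", "line3"]
```

Grouping the lines this way lets an encoder code each field as a coherent snapshot of a single time instant, which is why field coding reduces jagged edge artifacts in high-motion content.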
Different approaches have been tried to decide when to use frame coding and when to use field coding for interlaced video frames. For example, two-pass encoding algorithms encode the same interlaced video frame in two separate passes using field coding and frame coding, respectively. The field coding results and frame coding results are then compared to determine which coding mode provides better rate-distortion performance. However, because they effectively encode interlaced video frames twice, two-pass algorithms are very expensive in terms of encoding time.
One-pass encoding algorithms typically determine whether to use field or frame coding before encoding the interlaced video frame. One such algorithm looks at individual frames within a sequence to determine whether each frame should be field-coded or frame-coded. The algorithm classifies an individual macroblock as a “field” macroblock or “frame” macroblock by comparing how far individual sample values in the top field and bottom field of the macroblock deviate from the mean sample values of the respective fields. If the difference between the deviation in the top field and the deviation in the bottom field is great enough, the algorithm determines that high motion is present and classifies the macroblock as a “field” macroblock. Otherwise, the macroblock is classified as a “frame” macroblock. The algorithm chooses field coding for the frame if the majority of its macroblocks are “field” macroblocks and chooses frame coding for the frame if the majority of its macroblocks are “frame” macroblocks. This algorithm measures variance in sample values in an attempt to detect motion, but it ignores other important content characteristics in making its field/frame coding decision for the frame.
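The one-pass classification above can be sketched as follows. The source does not specify the deviation measure or the threshold; mean absolute deviation and the threshold value used here are assumptions for illustration only.

```python
def mean_abs_deviation(samples):
    """Mean absolute deviation of sample values from their mean."""
    m = sum(samples) / len(samples)
    return sum(abs(s - m) for s in samples) / len(samples)

def classify_macroblock(macroblock, threshold=10.0):
    """Classify a macroblock (rows of luminance samples) as 'field' or 'frame'.

    The deviation measure and threshold are illustrative assumptions.
    """
    top = [s for row in macroblock[0::2] for s in row]     # even lines
    bottom = [s for row in macroblock[1::2] for s in row]  # odd lines
    if abs(mean_abs_deviation(top) - mean_abs_deviation(bottom)) > threshold:
        return "field"   # large deviation gap suggests high motion
    return "frame"

def choose_frame_mode(macroblocks, threshold=10.0):
    """Choose field coding for the frame if 'field' macroblocks are the majority."""
    field_count = sum(
        1 for mb in macroblocks if classify_macroblock(mb, threshold) == "field")
    return "field" if field_count > len(macroblocks) / 2 else "frame"
```

As the text notes, a purely deviation-based test like this reacts to any statistical difference between the fields, not specifically to motion.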
To make a field or frame coding decision for an interlaced video frame, a prior Microsoft video encoder divides interlaced frames into 8×4 blocks and analyzes each block in the spatial domain. For each block, the encoder checks if the vertical intensity fluctuation is more significant than the horizontal intensity fluctuation. Specifically, suppose p(r, c) represents the luminance value of a pixel at row r and column c. The encoder measures line-to-line vertical intensity fluctuation (V) and horizontal intensity fluctuation (H) for the 8×4 block:
V = Σ_r Σ_c |p(r, c) − p(r+1, c)|

H = Σ_r Σ_c |p(r, c) − p(r, c+1)|

For a block to be coded as progressive video, V should have a value similar to H. If V is significantly larger than H, there is a good indication of interlace effect, and the block is classified as an “interlace” block. The encoder calculates the percentage of “interlace” blocks in the frame.
If the percentage is greater than a threshold, the encoder selects field mode. Otherwise, the encoder selects frame mode.
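The V/H test and the frame-level decision above can be sketched as follows. The formulas for V and H come from the source; the exact form of the “significantly larger” comparison (a ratio factor here) and the percentage threshold value are assumptions for illustration.

```python
def classify_block(block, factor=2.0):
    """Classify an 8x4 block of luminance values p(r, c).

    V sums line-to-line vertical differences |p(r, c) - p(r+1, c)|;
    H sums horizontal differences |p(r, c) - p(r, c+1)|. The block is
    an 'interlace' block if V is significantly larger than H (here:
    larger than an assumed factor times H).
    """
    rows, cols = len(block), len(block[0])
    V = sum(abs(block[r][c] - block[r + 1][c])
            for r in range(rows - 1) for c in range(cols))
    H = sum(abs(block[r][c] - block[r][c + 1])
            for r in range(rows) for c in range(cols - 1))
    return "interlace" if V > factor * H else "progressive"

def choose_mode(blocks, pct_threshold=0.5, factor=2.0):
    """Select field mode if the fraction of 'interlace' blocks exceeds
    the threshold; otherwise select frame mode."""
    n_interlace = sum(
        1 for b in blocks if classify_block(b, factor) == "interlace")
    return "field" if n_interlace / len(blocks) > pct_threshold else "frame"
```

Note that a block whose rows alternate between light and dark lines (the classic interlace comb pattern) yields a large V and small H, while a block containing a single strong horizontal edge also inflates V, which is exactly the misclassification the next paragraph describes.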
This approach tends to classify blocks with strong vertical intensity fluctuations as “interlace” blocks regardless of whether the blocks actually have jagged, tooth-like interlace artifacts that can be alleviated by coding the interlaced frame in field mode. For example, blocks with horizontal edges that are mistakenly classified as “interlace” blocks will artificially skew the encoder's field mode/frame mode decision.
Given the critical importance of video compression to digital video, it is not surprising that video compression is a richly developed field. Whatever the benefits of previous video compression techniques, however, they do not have the advantages of the following techniques and tools.