Digital video consumes large amounts of storage and transmission capacity. A typical raw digital video sequence includes 15 or 30 pictures per second. Each picture can include tens or hundreds of thousands of pixels (also called pels). Each pixel represents a tiny element of the picture. In raw form, a computer commonly represents a pixel with 24 bits or more. Thus, the number of bits per second, or bit rate, of a typical raw digital video sequence can be 5 million bits/second or more.
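The bit-rate figure follows from multiplying pixels per picture, bits per pixel, and pictures per second. The short Python sketch below works through that arithmetic. The 160x120 picture size is an illustrative assumption chosen at the small end of "tens of thousands of pixels", and `raw_bit_rate` is a hypothetical helper name.

```python
def raw_bit_rate(width: int, height: int, bits_per_pixel: int, pictures_per_second: int) -> int:
    """Bits per second of uncompressed video:
    (pixels per picture) x (bits per pixel) x (pictures per second)."""
    return width * height * bits_per_pixel * pictures_per_second


# Illustrative assumption: a small 160x120 picture (19,200 pixels),
# 24 bits per pixel, 15 pictures per second.
rate = raw_bit_rate(160, 120, 24, 15)
print(f"{rate:,} bits/second")  # 6,912,000 bits/second
```

Even this small picture size already exceeds 5 million bits/second. Larger pictures or 30 pictures per second raise the rate proportionally.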
Most computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression can be lossless, in which quality of the video does not suffer but decreases in bit rate are limited by the complexity of the video. Or, compression can be lossy, in which quality of the video suffers but decreases in bit rate are more dramatic. Decompression reverses compression.
In general, video compression techniques include “intra” compression and “inter” or predictive compression. Intra compression techniques compress individual pictures; for progressive video, intra-compressed frames are typically called I-frames or key frames. Inter compression techniques compress frames with reference to preceding and/or following frames, and inter-compressed frames are typically called predicted frames, P-frames, or B-frames.
I. Interlaced Video and Progressive Video
A video frame contains lines of spatial information of a video signal. For progressive video, these lines contain samples starting from one time instant and continuing through successive lines to the bottom of the frame. A progressive I-frame is an intra-coded progressive video frame. A progressive P-frame is a progressive video frame coded using forward prediction, and a progressive B-frame is a progressive video frame coded using bidirectional prediction.
A typical interlaced video frame consists of two fields scanned starting at different times. For example, referring to FIG. 1, an interlaced video frame 100 includes top field 110 and bottom field 120. Typically, the even-numbered lines (top field) are scanned starting at one time (e.g., time t) and the odd-numbered lines (bottom field) are scanned starting at a different (typically later) time (e.g., time t+1). Because the two fields are scanned starting at different times, regions of an interlaced video frame where motion is present can show jagged, tooth-like features. For this reason, interlaced video frames can be rearranged according to a field structure, with the odd lines grouped together in one field and the even lines grouped together in another field. This arrangement, known as field coding, is useful in high-motion pictures for reducing such jagged-edge artifacts. In stationary regions, on the other hand, image detail in the interlaced video frame may be preserved more efficiently without such a rearrangement. Accordingly, frame coding, in which the original alternating field line arrangement is preserved, is often used for stationary or low-motion interlaced video frames.
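The rearrangement between frame structure and field structure can be sketched in Python as follows. The sketch assumes a frame is a list of rows with zero-based numbering, so that rows 0, 2, 4, ... (the even-numbered lines) form the top field. The function names are hypothetical, and no particular codec's memory layout is implied.

```python
from typing import List, Tuple

Frame = List[List[int]]  # rows of pixel samples, row 0 at the top (assumed layout)


def split_fields(frame: Frame) -> Tuple[Frame, Frame]:
    """Field structure: even-numbered rows (0, 2, 4, ...) form the top field,
    odd-numbered rows (1, 3, 5, ...) form the bottom field."""
    return frame[0::2], frame[1::2]


def interleave_fields(top: Frame, bottom: Frame) -> Frame:
    """Frame structure: restore the original alternating line arrangement."""
    frame: Frame = []
    for i in range(max(len(top), len(bottom))):
        if i < len(top):
            frame.append(top[i])
        if i < len(bottom):
            frame.append(bottom[i])
    return frame


frame = [[10, 10], [20, 20], [11, 11], [21, 21]]
top, bottom = split_fields(frame)
print(top)     # [[10, 10], [11, 11]]
print(bottom)  # [[20, 20], [21, 21]]
```

Because each field holds lines sampled at a single time, coding the fields separately avoids mixing the two scan times within a block when there is motion.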
A typical progressive video frame consists of one frame of content with non-alternating lines. In contrast to interlaced video, progressive video does not divide video frames into separate fields, and an entire frame is scanned left to right, top to bottom starting at a single time.
II. Display Ordering and Pull-Down
The order in which decoded pictures are displayed is called the display order. The order in which the pictures are transmitted and decoded is called the coded order. The coded order is the same as the display order if there are no B-frames in the sequence. However, if B-frames are present, the coded order may not be the same as the display order because B-frames typically use temporally future reference frames as well as temporally past reference frames.
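The difference between display order and coded order can be sketched for a simple group of pictures. This Python example assumes each B-frame references only the nearest I- or P-frame before it and the nearest one after it in display order. Under that assumption, each reference frame must be transmitted before the B-frames that precede it in display order. The `coded_order` function and the picture labels are hypothetical.

```python
from typing import List


def coded_order(display: List[str]) -> List[str]:
    """Reorder pictures from display order into a coded order in which each
    B-frame follows both of its reference frames.

    Assumption: a B-frame references the nearest I/P frame on each side in
    display order. Labels starting with "B" are B-frames; all others are I/P.
    """
    out: List[str] = []
    pending_b: List[str] = []
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)   # hold until its future reference is sent
        else:
            out.append(pic)         # send the I/P reference frame first...
            out.extend(pending_b)   # ...then the B-frames that needed it
            pending_b.clear()
    out.extend(pending_b)  # trailing B-frames have no future reference; keep them anyway
    return out


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(coded_order(display))           # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
print(coded_order(["I0", "P1", "P2"]))  # ['I0', 'P1', 'P2']
```

The second call shows that, without B-frames, the coded order equals the display order.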
Pull-down is a process where video frame rate is artificially increased through repeated display of the same decoded frames or fields in a video sequence. Pull-down is typically performed in conversions from film to video or vice versa, or in conversions between video formats having different frame rates. For example, pull-down is performed when 24-frame-per-second film is converted to 30-frame-per-second or 60-frame-per-second video.
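A common instance of pull-down is the 3:2 (or 2:3) cadence used when converting 24-frame-per-second film to 60-field-per-second (30-frame-per-second) interlaced video. In this cadence, film frames alternately contribute two and three fields, so every four film frames become ten fields (five video frames). The Python sketch below assumes the cadence starts with a top field. The `pulldown_32` name is hypothetical.

```python
from typing import List, Tuple


def pulldown_32(film_frames: List[str]) -> List[Tuple[str, str]]:
    """Map film frames to a field sequence using a 2:3 cadence:
    even-indexed frames contribute 2 fields, odd-indexed frames contribute 3.
    Field parity alternates top/bottom; starting with "top" is an assumption.
    Returns (film_frame, field_parity) pairs in display order."""
    fields: List[Tuple[str, str]] = []
    parity = "top"
    for i, frame in enumerate(film_frames):
        count = 2 if i % 2 == 0 else 3   # one field is repeated on every other frame
        for _ in range(count):
            fields.append((frame, parity))
            parity = "bottom" if parity == "top" else "top"
    return fields


fields = pulldown_32(["A", "B", "C", "D"])
print(len(fields))                    # 10 fields, i.e. 5 interlaced video frames
print("".join(f for f, _ in fields))  # AABBBCCDDD
```

At 24 film frames per second this yields 24 / 4 x 10 = 60 fields per second, which is 30 interlaced frames per second.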
III. Standards for Video Compression and Decompression
Several international standards relate to video compression and decompression. These standards include the Moving Picture Experts Group [“MPEG”] 1, 2, and 4 standards and the H.261, H.262 (another title for MPEG 2), H.263 and H.264 (also called JVT/AVC) standards from the International Telecommunication Union [“ITU”]. These standards specify aspects of video decoders and formats for compressed video information. Directly or by implication, they also specify certain encoder details, but other encoder details are not specified. These standards use (or support the use of) different combinations of intraframe and interframe compression and decompression.
A. Signaling for Field Ordering and Field/Frame Repetition in the Standards
Some international standards describe bitstream elements for signaling field display order and for signaling whether certain fields or frames are to be repeated during display. The H.262 standard uses picture coding extension elements top_field_first and repeat_first_field to indicate field display order and field display repetition. When the sequence extension syntax element progressive_sequence is set to 1 (indicating the coded video sequence contains only progressive frames), top_field_first and repeat_first_field indicate how many times (once, twice, or three times) an H.262 decoder outputs a reconstructed frame. When progressive_sequence is 0 (indicating the coded video sequence may contain progressive or interlaced frames, either frame-coded or field-coded), top_field_first indicates which field of a reconstructed frame the decoder outputs first, and repeat_first_field indicates whether the first field in the frame is to be repeated in the output of the decoder.
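The H.262 semantics described above can be summarized as two small functions. This Python sketch is a simplification, not a decoder: the function names are hypothetical, and it does not validate combinations that H.262 forbids (for example, top_field_first = 1 with repeat_first_field = 0 when progressive_sequence is 1) or the constraints tied to other picture-level elements such as progressive_frame.

```python
from typing import List


def progressive_frame_repeats(top_field_first: int, repeat_first_field: int) -> int:
    """progressive_sequence == 1: number of times the reconstructed frame is output."""
    if repeat_first_field == 0:
        return 1                                  # output once
    return 3 if top_field_first == 1 else 2       # output three times or twice


def interlaced_field_output(top_field_first: int, repeat_first_field: int) -> List[str]:
    """progressive_sequence == 0: order in which the frame's fields are output."""
    first, second = ("top", "bottom") if top_field_first else ("bottom", "top")
    fields = [first, second]
    if repeat_first_field:
        fields.append(first)                      # first field is output again
    return fields


print(progressive_frame_repeats(0, 1))   # 2
print(progressive_frame_repeats(1, 1))   # 3
print(interlaced_field_output(1, 1))     # ['top', 'bottom', 'top']
print(interlaced_field_output(0, 0))     # ['bottom', 'top']
```

In the interlaced case, alternating repeat_first_field between 0 and 1 across frames produces the two-field/three-field pattern used for 3:2 pull-down.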
The MPEG 4 standard describes a top_field_first element for indicating field display order. In MPEG 4, top_field_first is a video object plane syntax element that indicates which field (top or bottom) of a reconstructed video object plane the decoder outputs first.
According to draft JVT-d157 of the JVT/AVC video standard, the slice header element pic_structure takes on one of five values to identify a picture as being one of five types: progressive frame, top field, bottom field, interlaced frame with top field first in time, or interlaced frame with bottom field first in time.
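The five picture types identified by pic_structure in draft JVT-d157 can be modeled as an enumeration. In this Python sketch the member names are paraphrases and the member values are arbitrary; they are not the code values defined in the draft. The `field_order` helper is hypothetical and simply lists the fields a picture of each type carries, in temporal order.

```python
from enum import Enum, auto
from typing import List


class PicStructure(Enum):
    """Picture types identified by pic_structure (values are NOT the draft's codes)."""
    PROGRESSIVE_FRAME = auto()
    TOP_FIELD = auto()
    BOTTOM_FIELD = auto()
    INTERLACED_TOP_FIRST = auto()      # interlaced frame, top field first in time
    INTERLACED_BOTTOM_FIRST = auto()   # interlaced frame, bottom field first in time


def field_order(ps: PicStructure) -> List[str]:
    """Fields carried by the picture in temporal order; empty for a progressive frame."""
    return {
        PicStructure.PROGRESSIVE_FRAME: [],
        PicStructure.TOP_FIELD: ["top"],
        PicStructure.BOTTOM_FIELD: ["bottom"],
        PicStructure.INTERLACED_TOP_FIRST: ["top", "bottom"],
        PicStructure.INTERLACED_BOTTOM_FIRST: ["bottom", "top"],
    }[ps]


print(len(PicStructure))                                 # 5
print(field_order(PicStructure.INTERLACED_BOTTOM_FIRST))  # ['bottom', 'top']
```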
B. Limitations of the Standards
These international standards are limited in that they do not allow for signaling to indicate the presence or absence of bitstream elements for (1) signaling field display order and (2) signaling whether certain fields or frames are to be repeated during display. For example, although the H.262 standard uses picture coding extension elements top_field_first and repeat_first_field, the H.262 standard does not have a mechanism to “turn off” such elements when they are not needed.
Given the critical importance of video compression and decompression to digital video, it is not surprising that video compression and decompression are richly developed fields. Whatever the benefits of previous video compression and decompression techniques, however, they do not have the advantages of the following techniques and tools.