Digital video consumes large amounts of storage and transmission capacity. A typical raw digital video sequence includes 15 or 30 frames per second. Each frame can include tens or hundreds of thousands of pixels (also called pels). Each pixel represents a tiny element of the picture. In raw form, a computer commonly represents a pixel as a set of three samples totaling 24 bits. Thus, the number of bits per second, or bit rate, of a typical raw digital video sequence can be 5 million bits/second or more.
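As a rough illustration of the arithmetic above, the following minimal sketch multiplies frame size, bits per pixel, and frame rate. The function name and the 176x144 frame size are assumptions chosen for illustration and do not come from the text.

```python
def raw_bit_rate(width, height, bits_per_pixel=24, frames_per_second=15):
    """Return the raw (uncompressed) bit rate in bits per second."""
    pixels_per_frame = width * height
    bits_per_frame = pixels_per_frame * bits_per_pixel
    return bits_per_frame * frames_per_second


# Assumed example: a 176x144 frame (about 25 thousand pixels) at 15 frames/second.
print(raw_bit_rate(176, 144))  # 9123840 bits/second
```

Even at this small assumed frame size, the raw bit rate exceeds 5 million bits/second.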
Many computers and computer networks lack the resources to process raw digital video. For this reason, engineers use compression (also called coding or encoding) to reduce the bit rate of digital video. Compression decreases the cost of storing and transmitting video by converting the video into a lower bit rate form. Decompression (also called decoding) reconstructs a version of the original video from the compressed form. A “codec” is an encoder/decoder system. Compression can be lossless, in which quality of the video does not suffer but decreases in bit rate are limited by the inherent amount of variability (sometimes called entropy) of the video data. Or, compression can be lossy, in which quality of the video suffers but achievable decreases in bit rate are more dramatic. Lossy compression is often used in conjunction with lossless compression—in a system design in which the lossy compression establishes an approximation of information and lossless compression techniques are applied to represent the approximation.
In general, video compression techniques include “intra-picture” compression and “inter-picture” compression, where a picture is, for example, a progressively scanned video frame. For progressive video frames, intra-frame compression techniques compress individual frames (typically called I-frames or key frames). Inter-frame compression techniques compress frames (typically called predicted frames, P-frames, or B-frames for bidirectional prediction) with reference to preceding and/or following frames (typically called reference or anchor frames).
I. Interlaced Video and Progressive Video
A typical interlaced video frame consists of two fields scanned starting at different times. For example, an interlaced video frame includes a top field and a bottom field. Typically, the even-numbered lines (top field) are scanned starting at one time (e.g., time t) and the odd-numbered lines (bottom field) are scanned starting at a different (typically later) time (e.g., time t+1). This timing can create jagged tooth-like features in regions of an interlaced video frame where motion is present because the two fields are scanned starting at different times. For this reason, interlaced video frames can be rearranged for coding according to a field structure, with the odd lines grouped together for coding as one field, and the even lines grouped together for coding as another field. This arrangement, known as field coding, is useful in high-motion pictures for reduction of such jagged edge artifacts. Fields in different field-coded interlaced frames can be coded differently. For example, a field in a field-coded interlaced frame can be intra-coded (e.g., an interlaced I-field) or inter-coded (e.g., an interlaced P-field or interlaced B-field).
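The field rearrangement described above can be sketched as follows. This is an illustrative sketch only: it assumes a frame is represented as a list of rows with line 0 at the top, and the helper names `split_fields` and `merge_fields` are hypothetical.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into (top_field, bottom_field)."""
    top_field = frame[0::2]     # even-numbered lines 0, 2, 4, ...
    bottom_field = frame[1::2]  # odd-numbered lines 1, 3, 5, ...
    return top_field, bottom_field


def merge_fields(top_field, bottom_field):
    """Re-interleave two fields into the original alternating line order."""
    frame = []
    for line in range(len(top_field) + len(bottom_field)):
        source = top_field if line % 2 == 0 else bottom_field
        frame.append(source[line // 2])
    return frame


frame = [["t0"], ["b0"], ["t1"], ["b1"]]
top, bottom = split_fields(frame)
print(top)     # [['t0'], ['t1']]
print(bottom)  # [['b0'], ['b1']]
```

Field coding would then operate on `top` and `bottom` separately, while frame coding would operate on the interleaved `frame`.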
On the other hand, in stationary regions, image detail in the interlaced video frame may be more efficiently preserved without such a coding rearrangement. Accordingly, frame coding is often used in stationary or low-motion interlaced video frames, in which the original alternating field line arrangement is preserved. Different frame-coded interlaced frames also can be coded differently. For example, such frames can be intra-coded (e.g., an interlaced I-frame) or inter-coded (e.g., an interlaced P-frame or interlaced B-frame).
A typical progressive video frame consists of one frame of content with non-alternating lines. In contrast to interlaced video, progressive video does not divide video frames into separate fields, and an entire frame is scanned left to right, top to bottom starting at a single time. Progressive frames can be intra-coded (e.g., a progressive I-frame) or inter-coded (e.g., a progressive P-frame or progressive B-frame).
II. Standards for Video Compression and Decompression
Several international standards relate to video compression and decompression. These standards include the Moving Picture Experts Group [“MPEG”] 1, 2, and 4 standards and the H.261, H.262 (another title for MPEG 2), H.263 and H.264 (also called JVT/AVC) standards from the International Telecommunication Union [“ITU”]. These standards specify aspects of video decoders and formats for compressed video information. Directly or by implication, they also specify certain encoder details, but other encoder details are not specified. These standards use (or support the use of) different combinations of intraframe and interframe decompression and compression. In particular, they use or support the use of different “access points” for decoders and/or editors.
The MPEG 2/H.262 standard describes intra-coded pictures (e.g., coded I-frames) and group-of-pictures (GOP) headers. In MPEG 2, intra-coded pictures are coded without reference to other pictures and provide access points to the coded sequence where decoding can begin. Intra-coded pictures can be used at different places in a video sequence. For example, intra-coded pictures can be inserted periodically or can be used in places such as scene changes or where motion compensation is otherwise ineffective. A coded I-frame is an I-frame picture or a pair of field pictures, where the first field picture is an I-picture and the second field picture is an I-picture or a P-picture. The MPEG 2 standard does not allow a coded I-frame in which the first field picture is a P-picture and the second field picture is an I-picture.
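The rule for what counts as a coded I-frame can be expressed as a small check. This is a sketch under assumptions: the function name and the tuple-of-letters representation of picture types are hypothetical and are not part of the MPEG 2 syntax.

```python
def is_coded_i_frame(picture_types):
    """Check the coded I-frame rule described above.

    picture_types is ("I",) for a single frame picture, or
    (first_field_type, second_field_type) for a pair of field pictures.
    """
    if len(picture_types) == 1:
        # A frame picture must itself be an I-picture.
        return picture_types[0] == "I"
    if len(picture_types) == 2:
        first, second = picture_types
        # The first field must be I; the second may be I or P.
        return first == "I" and second in ("I", "P")
    return False


print(is_coded_i_frame(("I", "P")))  # True
print(is_coded_i_frame(("P", "I")))  # False
```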
A GOP header is a construct in the MPEG 2 bitstream that signals the beginning of a group of pictures. Groups of pictures are typically used to signal the boundary of a set of video frames/fields all encoded with reference to the same I-frame. A GOP header is an optional header that may be signaled immediately before a coded I-frame to indicate if the first consecutive B-pictures (if any) immediately following the coded I-frame in the bitstream (but typically preceding the coded I-frame in display order) can be reconstructed properly in the case of a random access. For such B-pictures, if a reference picture before the current coded I-frame is not available, the B-pictures cannot be reconstructed properly unless they only use backward prediction from the current coded I-frame or intra coding. A decoder may use this information to avoid displaying B-pictures that cannot be correctly decoded. For a decoder, the GOP header thus indicates how the decoder can perform decoding from the GOP header, even if the GOP header is in the middle of a video sequence. The GOP header includes a start code called group_start_code. The GOP header start code includes a 24-bit start code prefix (23 0s followed by a 1) followed by the GOP header start code value (B8 in hexadecimal). Start codes in MPEG 2 are byte-aligned; 0s are to be inserted before the beginning of the start code prefix to ensure byte alignment. For additional information, see the H.262 standard.
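Because the start code is byte-aligned, a decoder or editor can locate GOP headers with a plain byte search for the 24-bit prefix followed by the value B8. The following is a minimal sketch, not a full MPEG 2 parser: the function name is hypothetical, and the sketch does not parse the header fields that follow the start code.

```python
# 23 zero bits followed by a 1 (bytes 00 00 01), then the start code value B8.
GROUP_START_CODE = bytes([0x00, 0x00, 0x01, 0xB8])


def find_gop_headers(data):
    """Return the byte offsets of every group_start_code in data."""
    offsets = []
    position = data.find(GROUP_START_CODE)
    while position != -1:
        offsets.append(position)
        position = data.find(GROUP_START_CODE, position + 1)
    return offsets


# Made-up byte string containing two GOP start codes among other bytes.
stream = b"\x12\x34" + GROUP_START_CODE + b"\xAA\xBB" + GROUP_START_CODE
print(find_gop_headers(stream))  # [2, 8]
```

Each returned offset is a candidate point at which decoding could begin in the middle of a sequence.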
The MPEG 4 standard describes intra-coded video object planes (I-VOPs) and group of video object plane (VOP) headers. An I-VOP is a VOP coded using information only from itself. Non-intra-coded VOPs may be derived from progressive or interlaced frames. In MPEG 4, I-VOPs are coded without reference to other pictures and provide access points to the coded sequence where decoding can begin. A group of VOP header is an optional header that can be used immediately before a coded I-VOP to indicate to the decoder if the first consecutive B-VOPs immediately following the coded I-frame can be reconstructed properly in the case of a random access. A group of VOP header must be followed by a coded I-VOP. A group of VOPs start code includes a 24-bit start code prefix (23 0s followed by a 1) followed by the group of VOPs start code value (B3 in hexadecimal). Start codes in MPEG 4 are byte-aligned and the standard provides for bit-stuffing to achieve byte alignment. For example, for stuffing from one to eight bits, a 0 followed by from one to seven 1s are inserted prior to the start code, so long as the previous code was not a start code. For additional information, see the MPEG 4 standard.
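The bit-stuffing step can be sketched as follows, assuming the stuffing pattern is a single 0 bit followed by as many 1 bits as are needed to reach the next byte boundary. The bitstream is modeled as a string of '0' and '1' characters purely for readability, the function names are hypothetical, and the sketch does not handle the case where the previous code was itself a start code.

```python
# 24-bit prefix (23 zeros and a 1) followed by the value B3, as 32 bits.
GROUP_OF_VOPS_START_CODE = format(0x000001B3, "032b")


def stuffing_bits(bits_written):
    """Return the stuffing pattern that byte-aligns a stream of bits_written bits."""
    pattern = "0"
    while (bits_written + len(pattern)) % 8 != 0:
        pattern += "1"
    return pattern


def append_group_of_vops_start_code(bitstream):
    """Byte-align bitstream with stuffing, then append the start code."""
    return bitstream + stuffing_bits(len(bitstream)) + GROUP_OF_VOPS_START_CODE


print(stuffing_bits(5))  # 011
print(stuffing_bits(8))  # 01111111
```

With this pattern, between one and eight stuffing bits are always written, so an already aligned stream receives a full stuffing byte.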
According to draft JVT-d157 of the JVT/AVC video standard, I-pictures provide access points to a coded sequence where decoding can begin, and various information used in decoding is signaled in network abstraction layer (“NAL”) units. A NAL unit indicates what type of data to expect in the NAL unit, followed by the data itself, interspersed with emulation prevention data. A supplemental enhancement information (“SEI”) NAL unit is a type of NAL unit. An SEI NAL unit contains one or more SEI messages. Each SEI message consists of an SEI header and an SEI payload. The type and size of the SEI payload are coded using an extensible syntax. The SEI payload may have an SEI payload header. For example, a payload header may indicate to which picture the particular data belongs.
Annex C of draft JVT-d157 establishes rules for dealing with hypothetical reference decoder (“HRD”) buffers. For example, at each decoder refresh point a buffering period SEI message shall follow the last NAL unit of the last picture before a decoder refresh and precede the first NAL unit of the first picture after the decoder refresh. An HRD picture SEI message must follow the last NAL unit of each picture and precede the first NAL unit of the next picture. Each of these SEI messages pertains to the picture that follows it.
Annex D of the draft JVT-d157 describes a syntax for a random access point SEI message. A random access point SEI message contains an indicator of a random access entry point for a decoder. The entry point is indicated as a count relative to the position of the SEI message in units of coded frame numbers prior to the frame number of the current picture. Annex D states that a buffering period SEI message should be transmitted at the location of the random access entry point indicated in the random access point SEI message in order to establish initialization of the HRD buffer model.
These international standards are limited in several important ways. For example, in MPEG 2, the first coded frame after a GOP header must be a “coded I-frame”—an intra-coded frame picture or a pair of field pictures where the first field picture is an I-picture and the second field picture is either an I-picture or a P-picture. GOP headers are not allowed to precede any other frame type. In MPEG 4, a group of VOP header must be followed by a coded I-VOP.
Given the critical importance of video compression and decompression to digital video, it is not surprising that video compression and decompression are richly developed fields. Whatever the benefits of previous video compression and decompression techniques, however, they do not have the advantages of the following techniques and tools.