Video compression is used in many current and emerging products. It is at the heart of digital television set-top boxes (STBs), digital satellite systems (DSSs), high definition television (HDTV) decoders, digital versatile disk (DVD) players, video conferencing, Internet video and multimedia content, and other digital video applications. Without video compression, digital video content can be extremely large, making it difficult or even impossible for the digital video content to be efficiently stored, transmitted, or viewed.
The digital video content comprises a stream of pictures that can be displayed as an image on a television receiver, computer monitor, or some other electronic device capable of displaying digital video content. A picture that is displayed in time before a particular picture is in the “forward direction” in relation to the particular picture. Likewise, a picture that is displayed in time after a particular picture is in the “backward direction” in relation to the particular picture.
Video compression is accomplished in a video encoding, or coding, process in which each picture is encoded as either a frame or as two fields. Each frame comprises a number of lines of spatial information. For example, a typical frame contains 480 horizontal lines. Each field contains half the number of lines in the frame. For example, if the frame comprises 480 horizontal lines, each field comprises 240 horizontal lines. In a typical configuration, one of the fields comprises the odd numbered lines in the frame and the other field comprises the even numbered lines in the frame. The field that comprises the odd numbered lines will be referred to as the “top” field hereafter and in the appended claims, unless otherwise specifically denoted. Likewise, the field that comprises the even numbered lines will be referred to as the “bottom” field hereafter and in the appended claims, unless otherwise specifically denoted. The two fields can be interlaced together to form an interlaced frame.
The general idea behind video coding is to remove data from the digital video content that is “non-essential.” The decreased amount of data then requires less bandwidth for broadcast or transmission. After the compressed video data has been transmitted, it must be decoded, or decompressed. In this process, the transmitted video data is processed to generate approximation data that is substituted into the video data to replace the “non-essential” data that was removed in the coding process.
Video coding transforms the digital video content into a compressed form that can be stored using less space and transmitted using less bandwidth than uncompressed digital video content. It does so by taking advantage of temporal and spatial redundancies in the pictures of the video content. The digital video content can be stored in a storage medium such as a hard drive, DVD, or some other non-volatile storage unit.
There are numerous video coding methods that compress the digital video content. Consequently, video coding standards have been developed to standardize the various video coding methods so that the compressed digital video content is rendered in formats that a majority of video encoders and decoders can recognize. For example, the Motion Picture Experts Group (MPEG) and International Telecommunication Union (ITU-T) have developed video coding standards that are in wide use. Examples of these standards include the MPEG-1, MPEG-2, MPEG-4, ITU-T H261, and ITU-T H263 standards.
Most modern video coding standards, such as those developed by MPEG and ITU-T, are based in part on a temporal prediction with motion compensation (MC) algorithm. Temporal prediction with motion compensation is used to remove temporal redundancy between successive pictures in a digital video broadcast.
The temporal prediction with motion compensation algorithm typically utilizes one or two reference pictures to encode a particular picture. A reference picture is a picture that has already been encoded. By comparing the particular picture that is to be encoded with one of the reference pictures, the temporal prediction with motion compensation algorithm can take advantage of the temporal redundancy that exists between the reference picture and the particular picture that is to be encoded and encode the picture with a higher amount of compression than if the picture were encoded without using the temporal prediction with motion compensation algorithm. One of the reference pictures may be in the backward direction in relation to the particular picture that is to be encoded. The other reference picture is in the forward direction in relation to the particular picture that is to be encoded.
However, as the demand for higher resolutions, more complex graphical content, and faster transmission time increases, so does the need for better video compression methods. To this end, a new video coding standard is currently being developed. This new video coding standard is called the MPEG-4 Part 10 AVC/H.264 standard.
The new MPEG-4 Part 10 AVC/H.264 standard calls for a number of new methods in video compression. For example, one of the features of the new MPEG-4 Part 10 AVC/H.264 standard is that it allows multiple reference pictures, instead of just two reference pictures. The use of multiple reference pictures improves the performance of the temporal prediction with motion compensation algorithm by allowing the encoder to find the reference picture that most closely matches the picture that is to be encoded. By using the reference picture in the coding process that most closely matches the picture that is to be encoded, the greatest amount of compression is possible in the encoding of the picture. The reference pictures are stored in frame and/or field buffers.
As previously stated, the encoder can encode a picture as a frame or as two fields. A greater degree of compression could be accomplished if, in a sequence of pictures that is to be encoded, some of the pictures are encoded as frames and some of the pictures are encoded as fields.