The present invention relates to the field of digital video compression. More specifically, the present invention relates to methods and apparatus for dynamically adjusting f-codes for a digital picture header, depending on the motion vector range required for each picture.
Digital television offers viewers high quality video entertainment with features such as pay-per-view, electronic program guides, video-on-demand, weather and stock information, as well as Internet access. Video images, packaged in an information stream, are transmitted to the user via a broadband communication network over a satellite, cable, or terrestrial transmission medium. Due to bandwidth and power limitations, efficient transmission of film and video demands that compression and formatting techniques be extensively used. Protocols developed by the Moving Picture Experts Group (MPEG), such as MPEG-2, attempt to maximize bandwidth utilization for film and video information transmission by adding a temporal component to a spatial compression algorithm.
The video portion of the television signal comprises a sequence of video “frames” that together provide a moving picture. In digital television systems, each line of a video frame is defined by a sequence of picture elements (pixels), each represented by digital data bits. Each video frame is made up of two fields, each of which contains one half of the lines of the frame. For example, a first or odd field will contain all the odd numbered lines of a video frame, while a second or even field will contain the even numbered lines of that video frame. A large amount of data is required to define each video frame of a television signal. For example, approximately 7.4 megabits of data are required to provide one video frame of a National Television System Committee (NTSC) television signal. This assumes a 640 pixel by 480 line display is used with 8 bits of intensity value for each of the primary colors red, green, and blue. High definition television requires substantially more data to provide each video frame. In order to manage this amount of data, the data must be compressed.
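The 7.4 megabit figure follows directly from the display parameters stated above. The short sketch below reproduces the arithmetic. The helper name `frame_bits` is hypothetical, and the calculation assumes uncompressed RGB samples with no chroma subsampling.

```python
def frame_bits(width: int, height: int, bits_per_color: int = 8, colors: int = 3) -> int:
    """Uncompressed size of one frame in bits (RGB, no chroma subsampling assumed)."""
    return width * height * bits_per_color * colors

bits = frame_bits(640, 480)
print(bits)                 # 7372800
print(f"{bits / 1e6:.1f}")  # 7.4 (megabits)
```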
Digital video compression techniques enable the efficient transmission of digital video signals over conventional communication channels. Such techniques use compression algorithms that take advantage of the correlation among adjacent pixels in order to derive a more efficient representation of the important information in a video signal. The most powerful compression systems not only take advantage of spatial correlation, but can also utilize similarities among adjacent frames to further compact the data. In such systems, motion compensation (also known as differential encoding) is used to transmit only the difference between an actual frame and a prediction of an actual frame. The prediction is derived from a previous (or future) frame of the same video sequence. In such motion compensation systems, motion vectors are derived, for example, by comparing a block of pixel data from a current frame to similar blocks of data in a previous frame. A motion estimator determines how a block of data from the previous frame should be adjusted in order to be used in the current frame.
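The motion estimation step described above can be illustrated with a minimal full-search block matcher. It compares a block of the current frame against every candidate position within plus or minus `r` pixels in the reference frame and keeps the displacement with the smallest sum of absolute differences (SAD). This is a sketch under stated assumptions: integer-pixel accuracy only, frames held as plain Python lists of luma values, and hypothetical function names. Practical encoders add refinements such as half-sample search.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # luma samples indexed [row][col]

def sad(cur: Frame, ref: Frame, by: int, bx: int, dy: int, dx: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (by, bx) in cur
    and the block displaced by (dy, dx) in ref."""
    return sum(
        abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
        for i in range(n)
        for j in range(n)
    )

def full_search(cur: Frame, ref: Frame, by: int, bx: int, n: int, r: int) -> Tuple[int, int]:
    """Return the (dy, dx) within +/- r that minimizes SAD, skipping
    candidate blocks that would fall outside the reference frame."""
    h, w = len(ref), len(ref[0])
    best: Optional[int] = None
    best_mv = (0, 0)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            top, left = by + dy, bx + dx
            if top < 0 or left < 0 or top + n > h or left + n > w:
                continue
            cost = sad(cur, ref, by, bx, dy, dx, n)
            if best is None or cost < best:  # keep the first minimum found
                best, best_mv = cost, (dy, dx)
    return best_mv

# Usage: a bright 2x2 square moves from (2, 2) in the reference frame to (3, 4).
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for i in range(2):
    for j in range(2):
        ref[2 + i][2 + j] = 255
        cur[3 + i][4 + j] = 255
print(full_search(cur, ref, 3, 4, n=2, r=3))  # (-1, -2)
```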
Video compression standards, such as MPEG-2, provide for compression of video data by sending only the changes between different video frames. A first type of frame, known as a predictive coded frame or “P” frame (also referred to herein as a P-picture), contains an abridged set of data used by the decoder to predict a full frame from a previous “P” frame or from a previous complete frame (an intra-coded “I” frame or I-picture) in the video stream. The stream merely carries “fine tuning” information to correct errors from an approximate prediction. An I-frame is compressed without motion prediction. Thus, a full video frame can be reconstructed from an I-frame without reference to any other frame. In this manner, errors due to DCT/IDCT mismatches are eliminated once an I-frame arrives and is decoded. Bi-directional predictive coded frames (a “B” frame or B-picture) are similar to P-frames, except that the prediction is made not only from the previous I- or P-frame, but also from a future I- or P-frame (typically the next such frame in display order). MPEG data streams encoded in this manner are referred to herein as “I-frame based MPEG data streams.” An I-frame based MPEG data stream may start with an optional Group-of-Pictures (GOP) header followed by an I-frame, so that the first frame of the GOP can be reconstructed without reference to other frame information.
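The reference relationships among I-, P- and B-pictures described above can be sketched as a small function. Given a GOP pattern in display order, it reports which anchor (I or P) pictures each picture would be predicted from. The function name and the string encoding of the pattern are hypothetical. In this sketch a trailing B-picture with no following anchor in the pattern gets `None` as its backward reference. In a real open GOP, that reference would be the I-picture of the next GOP.

```python
from typing import List, Optional, Tuple

def references(pattern: str) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """For each picture in display order, return (type, forward_ref, backward_ref),
    where refs are display-order indices of I/P anchor pictures (None if unused)."""
    anchors = [i for i, t in enumerate(pattern) if t in "IP"]
    out: List[Tuple[str, Optional[int], Optional[int]]] = []
    for i, t in enumerate(pattern):
        prev_anchor = max((a for a in anchors if a < i), default=None)
        next_anchor = min((a for a in anchors if a > i), default=None)
        if t == "I":    # intra-coded: no motion-compensated prediction
            out.append((t, None, None))
        elif t == "P":  # forward prediction from the previous anchor only
            out.append((t, prev_anchor, None))
        elif t == "B":  # prediction from the previous and the next anchor
            out.append((t, prev_anchor, next_anchor))
        else:
            raise ValueError(f"unknown picture type {t!r}")
    return out

for entry in references("IBBPBBP"):
    print(entry)
```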
In the MPEG-2 format, video information is digitized and compressed before being encoded. The compression can be considered part of the encoding. As shown in FIG. 1, compressed video from a program 100 is divided into variable-length units called Packetized Elementary Stream (PES) packets, such as PES packets 105 and 110, each of which contains a variable number of encoded pictures. For example, the PES packet 105 includes encoded pictures 119, 121, . . . , 124.
The example PES packet 105 has a header 116 and a payload portion 117. Moreover, each picture in the PES packet 105 is prefixed by a picture header containing information about the picture. For example, the picture 119 has a picture header 118, the picture 121 has a picture header 120, and the picture 124 has a picture header 123.
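Because each picture is prefixed by its own picture header, a decoder or analyzer can locate the pictures in a reassembled payload by scanning for the MPEG-2 picture start code, the byte sequence `00 00 01 00`. The following is a minimal sketch. The function name is hypothetical, and it assumes the input is a contiguous video elementary-stream buffer.

```python
from typing import List

PICTURE_START_CODE = b"\x00\x00\x01\x00"  # MPEG-2 picture_start_code

def picture_header_offsets(es: bytes) -> List[int]:
    """Return the byte offset of every picture_start_code in an MPEG-2
    video elementary-stream buffer (for example, a reassembled PES payload)."""
    offsets = []
    pos = es.find(PICTURE_START_CODE)
    while pos != -1:
        offsets.append(pos)
        pos = es.find(PICTURE_START_CODE, pos + 1)
    return offsets

# A sequence header start code (00 00 01 B3), filler, then two picture headers.
buf = b"\x00\x00\x01\xb3" + b"abc" + b"\x00\x00\x01\x00" + b"xy" + b"\x00\x00\x01\x00"
print(picture_header_offsets(buf))  # [7, 13]
```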
For transmission and storage purposes, PES packets are further broken down into fixed-length units called transport packets. Each transport packet is formed by subdividing the contents of successive portions of a PES packet. With the MPEG-2 standard, each transport packet comprises 188 bytes. Generally, the PES packet length is much larger than the size of a transport packet. Each transport packet has a transport packet header and a payload portion.
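The subdivision of a PES packet into 188-byte transport packets can be sketched as follows. This sketch assumes the minimum 4-byte transport packet header and no adaptation field, which leaves at most 184 payload bytes per packet. Any adaptation field would reduce the payload further. Padding of the final, partially filled packet is not modeled, and the function names are hypothetical.

```python
from typing import List

TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4                                  # minimum header, no adaptation field assumed
TS_PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE  # 184 bytes

def ts_packets_needed(pes_len: int) -> int:
    """Transport packets needed to carry a PES packet of pes_len bytes,
    assuming every packet but possibly the last carries a full 184-byte payload."""
    return -(-pes_len // TS_PAYLOAD_SIZE)  # ceiling division on integers

def split_pes(pes: bytes) -> List[bytes]:
    """Slice a PES packet into successive payload chunks of at most 184 bytes."""
    return [pes[i:i + TS_PAYLOAD_SIZE] for i in range(0, len(pes), TS_PAYLOAD_SIZE)]

chunks = split_pes(bytes(10000))
print(ts_packets_needed(10000), len(chunks), len(chunks[-1]))  # 55 55 64
```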
An f-code is a code carried in the digital picture header (e.g., picture header 118, 120, and 123 of FIG. 1) of a compressed video stream (such as an MPEG-2 encoded video stream). The f-code defines the search range within a frame or field for the motion vectors used to decode the picture (e.g., a frame or field of video). A P-picture requires only forward horizontal and forward vertical motion vectors, such that only corresponding “forward” f-codes need to be determined, while a B-picture requires forward horizontal, forward vertical, backward horizontal, and backward vertical motion vectors and corresponding f-codes. As an example, FIG. 1 shows picture header 118 containing a forward f-code 130 and a backward f-code 132.
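In MPEG-2, the four f-codes of a picture are carried as an array f_code[s][t]. Here s selects forward (0) or backward (1) and t selects horizontal (0) or vertical (1). The value 15 marks an unused entry. Each usable f_code (1 through 9) sets the legal range of a motion vector component in half-sample units: with f = 2^(f_code − 1), the range is −16·f to 16·f − 1. The sketch below assumes those rules and uses hypothetical function names. For simplicity it treats I-pictures as carrying no motion vectors, ignoring concealment motion vectors.

```python
from typing import List, Tuple

UNUSED = 15  # f_code value meaning "this direction/component is not used"

def mv_range(f_code: int) -> Tuple[int, int]:
    """Inclusive range of one motion vector component, in half-sample units."""
    if not 1 <= f_code <= 9:
        raise ValueError("f_code must be in 1..9")
    f = 1 << (f_code - 1)  # r_size = f_code - 1, f = 2**r_size
    return -16 * f, 16 * f - 1

def f_code_array(picture_type: str,
                 forward: Tuple[int, int] = (UNUSED, UNUSED),
                 backward: Tuple[int, int] = (UNUSED, UNUSED)) -> List[List[int]]:
    """Return f_code[s][t]: s = 0 forward, 1 backward; t = 0 horizontal, 1 vertical."""
    if picture_type == "I":
        return [[UNUSED, UNUSED], [UNUSED, UNUSED]]
    if picture_type == "P":  # forward f-codes only
        return [list(forward), [UNUSED, UNUSED]]
    if picture_type == "B":  # forward and backward f-codes
        return [list(forward), list(backward)]
    raise ValueError(f"unknown picture type {picture_type!r}")

print(mv_range(1))                  # (-16, 15), i.e. -8 to +7.5 pixels
print(f_code_array("P", (2, 2)))    # [[2, 2], [15, 15]]
```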
The values of the f-codes for a picture are normally determined prior to the start of encoding that picture. Demands for lower bit-rates and higher video quality require efficient use of available bandwidth. Sending an f-code larger than needed for the current picture wastes bits that could be used to provide better video quality.
As described in ISO/IEC JTC1/SC29/WG11/N0400 (MPEG-2) “Test Model 5” (TM5), April 1993, which is incorporated herein and made a part hereof by reference, encoding each motion vector having a non-zero motion code requires a motion residual which uses (f-code − 1) bits. Hence, reducing one f-code by one results in savings of as much as 1 bit per motion vector. A full resolution NTSC picture has 1350 macroblocks. Each macroblock may have from zero to four motion vectors. Therefore, reducing f-codes to a value that is only as large as needed for the current picture can result in substantial bit savings in encoding that picture. For example, the maximum savings achieved by reducing all f-codes used to encode a B-picture by one is 5,400 bits.
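The figures above can be checked with simple arithmetic. The sketch assumes a full resolution NTSC picture of 720 by 480 luminance samples, which gives 45 by 30 = 1350 macroblocks of 16 by 16 samples. With up to four motion vectors per macroblock and at most 1 bit saved per vector, the upper bound is 1350 × 4 × 1 = 5,400 bits. The helper names are hypothetical. The residual rule follows the TM5 description quoted in the text.

```python
def macroblock_count(width: int, height: int, mb_size: int = 16) -> int:
    """Number of 16x16 macroblocks in a picture whose dimensions are multiples of 16."""
    return (width // mb_size) * (height // mb_size)

def residual_bits(f_code: int, motion_code: int) -> int:
    """Bits spent on the motion residual for one vector component (TM5 rule):
    f_code - 1 bits when motion_code is non-zero, otherwise none."""
    return f_code - 1 if motion_code != 0 else 0

MAX_MVS_PER_MB = 4  # e.g. field prediction in both directions in a B-picture

mbs = macroblock_count(720, 480)        # 45 x 30 macroblocks
max_saving = mbs * MAX_MVS_PER_MB * 1   # at most 1 bit saved per motion vector
print(mbs, max_saving)                  # 1350 5400
```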
It would be advantageous to provide methods and apparatus for adjusting (i.e., minimizing) f-codes in a digital picture header, depending on the maximum motion vector range required for each picture. It would be further advantageous to reduce f-codes in a digital picture header so that such f-codes are only as large as necessary to allow decoding of the picture, thereby resulting in bit savings when encoding the picture.
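One straightforward way to make an f-code "only as large as necessary" can be sketched as follows. After motion estimation, find the most negative and most positive value of a vector component, then choose the smallest f_code whose range covers both. The sketch assumes the MPEG-2 range rule (with f = 2^(f_code − 1), the range in half-sample units is −16·f to 16·f − 1) and the f_code limits 1 through 9. Profile and level constraints may impose a lower ceiling. This is an illustration only, not the specific method of the present invention, and the function names are hypothetical.

```python
from typing import Iterable

def min_f_code(max_neg: int, max_pos: int, f_min: int = 1, f_max: int = 9) -> int:
    """Smallest f_code whose range [-16*f, 16*f - 1] (half-sample units,
    f = 2**(f_code - 1)) covers every component value in [max_neg, max_pos]."""
    for f_code in range(f_min, f_max + 1):
        f = 1 << (f_code - 1)
        if -16 * f <= max_neg and max_pos <= 16 * f - 1:
            return f_code
    raise ValueError("motion vector exceeds the largest allowed f_code range")

def min_f_code_for(components: Iterable[int]) -> int:
    """Smallest f_code for one direction/component of a picture, given all of
    that component's motion vector values (in half-sample units)."""
    values = list(components)
    return min_f_code(min(values, default=0), max(values, default=0))

print(min_f_code_for([15, -16]))  # 1
print(min_f_code_for([100, -3]))  # 4
```

Running the selection separately for each of the four f_code[s][t] entries would give each direction and component its own minimal value.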
The methods and apparatus of the present invention provide the foregoing and other advantages.