A video encoder can be used to encode one or more frames of an image sequence into digital information. This digital information may then be transmitted to a receiver, where the image or the image sequence can be reconstructed. The transmission channel itself may be over any of a number of media; for example (for illustrative purposes only and not meant to be an exclusive listing), the medium may comprise a wireless broadcast, a radio link, a satellite link, a coaxial cable television or data link, a fiber optic link, a mobile phone connection, a fixed line telephone link, the Internet, or a combination of these or other media.
Various international standards have been agreed upon for video encoding and transmission. In general, a standard provides rules for compressing and encoding data relating to frames of an image. These rules provide a way of compressing and encoding image data to transmit less data than the viewing camera originally provided about the image. This reduced volume of data then requires less channel bandwidth for transmission. A receiver can reconstruct the image from the transmitted data if it knows the rules (that is, the standard) to which the transmitted data conforms. The H.264 standard avoids redundant transmission of parts of the image by using motion compensated prediction of macroblocks from previous frames.
Video compression architectures and standards, such as MPEG-2 and JVT/H.264/MPEG-4 Part 10/AVC, encode each macroblock using either an intraframe (“intra”) coding method or an interframe (“inter”) coding method. For interframe motion estimation/compensation, a video frame to be encoded is partitioned into non-overlapping rectangular, or most commonly, square blocks of pixels. For each of these blocks, the best matching same-shaped block is searched for in a reference frame, within a predetermined search window, according to a specified matching error criterion. The matched block is then used to predict the current block, and the prediction error block is further processed and transmitted to the decoder. The relative shifts in the horizontal and vertical directions of the reference block with respect to the original block are grouped and referred to as the motion vector (MV) of the original block, which is also transmitted to the decoder. The main aim of motion estimation is to find a motion vector such that the difference block, obtained by taking the difference of the reference and current blocks, requires the lowest number of bits to encode.
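The block-matching search described above can be sketched as follows. This is a minimal illustration only, assuming an exhaustive (full) search over integer-pixel shifts and the sum of absolute differences (SAD) as the matching error criterion; the text does not prescribe a particular criterion or search strategy, and the function names `sad`, `full_search` and `residual` are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the size x size block at (cx, cy)
    in the current frame and the block at (rx, ry) in the reference frame."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, cx, cy, size, search_range):
    """Examine every integer shift within +/- search_range and return
    ((mvx, mvy), best_sad) for the block at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, size)
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = cx + mvx, cy + mvy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost


def residual(cur, ref, cx, cy, mv, size):
    """Prediction error block: current block minus the matched reference block."""
    mvx, mvy = mv
    return [[cur[cy + dy][cx + dx] - ref[cy + mvy + dy][cx + mvx + dx]
             for dx in range(size)] for dy in range(size)]


# Synthetic 8x8 frames: the current frame is the reference shifted by (2, 1).
W = H = 8
ref_frame = [[(3 * x * x + 5 * y * y + x * y) % 251 for x in range(W)]
             for y in range(H)]
cur_frame = [[ref_frame[(y + 1) % H][(x + 2) % W] for x in range(W)]
             for y in range(H)]

mv, cost = full_search(cur_frame, ref_frame, cx=2, cy=2, size=4, search_range=2)
err = residual(cur_frame, ref_frame, 2, 2, mv, 4)
```

A practical encoder would typically weigh the cost of coding the motion vector against the matching error rather than minimizing the matching error alone, which is consistent with the stated aim of producing the lowest number of bits.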
Recent video coding standards and architectures employ multiple reference pictures for motion estimation and compensation in an attempt to improve coding efficiency. Predictively coded pictures (called “P” pictures) in MPEG-2 and its predecessors use only one previous picture to predict the values in a current picture. The H.264 standard allows the usage of multiple reference pictures (or frames), which are usually pictures at different time instants, many of which can be spatially and temporally unrelated to the current picture. In MPEG-2 only a single reference index is used, while for the encoding of motion vectors a special code, named the f-code parameter, is also transmitted within the bitstream for every picture and is used for the determination and decoding of the motion vectors. This f-code parameter is derived during the motion estimation process, and affects the VLC coding of the motion vectors. Previous proposals for automatically adapting the f-code parameter for every picture, depending upon its motion parameters and range, could achieve better coding efficiency when compared to keeping the parameter fixed. H.264 does not support this parameter, and essentially uses predefined VLC codes for the encoding of the motion vectors. On the other hand, H.264 allows the use of multiple references, and therefore a reference index parameter is also transmitted.
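One way to adapt the f-code parameter to the motion range of a picture can be sketched as follows. The sketch assumes the MPEG-2 relationship in which an f-code value f_code permits motion vector components from -16·2^(f_code-1) to 16·2^(f_code-1) - 1 in half-sample units, with valid values 1 through 9; the selection rule shown (the smallest f-code that covers every vector of the picture) is an illustrative assumption and not a description of any particular previous proposal. The function names are hypothetical, and a single f-code is handled, although MPEG-2 carries separate codes per direction and component.

```python
def mv_range_for_f_code(f_code):
    """Return (low, high), in half-sample units, of the motion vector
    component range permitted by a given f-code (assumed MPEG-2 rule)."""
    if not 1 <= f_code <= 9:
        raise ValueError("f_code must be in the range 1..9")
    f = 1 << (f_code - 1)
    return -16 * f, 16 * f - 1


def smallest_f_code(mv_components):
    """Pick the smallest f-code whose range covers every motion vector
    component found by motion estimation for the picture."""
    for f_code in range(1, 10):
        low, high = mv_range_for_f_code(f_code)
        if all(low <= c <= high for c in mv_components):
            return f_code
    raise ValueError("motion vector component outside the largest range")


# Components (half-sample units) gathered over one hypothetical picture.
picture_mvs = [-100, 40, 3, 0]
chosen = smallest_f_code(picture_mvs)
```

A smaller f-code leaves fewer residual bits per coded vector, which is why matching the code to the actual motion range can save bits relative to a fixed, conservative setting.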
The use of multiple references can increase considerably the complexity of the encoder, since more pictures need to be examined during the motion estimation process. The H.264 standard allows an encoder to select, for motion compensation purposes, among a larger number of pictures that have been decoded and stored in the decoder. The same extension of referencing capability is also applied to motion-compensated bi-prediction, which is restricted in MPEG-2 to using two specific pictures only (one of these being the previous intra (I) or P picture in display order and the other being the next I or P picture in display order).
Typically, the encoder calculates appropriate motion vectors and other data elements represented in the video data stream. The process for inter prediction of a macroblock in the encoder can involve the selection of the picture to be used as the reference picture from a number of stored previously decoded pictures. A “reference index” specifies the location (index) in a reference picture list (list 0 or list 1) of the reference picture to be used for prediction of an inter coded macroblock. A “reference index” is an index of a list of variables (PicNum and LongTermPicNum) that identify selected pictures for a frame sequence, which is called a reference picture list. When decoding a P or SP slice, there is a single reference picture list RefPicList0. When decoding a B slice, there is a second independent reference picture list RefPicList1 in addition to RefPicList0. Which pictures are actually located in each reference picture list is an issue of the multi-picture buffer control. A picture can be marked as “unused for reference” by the sliding window reference picture marking process, a first-in, first-out mechanism, and thereafter will not be listed in either of the reference picture lists. The H.264 standard allows reordering of the references within reference lists.
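The first-in, first-out behavior of sliding window marking, and the way a reference index selects a picture from a reference picture list, can be sketched as follows. This is a simplified model under stated assumptions: only short-term reference pictures are tracked, list 0 is ordered most recent first (the default ordering assumed here for P slices), and neither long-term pictures nor reordering commands are modeled. The class `SlidingWindowBuffer` and its methods are hypothetical names.

```python
from collections import deque


class SlidingWindowBuffer:
    """Simplified multi-picture buffer with FIFO sliding window marking."""

    def __init__(self, max_refs):
        self.max_refs = max_refs
        self._pics = deque()  # oldest picture on the left

    def add(self, pic_id):
        """Store a newly decoded reference picture. If the buffer is full,
        the oldest picture is marked unused for reference and returned."""
        evicted = None
        if len(self._pics) == self.max_refs:
            evicted = self._pics.popleft()
        self._pics.append(pic_id)
        return evicted

    def ref_pic_list0(self):
        """Assumed default list 0 ordering: most recently decoded first."""
        return list(reversed(self._pics))

    def picture_for(self, ref_idx):
        """Resolve a reference index to the picture it designates."""
        return self.ref_pic_list0()[ref_idx]


buf = SlidingWindowBuffer(max_refs=3)
evictions = [buf.add(pic) for pic in range(4)]  # decode pictures 0, 1, 2, 3
```

After the fourth picture is stored, picture 0 no longer appears in the list, so no reference index can select it.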
Multiple reference-picture motion-compensated prediction requires both encoder and decoder to store the reference pictures used for inter prediction in a multi-picture buffer. The decoder replicates the multi-picture buffer of the encoder according to memory management control operations specified in the bitstream. Unless the size of the multi-picture buffer is set to one picture, or the number of active reference pictures for list0 or list1 is signaled to be equal to one, the reference index at which the reference picture is located inside the multi-picture buffer has to be signaled with each inter coded macroblock transmitted.
Because the reference index must be signaled within the bitstream for every inter coded macroblock or macroblock partition (e.g., subblocks of 16×8, 8×16 or 8×8 pixels) when the size of the reference picture list is larger than one picture, it is not always certain that multiple references will increase compression gain in the encoding of a particular picture (e.g., a picture may be biased towards only a single reference). For an inter coded macroblock (or subblock), one motion vector difference and one reference index may be present in the bitstream. For a bi-predictively inter coded macroblock (or subblock), two motion vector differences and two reference indices may be present in the bitstream. Considering, for example, that for each macroblock in H.264 it is possible to transmit up to 4 reference indices for predictive (P) pictures, and up to 8 for bi-directionally predictive (B) pictures, the bitrate overhead due to the reference indices can be quite significant.
In H.264, the number of references is controlled through the num_ref_idx_lN_active_minus1 parameter that is signaled at the slice level, wherein N is equal to 0 for list0 and 1 for list1. The num_ref_idx_lN_active_minus1 parameter specifies the maximum reference index for reference picture list N that shall be used to decode each slice of the picture in which list N is used (e.g., num_ref_idx_l0_active_minus1 specifies the maximum reference index for reference picture list 0 that shall be used to decode the slice). The value of num_ref_idx_lN_active_minus1 ranges between 0 and 31, inclusive. If this parameter is equal to 0, then for the current slice, only one reference picture will be used to inter code the macroblocks in that slice and no reference index needs to be transmitted with the inter coded macroblocks of that slice.
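The effect of this parameter on the reference index overhead can be illustrated with a short sketch. It assumes CAVLC entropy coding, in which a reference index is coded as a truncated Exp-Golomb te(v) element: nothing is sent when the maximum index is 0, a single bit is sent when the maximum index is 1, and an unsigned Exp-Golomb ue(v) code is sent otherwise. CABAC coding would give different costs. The function names are hypothetical.

```python
def ue_bits(code_num):
    """Length in bits of the unsigned Exp-Golomb code ue(v) for code_num:
    2 * floor(log2(code_num + 1)) + 1."""
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def ref_idx_bits(ref_idx, num_ref_idx_active_minus1):
    """Bits spent on one reference index, assuming CAVLC te(v) coding."""
    if not 0 <= num_ref_idx_active_minus1 <= 31:
        raise ValueError("num_ref_idx_active_minus1 must be in 0..31")
    if not 0 <= ref_idx <= num_ref_idx_active_minus1:
        raise ValueError("ref_idx exceeds the maximum reference index")
    if num_ref_idx_active_minus1 == 0:
        return 0  # single reference: the index is inferred, not transmitted
    if num_ref_idx_active_minus1 == 1:
        return 1  # two references: one bit selects between them
    return ue_bits(ref_idx)


def macroblock_ref_idx_bits(ref_indices, num_ref_idx_active_minus1):
    """Total reference index overhead for one macroblock's partitions."""
    return sum(ref_idx_bits(r, num_ref_idx_active_minus1) for r in ref_indices)


# A P macroblock split into four 8x8 partitions, each using a different reference.
overhead_four_refs = macroblock_ref_idx_bits([0, 1, 2, 3], num_ref_idx_active_minus1=3)
# The same macroblock in a slice restricted to a single reference.
overhead_one_ref = macroblock_ref_idx_bits([0, 0, 0, 0], num_ref_idx_active_minus1=0)
```

Under these assumptions, restricting a slice to a single active reference removes the reference index bits from every inter coded macroblock of that slice, at the price of giving up the additional prediction candidates.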