Video signals can be digitized, encoded, and subsequently decoded in a manner that significantly decreases the number of bits necessary to represent the reconstructed video, with no noticeable degradation or with an acceptable level of degradation. Video coding is an important part of many applications, such as digital television transmission, video conferencing, and video databases.
In video conferencing applications, for example, a video camera is typically used to capture a series of images of a target such as a meeting participant or a document. The series of images is encoded as a data stream and transmitted over a communications channel to a remote location. For example, the data stream may be transmitted over a phone line, an integrated services digital network (ISDN) line, or the Internet.
The encoding process is typically implemented using a digital video coder/decoder (codec), which divides the images into blocks and compresses the blocks according to a video compression standard, such as the ITU-T H.263 and H.261 standards. In standards of this type, a block may be compressed independently of the previous image or as a difference between the block and part of the previous image. In a typical video conferencing system, the data stream is received at a remote location, where it is decoded into a series of images that may be viewed at the remote location. Depending on the equipment used, this process typically occurs at a rate of one to thirty frames per second.
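The two ways of compressing a block described above can be illustrated with a minimal sketch. It is not taken from the H.263 or H.261 documents: the function names, the block size of 8, and the cost rule used to choose between the two modes are assumptions made only for illustration.

```python
def split_into_blocks(frame, size):
    """Yield (row, col, block) for each size x size block of a 2-D list."""
    for r in range(0, len(frame), size):
        for c in range(0, len(frame[0]), size):
            yield r, c, [row[c:c + size] for row in frame[r:r + size]]


def choose_block_mode(block, prev_block):
    """Return ("inter", difference) or ("intra", block).

    Assumed rule (illustrative only): compare the absolute energy of the
    difference signal with the block's deviation from its own mean and
    keep whichever is smaller.
    """
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    intra_cost = sum(abs(p - mean) for p in pixels)
    difference = [[a - b for a, b in zip(ra, rb)]
                  for ra, rb in zip(block, prev_block)]
    inter_cost = sum(abs(d) for row in difference for d in row)
    if inter_cost < intra_cost:
        return "inter", difference
    return "intra", block


def encode_frame(frame, prev_frame, size=8):
    """Return a list of (row, col, mode, data) tuples, one per block."""
    prev_blocks = {(r, c): b for r, c, b in split_into_blocks(prev_frame, size)}
    return [(r, c) + choose_block_mode(b, prev_blocks[(r, c)])
            for r, c, b in split_into_blocks(frame, size)]
```

In this sketch, a block that is unchanged from the previous image yields an all-zero difference and is coded as "inter", whereas a block whose content has been replaced is coded as "intra".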
One technique widely used in video systems is hybrid video coding. An efficient hybrid video coding system is based on ITU-T Recommendation H.263, which adopts a hybrid scheme: motion-compensated prediction exploits temporal redundancy, and transform coding of the remaining signal using the discrete cosine transform (DCT) reduces spatial redundancy. Half-pixel precision is used for the motion compensation, and variable-length coding is used for the symbol representation.
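The hybrid scheme can be sketched as follows, assuming bilinear averaging for the half-pixel positions, an exhaustive search that minimizes the sum of absolute differences (SAD), and an orthonormal DCT-II. The function names, the search range, and the rounding behavior are assumptions for illustration and do not reproduce the normative H.263 procedures; variable-length coding is omitted.

```python
import math


def _clamp(v, lo, hi):
    return max(lo, min(hi, v))


def sample_half_pel(ref, y2, x2):
    """Sample ref at (y2/2, x2/2), coordinates given in half-pixel units.

    Integer positions return the pixel itself; half positions return the
    average of the two or four neighboring pixels.
    """
    h, w = len(ref), len(ref[0])
    y0 = _clamp(y2 // 2, 0, h - 1)
    y1 = _clamp(y2 // 2 + y2 % 2, 0, h - 1)
    x0 = _clamp(x2 // 2, 0, w - 1)
    x1 = _clamp(x2 // 2 + x2 % 2, 0, w - 1)
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4.0


def predict_block(ref, top, left, size, mv):
    """Motion-compensated prediction; mv = (dy2, dx2) in half-pixel units."""
    dy2, dx2 = mv
    return [[sample_half_pel(ref, 2 * (top + r) + dy2, 2 * (left + c) + dx2)
             for c in range(size)] for r in range(size)]


def motion_search(cur, ref, top, left, size, radius2):
    """Exhaustive search over half-pixel vectors; returns (sad, mv, pred)."""
    best = None
    for dy2 in range(-radius2, radius2 + 1):
        for dx2 in range(-radius2, radius2 + 1):
            pred = predict_block(ref, top, left, size, (dy2, dx2))
            sad = sum(abs(cur[top + r][left + c] - pred[r][c])
                      for r in range(size) for c in range(size))
            if best is None or sad < best[0]:
                best = (sad, (dy2, dx2), pred)
    return best


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of a square block."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    def basis(i, k):
        return math.cos((2 * i + 1) * k * math.pi / (2 * n))

    return [[scale(u) * scale(v) * sum(block[y][x] * basis(y, u) * basis(x, v)
                                       for y in range(n) for x in range(n))
             for v in range(n)] for u in range(n)]


def encode_block(cur, ref, top, left, size=8, radius2=4):
    """Return (motion vector, DCT coefficients of the prediction residual)."""
    _, mv, pred = motion_search(cur, ref, top, left, size, radius2)
    residual = [[cur[top + r][left + c] - pred[r][c] for c in range(size)]
                for r in range(size)]
    return mv, dct_2d(residual)
```

When the current image is the reference image displaced by half a pixel, the search in this sketch returns a half-pixel vector and the residual passed to the transform is zero, which is the case in which half-pixel precision saves the most bits.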
With the above-mentioned coding algorithm, fifteen negotiable coding options can be used, either together or separately. The motion compensation (MC) is often carried out by employing the immediately preceding image, which is available as a reconstructed image at both the encoder and the decoder. While long-term statistical dependencies in the coded video sequence have not been fully exploited in existing international standards for improving coding efficiency, a specified negotiable coding option called "Reference Picture Selection Mode" (RPS mode) permits a modified inter-picture prediction called "NEWPRED". This prediction is intended to stop temporal error propagation caused by transmission errors. Transmission errors cause the decoding process to produce different results at the encoder and the decoder, thereby leading to differently and inconsistently reconstructed frames. The RPS mode can use backward channel messages sent from a decoder to an encoder to inform the encoder which parts of which pictures have been correctly decoded at the decoder. The encoder may select one of the picture memories to suppress the temporal error propagation due to the inter-frame coding. A particular picture memory is selected as the reference for inter-frame coding of a complete picture, a "group of blocks," or a "slice," as specified in the H.263 document. The amount of additional picture memory accommodated in the decoder may be signaled by external means as specified in the ITU-T Recommendations.
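The interaction between the backward channel and the picture memories can be sketched as follows. The class name, the method names, and the policy of choosing the most recent acknowledged picture are hypothetical and are used only for illustration; the sketch does not reproduce the message syntax of the H.263 document.

```python
from collections import OrderedDict


class RpsEncoder:
    """Hypothetical encoder-side bookkeeping for reference picture selection."""

    def __init__(self, num_memories):
        self.num_memories = num_memories
        self.memories = OrderedDict()  # frame number -> reconstructed picture
        self.acked = set()             # frame numbers confirmed by the decoder

    def store(self, frame_no, picture):
        """Keep a reconstructed picture, evicting the oldest when full."""
        self.memories[frame_no] = picture
        while len(self.memories) > self.num_memories:
            old, _ = self.memories.popitem(last=False)
            self.acked.discard(old)

    def on_back_channel(self, frame_no, ok):
        """Record a backward channel message for one picture."""
        if ok and frame_no in self.memories:
            self.acked.add(frame_no)
        else:
            self.acked.discard(frame_no)

    def select_reference(self):
        """Return the newest acknowledged frame number, or None.

        None means no stored picture is known to be intact at the decoder,
        in which case this sketch assumes the next picture is coded without
        inter-frame prediction.
        """
        candidates = [n for n in self.memories if n in self.acked]
        return max(candidates) if candidates else None
```

Because the encoder in this sketch predicts only from pictures that the decoder has reported as correctly decoded, a transmission error in one picture does not propagate into later pictures through the prediction loop.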
The RPS mode is designed to suppress the temporal error propagation due to the inter-frame coding, which occurs in the case of transmission errors. Techniques that use multiple reference pictures to achieve the additional goal of improving coding efficiency are being analyzed within the MPEG-4 standardization group. These techniques include schemes known as "Sprites," "Global Motion Compensation," "Short-Term Frame Memory/Long-Term Frame Memory" and "Background Memory" prediction. A commonality of these techniques is that the video encoder can choose between the immediately preceding reconstructed picture and a second picture, either being generated by the prediction technique. While the use of more than one additional picture has been exploited by combining various ones of the above techniques, the selection among the reference pictures has been a heuristic approach leading only to small coding gains.
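The choice between the immediately preceding picture and a second picture can be sketched as a per-block comparison. The function names, the candidate labels, and the use of the sum of absolute differences (SAD) as the selection criterion are assumptions for illustration; they stand in for the heuristic selection described above and are not taken from any of the named schemes.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def select_reference(block, candidates):
    """Pick the candidate reference block with the smallest SAD.

    candidates maps a label (for example "previous" or "background") to
    the co-located block in that reference picture. Ties go to the
    candidate listed first.
    """
    best_name, best_cost = None, None
    for name, cand in candidates.items():
        cost = sad(block, cand)
        if best_cost is None or cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost
```

A criterion of this kind considers only the prediction error and ignores the bits needed to signal which reference was chosen.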