This invention relates to compression and transmission of video signals and, more particularly, to encoding and decoding temporally redundant information present in video signals.
Video signals can be digitized, encoded, and subsequently decoded in a manner which significantly decreases the number of bits necessary to represent the video, without noticeable, or with acceptable, degradation in the reconstructed video. Video coding is an important part of many applications such as digital television transmission, video conferencing, and video databases.
In video conferencing applications, for example, a video camera is typically used to capture a series of images of a target, such as a meeting participant or a document. The series of images is encoded as a data stream and transmitted over a communications channel to a remote location. For example, the data stream may be transmitted over a phone line, an integrated services digital network (ISDN) line, or the Internet.
The encoding process is typically implemented using a digital video coder/decoder (codec), which divides the images into blocks and compresses the blocks according to a video compression standard, such as the ITU-T H.263 and H.261 standards. In standards of this type, a block may be compressed independently of the previous image or as a difference between the block and part of the previous image. In a typical video conferencing system, the data stream is received at a remote location, where it is decoded into a series of images, which may be viewed at the remote location. Depending on the equipment used, this process typically occurs at a rate of one to thirty frames per second.
One technique widely used in video systems is hybrid video coding. An efficient hybrid video coding system is based on the ITU-T Recommendation H.263. The ITU-T Recommendation H.263 adopts a hybrid scheme that combines motion-compensated prediction, which exploits temporal redundancy, with transform coding of the remaining signal using the discrete cosine transform (DCT), which reduces spatial redundancy. Half-pixel precision is used for the motion compensation, and variable-length coding is used for the symbol representation.
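By way of illustration only, the hybrid scheme described above can be sketched as follows. The sketch predicts a block from the previous picture, forms the remaining (residual) signal, and applies a two-dimensional DCT to it. The function names are hypothetical, the displacement is restricted to whole pixels rather than the half-pixel precision of H.263, and quantization and variable-length coding are omitted; it is not the procedure of the Recommendation itself.

```python
import math

def predict_block(ref, top, left, dy, dx, size):
    """Copy a size x size block from ref, displaced by (dy, dx) whole pixels.

    Assumption: whole-pixel displacement only (H.263 uses half-pixel precision).
    """
    return [[ref[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]

def residual(cur_block, pred_block):
    """Element-wise difference between the current block and its prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur_block, pred_block)]

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block (direct, unoptimized form)."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for r in range(n):
                for c in range(n):
                    total += (block[r][c]
                              * math.cos((2 * r + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * c + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out

# Illustrative data: the current block equals the displaced reference block
# plus a constant brightness offset, so the residual is flat.
prev = [[10 * r + c for c in range(6)] for r in range(6)]
pred = predict_block(prev, top=1, left=1, dy=1, dx=1, size=4)
cur = [[p + 2 for p in row] for row in pred]
coeffs = dct_2d(residual(cur, pred))
```

In this example the residual is constant, so its energy is concentrated in the single DC coefficient `coeffs[0][0]`, which is the property that makes transform coding of the residual attractive.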
With the above-mentioned coding algorithm, fifteen negotiable coding options can be used, either together or separately. The motion compensation (MC) is often carried out by employing the immediately preceding image, which is available as a reconstructed image at the encoder and the decoder. While long-term statistical dependencies in the coded video sequence have not been fully exploited in existing international standards for improving coding efficiency, a specified negotiable coding option called "Reference Picture Selection Mode" (RPS mode) permits a modified inter-picture prediction called "NEWPRED". This prediction is intended to stop temporal error propagation caused by transmission errors. Transmission errors cause the decoding process to produce different results at the encoder and the decoder, thereby leading to differently and inconsistently reconstructed frames. The RPS mode can use backward channel messages sent from a decoder to an encoder to inform the encoder which parts of which pictures have been correctly decoded at the decoder. The encoder may select one of the picture memories to suppress the temporal error propagation due to the inter-frame coding. A particular picture memory is selected as a reference for inter-frame coding of a complete picture, which is represented as a "group of blocks" or a "slice" as specified in the H.263 document. The amount of additional picture memory accommodated in the decoder may be signaled by external means as specified in the ITU-T Recommendations.
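The reference selection performed in the RPS mode can be illustrated, in a hedged manner, by the following sketch. It assumes that the encoder keeps a list of the picture numbers held in its picture memories and a set of picture numbers acknowledged over the backward channel, and that it simply picks the newest acknowledged picture. The function name, the data structures, and the fallback behavior are hypothetical and are not taken from the H.263 document; wrap-around of picture numbers is ignored.

```python
def select_reference(stored_pictures, acknowledged_pictures):
    """Return the newest stored picture number that the decoder has
    acknowledged as correctly decoded, or None if there is none.

    Assumption: picture numbers increase monotonically (no wrap-around).
    """
    candidates = [n for n in stored_pictures if n in acknowledged_pictures]
    return max(candidates) if candidates else None

# Illustrative use: pictures 12 and 13 have not been acknowledged yet,
# so the encoder predicts from picture 11 instead of the preceding picture.
reference = select_reference([10, 11, 12, 13], {10, 11})
```

A return value of None would leave the encoder without a trusted reference; one plausible reaction, assumed here rather than stated in the source, is to code the picture without inter-picture prediction.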
The RPS mode is designed to suppress the temporal error propagation due to the inter-frame coding which occurs in case of transmission errors. Techniques to use multiple reference pictures in order to achieve the additional goal of improving coding efficiency are being analyzed within the MPEG-4 standardization group. These techniques include schemes known as "Sprites," "Global Motion Compensation," "Short-Term Frame Memory/Long-Term Frame Memory" and "Background Memory" prediction. A commonality of these techniques is that the video encoder can choose between the immediately preceding reconstructed picture and a second picture, either being generated by the prediction technique. While the use of more than two reference pictures has been exploited when combining various ones of the above techniques, the selection among the reference pictures has been a heuristic approach leading only to small coding gains.
According to various aspects of the present invention, embodiments thereof are exemplified in the form of methods and arrangements for encoding, decoding and performing video conferencing. One specific implementation includes a method of coding and decoding video images for transmission from a first station to a second station. The method includes: providing respective sets of multiple reference pictures at the first and second stations to permit use of up to N frames for prediction, where N is a positive integer; and at the first station, encoding the video images and determining motion parameters, including a frame selection parameter, for the video images as a function of at least one of: a distortion criterion and a data transmission rate. The encoded video images and the motion parameters are then transmitted to the second station, and a new video image is predicted for display at the second station as a function of the motion parameters and a group of the N frames.
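One way to picture the determination of motion parameters described above is the following sketch, which searches over both the spatial displacement and the frame selection parameter and picks the candidate with the lowest combined cost. The Lagrangian form of the cost (distortion plus a multiplier times rate), the sum of absolute differences as the distortion measure, the function names, and in particular the bit-count model in `motion_bits` are assumptions made for illustration; the source states only that the parameters are a function of a distortion criterion and/or a data transmission rate.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """Sum of absolute differences between the current block and the
    reference block displaced by (dy, dx)."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(cur[top + r][left + c]
                         - ref[top + dy + r][left + dx + c])
    return total

def motion_bits(dy, dx, frame_idx):
    """Hypothetical rate model: longer vectors and older frames cost more bits."""
    return 3 + 2 * (abs(dy) + abs(dx)) + frame_idx

def search_motion(cur, refs, top, left, size, search_range, lam):
    """Return (dy, dx, frame_idx, cost) minimizing SAD + lam * bits.

    refs[0] is assumed to be the most recent reconstructed frame, and
    refs[N - 1] the oldest of the N stored frames.
    """
    height, width = len(cur), len(cur[0])
    best = None
    for frame_idx, ref in enumerate(refs):
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y, x = top + dy, left + dx
                # Skip candidates that would read outside the reference frame.
                if y < 0 or x < 0 or y + size > height or x + size > width:
                    continue
                cost = (sad(cur, ref, top, left, dy, dx, size)
                        + lam * motion_bits(dy, dx, frame_idx))
                if best is None or cost < best[3]:
                    best = (dy, dx, frame_idx, cost)
    return best

# Illustrative data: the older frame contains the block shifted by one
# column, while the most recent frame is flat and predicts poorly.
older = [[(8 * r + c) * 3 for c in range(8)] for r in range(8)]
recent = [[0] * 8 for _ in range(8)]
cur = [[older[r][min(c + 1, 7)] for c in range(8)] for r in range(8)]
params = search_motion(cur, [recent, older], top=2, left=2, size=4,
                       search_range=2, lam=1.0)
# params == (0, 1, 1, 6.0): zero distortion from the older frame at dx = 1
```

With a small multiplier the search selects the older frame because it predicts the block exactly. Raising `lam` shifts the choice toward parameters that are cheaper to transmit, which is how a data transmission rate can influence the frame selection in this sketch.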
Another aspect of the present invention is directed to an arrangement for coding and decoding video images for transmission from a first station to a second station. The arrangement includes: memories for storing respective sets of multiple reference pictures at the first and second stations to permit use of up to N frames for prediction, where N is a positive integer; an encoder responsive to video images provided at the first station and arranged to generate encoded video images; means for determining motion parameters, including a frame selection parameter and a spatial displacement parameter, for the video images as a function of at least one of: a distortion criterion and a data transmission rate; and means, responsive to the encoded video images and the motion parameters, for predicting a new video image for display at the second station as a function of the motion parameters and a group of the N frames.
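The memories and the predicting means of such an arrangement might be sketched as below. The sketch keeps up to N reconstructed frames and forms a predicted block from a frame selection parameter and a spatial displacement parameter. The class name, the method names, and the sliding-window policy of discarding the oldest frame when the memory is full are assumptions for illustration; the source does not specify how the reference pictures are managed.

```python
from collections import deque

class ReferenceBuffer:
    """Holds up to max_frames reconstructed frames; index 0 is the newest.

    Assumption: a sliding window in which the oldest frame is discarded
    once the buffer is full.
    """

    def __init__(self, max_frames):
        self.frames = deque(maxlen=max_frames)

    def push(self, frame):
        # appendleft on a full deque drops the entry at the right end,
        # which here is the oldest stored frame.
        self.frames.appendleft(frame)

    def predict_block(self, top, left, size, dy, dx, frame_idx):
        """Build the predicted block from the selected frame, displaced by
        (dy, dx) whole pixels."""
        ref = self.frames[frame_idx]
        return [[ref[top + dy + r][left + dx + c] for c in range(size)]
                for r in range(size)]

# Illustrative use with N = 2: after three pushes, only the two most
# recent frames remain available for prediction.
buf = ReferenceBuffer(max_frames=2)
for value in (1, 2, 3):
    buf.push([[value] * 4 for _ in range(4)])
block = buf.predict_block(top=0, left=0, size=2, dy=1, dx=1, frame_idx=1)
```

Running the same buffer logic at both stations keeps the two sets of reference pictures aligned, provided both stations push the same reconstructed frames in the same order.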
The above summary is not intended to provide an overview of all aspects of the present invention. Other aspects of the present invention are exemplified and described in connection with the detailed description.