This invention relates to compression and transmission of video signals and, more particularly, to encoding and decoding temporally redundant information present in video signals.
Video signals can be digitized, encoded, and subsequently decoded in a manner which significantly decreases the number of bits necessary to represent the reconstructed video without noticeable, or with acceptable, degradation in the reconstructed video. Video coding is an important part of many applications such as digital television transmission, video conferencing, and video databases.
In video conferencing applications, for example, a video camera is typically used to capture a series of images of a target, such as a meeting participant or a document. The series of images is encoded as a data stream and transmitted over a communications channel to a remote location. For example, the data stream may be transmitted over a phone line, an integrated services digital network (ISDN) line, or the Internet.
The encoding process is typically implemented using a digital video coder/decoder (codec), which divides the images into blocks and compresses the blocks according to a video compression standard, such as the ITU-T H.263 and H.261 standards. In standards of this type, a block may be compressed independently of the previous image or as a difference between the block and part of the previous image. In a typical video conferencing system, the data stream is received at a remote location, where it is decoded into a series of images, which may be viewed at the remote location. Depending on the equipment used, this process typically occurs at a rate of one to thirty frames per second.
One technique widely used in video systems is hybrid video coding. An efficient hybrid video coding system is based on the ITU-T Recommendation H.263. The ITU-T Recommendation H.263 adopts a hybrid scheme of motion-compensated prediction to exploit temporal redundancy and transform coding using the discrete cosine transform (DCT) of the remaining signal to reduce spatial redundancy. Half pixel precision is used for the motion compensation, and variable length coding is used for the symbol representation.
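The motion-compensated prediction step described above can be illustrated with a minimal sketch. It is a simplification under stated assumptions: it uses integer-pixel displacements only (H.263 uses half-pixel precision), it omits the DCT and variable length coding of the residual, and the function names `predict_block` and `residual` are hypothetical names chosen for illustration.

```python
def predict_block(ref, top, left, dy, dx, size):
    """Copy a size x size block from the reference picture, displaced by (dy, dx).

    Assumption: integer-pixel motion only, and the displaced block lies
    entirely inside the reference picture.
    """
    return [[ref[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]


def residual(block, prediction):
    """Prediction error that a hybrid coder would transform-code."""
    return [[b - p for b, p in zip(brow, prow)]
            for brow, prow in zip(block, prediction)]


# Previous (reference) picture and a current picture shifted right by one pixel.
ref = [[10, 20, 30, 40],
       [50, 60, 70, 80],
       [90, 100, 110, 120],
       [130, 140, 150, 160]]
cur = [[row[0]] + row[:-1] for row in ref]

# Current 2x2 block at row 1, column 2.
block = [[cur[1 + r][2 + c] for c in range(2)] for r in range(2)]

# The motion vector (0, -1) compensates the shift, so the residual vanishes.
zero_residual = residual(block, predict_block(ref, 1, 2, 0, -1, 2))
# Without motion compensation the residual is non-zero.
static_residual = residual(block, predict_block(ref, 1, 2, 0, 0, 2))

print(zero_residual)    # [[0, 0], [0, 0]]
print(static_residual)  # [[-10, -10], [-10, -10]]
```

A smaller residual generally costs fewer bits after transform coding, which is why the choice of displacement matters.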
With the above-mentioned coding algorithm, fifteen negotiable coding options can be used, either together or separately. The motion compensation (MC) is often carried out by employing the immediately preceding image, which is available as a reconstructed image at both the encoder and the decoder. While long-term statistical dependencies in the coded video sequence have not been fully exploited in existing international standards for improving coding efficiency, a specified negotiable coding option called “Reference Picture Selection Mode” (RPS mode) permits a modified inter-picture prediction called “NEWPRED”. This prediction is intended to stop temporal error propagation caused by transmission errors. Transmission errors cause different results of the decoding process at encoder and decoder, thereby leading to differently and inconsistently reconstructed frames. The RPS mode can use backward channel messages sent from a decoder to an encoder to inform the encoder which parts of which pictures have been correctly decoded at the decoder. The encoder may select one of the picture memories to suppress the temporal error propagation due to the inter-frame coding. A particular picture memory is selected as reference for inter-frame coding of a complete picture, which is represented as a “group of blocks” or a “slice” as specified in the H.263 document. The amount of additional picture memory accommodated in the decoder may be signaled by external means as specified in the ITU-T Recommendations.
The RPS mode is designed to suppress the temporal error propagation due to the inter-frame coding which occurs in case of transmission errors. Techniques that use multiple reference pictures in order to achieve the additional goal of improving coding efficiency are being analyzed within the MPEG-4 standardization group. These techniques include schemes known as “Sprites,” “Global Motion Compensation,” “Short-Term Frame Memory/Long-Term Frame Memory” and “Background Memory” prediction. A commonality of these techniques is that the video encoder can choose between the immediately preceding reconstructed picture and a second picture, either being generated by the prediction technique. While the use of more than a second picture has been exploited when combining various ones of the above techniques, the selection among the reference pictures has been a heuristic approach leading only to small coding gains.
Generally, motion-compensated coding schemes achieve data compression by exploiting the similarities between successive frames of a video signal. Often, with such schemes, motion-compensated prediction (MCP) is combined with intraframe encoding of the prediction error. Successful applications range from digital video broadcasting to low rate videophones. Several standards, such as ITU-T H.263, are based on this scheme.
Many codecs today employ more than one motion-compensated prediction signal simultaneously to predict the current frame. The term “multi-hypothesis motion compensation” has been generally used to refer to this approach. A linear combination of multiple prediction hypotheses is formed to arrive at the actual prediction signal. Examples are the combination of past and future frames to predict B-frames or overlapped block motion compensation in the MPEG or H.263 coding schemes. Multi-hypothesis motion-compensated prediction extends traditional motion-compensated prediction used in video coding schemes. Known algorithms for block-based multi-hypothesis motion-compensated prediction are, for example, overlapped block motion compensation (OBMC) and bi-directionally predicted frames (B-frames). While there have been some advances made using multi-hypothesis motion compensation, the need to further increase coding gains continues.
According to various aspects of the present invention, embodiments thereof are exemplified in the form of motion-prediction methods and arrangements in connection with encoding, decoding and performing video conferencing. One specific implementation is directed to a method for predicting an image segment using up to N reference pictures (or frames) of the video data in a compression/decompression communication arrangement, where N is a positive integer greater than one and each reference picture can be represented as a plurality of image segments. The method comprises: selecting at least two spatially-displaced image segments from previously decoded reference pictures corresponding to time instances on one side of a temporal axis; and forming a predictor signal by combining the selected spatially-displaced image segments.
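As an illustration of the method just summarized, the following minimal sketch forms a predictor from spatially-displaced segments taken from several previously decoded reference pictures. Several details are assumptions made for the example and are not taken from the text: each hypothesis is written as a `(delay, dy, dx)` triple, where `delay` 1 denotes the immediately preceding picture; the segments are combined with equal weights unless other weights are supplied; and the names `fetch_segment` and `multi_hypothesis_predict` are hypothetical.

```python
def fetch_segment(refs, delay, top, left, dy, dx, size):
    """Return the size x size segment of reference picture `delay`,
    displaced by (dy, dx) from the block position (top, left)."""
    ref = refs[delay - 1]  # assumption: refs[0] is the most recent picture
    return [[ref[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]


def multi_hypothesis_predict(refs, hypotheses, top, left, size, weights=None):
    """Linearly combine the segments addressed by the hypotheses.

    Assumption: equal weights (a plain average) when none are given.
    """
    n = len(hypotheses)
    if weights is None:
        weights = [1.0 / n] * n
    pred = [[0.0] * size for _ in range(size)]
    for w, (delay, dy, dx) in zip(weights, hypotheses):
        seg = fetch_segment(refs, delay, top, left, dy, dx, size)
        for r in range(size):
            for c in range(size):
                pred[r][c] += w * seg[r][c]
    return pred


# Two past reference pictures (both on the same side of the temporal axis).
ref1 = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
ref2 = [[0, 20, 40], [0, 20, 40], [0, 20, 40]]

# One segment from each picture, the second displaced one pixel to the right.
pred = multi_hypothesis_predict([ref1, ref2], [(1, 0, 0), (2, 0, 1)],
                                top=0, left=0, size=2)
print(pred)  # [[15.0, 25.0], [15.0, 25.0]]
```

Both hypotheses may also address the same reference picture with different displacements; the sketch places no restriction on that.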
More particular aspects of the present invention are directed to implementations using one or more of the following: forming a Lagrangian cost function for selecting the prediction code; obtaining the Lagrangian cost function as a weighted sum of a distortion measure and a rate measure; obtaining the distortion measure as a function of the image segment to be predicted and the multi-hypothesis prediction signal; obtaining the rate measure as a function of the codes for transmission of the multi-hypothesis predictor; transmitting the multi-hypothesis prediction codes to the decoder; and forming a prediction signal by combining the image segments that are addressed by the multi-hypothesis prediction code.
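The weighted sum of distortion and rate described above can be sketched as follows. The text does not fix the particular measures, so this example assumes a sum of squared differences for the distortion, the total number of code bits for the rate, and a single multiplier `lam` as the weight; the function names are hypothetical.

```python
def ssd(block, prediction):
    """Distortion (assumed here): sum of squared differences."""
    return sum((b - p) ** 2
               for brow, prow in zip(block, prediction)
               for b, p in zip(brow, prow))


def lagrangian_cost(block, prediction, code_bits, lam):
    """J = D + lam * R, with R assumed to be the total bits of the
    codes that signal the multi-hypothesis predictor."""
    return ssd(block, prediction) + lam * sum(code_bits)


block = [[1, 2], [3, 4]]
prediction = [[1, 1], [3, 3]]

# Two hypotheses whose codes take 3 and 5 bits (illustrative values only).
j = lagrangian_cost(block, prediction, code_bits=[3, 5], lam=0.5)
print(j)  # 6.0
```

A larger `lam` penalizes expensive prediction codes more heavily, trading prediction accuracy for a lower bit rate.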
Another more particular embodiment of the present invention is directed to searching iteratively in order to select the spatially-displaced image segments and corresponding delays in an effort to minimize the Lagrangian cost function. The iterative searching, according to another particular example embodiment of the present invention, includes: (a) fixing all but one image segment and changing the one image segment and its corresponding delay parameter to reduce the Lagrangian cost function; (b) using the changed image segment and its corresponding delay parameter, repeating the step of fixing and changing for each remaining image segment; and (c) repeating steps (a) and (b).
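Steps (a) through (c) can be sketched as a coordinate-wise search. This is an illustration under stated assumptions, not the claimed implementation: hypotheses are `(delay, dy, dx)` triples, segments are averaged with equal weights, distortion is a sum of squared differences, the rate is a toy model that grows with delay and displacement, and the loop stops when a full pass yields no improvement or after `max_iters` passes. All names are hypothetical.

```python
from itertools import product


def fetch(refs, hyp, top, left, size):
    """Segment addressed by hyp = (delay, dy, dx); refs[0] is most recent."""
    delay, dy, dx = hyp
    ref = refs[delay - 1]
    return [[ref[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]


def cost(block, refs, hyps, top, left, size, lam):
    """Lagrangian cost J = D + lam * R for an equal-weight combination."""
    segs = [fetch(refs, h, top, left, size) for h in hyps]
    dist = 0.0
    for r in range(size):
        for c in range(size):
            p = sum(s[r][c] for s in segs) / len(segs)
            dist += (block[r][c] - p) ** 2
    # Toy rate model (assumption): larger delays and displacements cost more.
    rate = sum(delay + abs(dy) + abs(dx) for delay, dy, dx in hyps)
    return dist + lam * rate


def iterative_search(block, refs, hyps, top, left, size, lam,
                     search_range=1, max_iters=10):
    hyps = list(hyps)
    height, width = len(refs[0]), len(refs[0][0])
    # Every (delay, dy, dx) whose segment stays inside the picture.
    candidates = [
        (delay, dy, dx)
        for delay in range(1, len(refs) + 1)
        for dy, dx in product(range(-search_range, search_range + 1), repeat=2)
        if 0 <= top + dy and top + dy + size <= height
        and 0 <= left + dx and left + dx + size <= width
    ]
    best = cost(block, refs, hyps, top, left, size, lam)
    for _ in range(max_iters):                  # step (c): repeat the passes
        improved = False
        for i in range(len(hyps)):              # step (b): each hypothesis in turn
            for cand in candidates:             # step (a): vary one, fix the rest
                trial = hyps[:i] + [cand] + hyps[i + 1:]
                j = cost(block, refs, trial, top, left, size, lam)
                if j < best:
                    best, hyps, improved = j, trial, True
        if not improved:
            break
    return hyps, best


ref1 = [[r * 4 + c for c in range(4)] for r in range(4)]
ref2 = [[2 * (r * 4 + c) for c in range(4)] for r in range(4)]
refs = [ref1, ref2]
block = [[6, 8], [12, 14]]
start = [(1, 0, 0), (1, 0, 0)]
lam = 0.1

initial = cost(block, refs, start, 1, 1, 2, lam)
hyps, best = iterative_search(block, refs, start, 1, 1, 2, lam)
print(initial, best, hyps)
```

Because a candidate is accepted only when it strictly lowers the cost, the cost never increases from one pass to the next. Such a search is not guaranteed to reach the global minimum.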
Among other aspects of the present invention, example embodiments are directed to various arrangement implementations relating to the above method.
The above summary is not intended to provide an overview of all aspects of the present invention. Other aspects of the present invention are exemplified and described in connection with the detailed description.