The invention relates to a method of encoding a source sequence of pictures comprising the steps of:
dividing a source sequence into a set of groups of pictures, each group of pictures comprising a first frame, hereafter referred to as I-frame, followed by at least one pair of frames, hereafter referred to as PB-frames;
dividing each I-frame and PB-frame into spatially non-overlapping blocks of pixels;
encoding the blocks from said I-frame, hereafter referred to as I-blocks, independently from any other frame in the group of pictures;
deriving motion vectors and corresponding predictors for the blocks from the temporally second frame of said PB-frame, hereafter referred to as the P-blocks, based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame;
deriving for each block from the first frame of said PB-frame, hereafter referred to as a B-block, a forward motion vector from said motion vector of the P-block with same location, which makes it possible to obtain for each B-block an associated I-block in the previous I-frame or an associated P-block in the previous PB-frame, hereafter referred to as the If-block or Pf-block, respectively;
deriving for each B-block of the first frame of said PB-frame a backward motion vector from said motion vector of the P-block with same location, which makes it possible to obtain for each B-block an associated P-block in the P-frame of said PB-frame, hereafter referred to as the Pb-block;
choosing a prediction mode for the encoding of each B-block;
predictively encoding the P-blocks of the second frame of said PB-frame based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame;
predictively encoding the B-blocks following the chosen prediction mode.
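The grouping and block-division steps above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the claimed implementation: the names `GroupOfPictures`, `group_pictures` and `split_into_blocks`, the `pairs_per_gop` parameter, and the default 16×16 block size (the H.263 macroblock size) are all hypothetical choices. A trailing frame that cannot complete a PB pair is assumed to start a new group as an I-frame.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Frame = List[List[int]]  # rows of luminance samples (assumed representation)


@dataclass
class GroupOfPictures:
    i_frame: int  # index of the I-frame opening the group
    pb_frames: List[Tuple[int, int]] = field(default_factory=list)  # (B index, P index)


def group_pictures(n_frames: int, pairs_per_gop: int) -> List[GroupOfPictures]:
    """Split frame indices into groups: one I-frame followed by PB pairs."""
    gops: List[GroupOfPictures] = []
    i = 0
    while i < n_frames:
        gop = GroupOfPictures(i_frame=i)
        i += 1
        for _ in range(pairs_per_gop):
            if i + 1 >= n_frames:  # not enough frames left for a full PB pair
                break
            gop.pb_frames.append((i, i + 1))  # B-frame first, then P-frame
            i += 2
        gops.append(gop)
    return gops


def split_into_blocks(frame: Frame, size: int = 16) -> Dict[Tuple[int, int], Frame]:
    """Cut a frame into non-overlapping size x size blocks keyed by top-left corner."""
    height, width = len(frame), len(frame[0])
    if height % size or width % size:
        raise ValueError("frame dimensions must be multiples of the block size")
    return {
        (y, x): [row[x:x + size] for row in frame[y:y + size]]
        for y in range(0, height, size)
        for x in range(0, width, size)
    }
```

For example, six frames with two pairs per group yield one group (I-frame 0, pairs (1, 2) and (3, 4)) and, under the stated assumption, a second group holding only I-frame 5.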
The invention also relates to a system for carrying out said method.
The invention may be used, for example, in video coding at very low bit rate.
The ITU (International Telecommunication Union) standards for low bitrate videotelephony products and technology are compiled in Recommendations H.320 and H.324. These standards describe all the requirements to be satisfied by the different components: audio, video, multiplexer, control protocol and modem. H.320 is dedicated to videoconferencing or videophony over ISDN (Integrated Services Digital Network) lines. H.324 is aimed at videophony over GSTN (General Switched Telephone Network) analog phone lines. Both standards support Recommendation H.263 for video coding, which describes the compression of low bitrate video signals. The H.263 Recommendation comprises four optional modes for a video coder, one of which, called the PB-frames mode, provides a way of encoding a PB-frame. A second version of the H.263 Recommendation, called H.263+, was developed to improve image quality and introduces new options. One of these, called the Improved PB-frames mode, is an improvement of the original PB-frames mode and provides a new way of encoding a PB-frame.
A sequence of picture frames may be composed of a series of I-frames and PB-frames. An I-frame consists of a picture coded according to an Intra mode, which means that an I-frame is coded using spatial redundancy within the picture without any reference to another picture. A P-frame is predictively encoded from a previous P- or I-picture. Thus, when coding a P-picture, the temporal redundancy between the P-picture and a previous reference picture, mostly the previous I- or P-picture, is exploited in addition to the spatial redundancy used for an I-picture. A B-picture has two temporal references and is usually predictively encoded from a previous P- or I-picture and the P-picture currently being reconstructed. A PB-frame consists of two successive pictures, a first B-frame and a subsequent P-frame, coded as one unit.
A method of coding a PB-frame in accordance with the PB-frames mode is illustrated in FIG. 1, which shows a PB-frame composed of a B-frame B and a P-frame P2. The B-frame B lies between a previous P-picture P1 and the P-picture P2 currently being reconstructed. In this example P1 is a P-picture, but it may also be an I-picture; it serves as the reference picture for encoding both the P-picture P2 and the B-picture B. In the PB-frames mode, a B-block from the B-frame can be forward or bidirectionally predictively encoded. A forward-predicted B-block is encoded based on the previous I- or P-picture P1, and a bidirectionally predicted B-block is encoded based on both the previous I- or P-picture P1 and the P-picture P2 currently being reconstructed. A set of motion vectors MV is derived for the P-picture P2 of the PB-frame with reference to the picture P1: for each macroblock of P2, a macroblock of P1 is associated by block matching and a corresponding motion vector MV is derived. The motion vectors for the B-block are derived from the set of motion vectors previously derived for P2. A forward motion vector MVf and a backward motion vector MVb are thus calculated for a B-block as follows:
$$
\begin{aligned}
MV_f &= \frac{TR_b \times MV}{TR_d} \\
MV_b &= \frac{(TR_b - TR_d) \times MV}{TR_d} \qquad (1) \\
MV_b &= MV_f - MV
\end{aligned}
$$
where
TRb is the increment in the temporal reference of the B-picture from the previous I- or P-picture P1, and
TRd is the increment in the temporal reference of the current P-picture P2 from the previous I- or P-picture P1.
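Equation (1) can be checked numerically with a short sketch. It uses exact fractions as an illustrative assumption; an actual coder would round the results to its motion vector resolution (half-pel in H.263), which is not modeled here. The function name `derive_b_vectors` and the component order of the vectors are hypothetical.

```python
from fractions import Fraction
from typing import Tuple

Vector = Tuple[Fraction, Fraction]


def derive_b_vectors(mv: Tuple[int, int], trb: int, trd: int) -> Tuple[Vector, Vector]:
    """Return (MVf, MVb) for a B-block from the co-located P-block vector MV."""
    if not 0 < trb < trd:
        raise ValueError("the B-picture must lie strictly between P1 and P2")
    # MVf = (TRb * MV) / TRd, applied component by component
    mvf = (Fraction(trb * mv[0], trd), Fraction(trb * mv[1], trd))
    # MVb = ((TRb - TRd) * MV) / TRd, which is the same as MVf - MV
    mvb = (Fraction((trb - trd) * mv[0], trd), Fraction((trb - trd) * mv[1], trd))
    return mvf, mvb
```

For MV = (6, -3), TRb = 1 and TRd = 3, the sketch gives MVf = (2, -1) and MVb = (-4, 2), and MVb equals MVf - MV component by component, as the two forms of MVb in (1) require.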
Consider in FIG. 1 a macroblock AB of the B-picture. This macroblock AB has the same location as a previously reconstructed macroblock A2B2 of P2, denoted Prec. The motion vector MV associates the macroblock A2B2 with a macroblock A1B1 of P1. A forward motion vector MVf and a backward motion vector MVb, both associated with AB, are derived from MV as shown in (1). The macroblocks of P1 and P2 associated with the macroblock AB by the forward vector MVf and by the backward vector MVb are K1M1 and K2M2, respectively, as illustrated in FIG. 1.
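The block matching that associates each macroblock of P2 with a macroblock of P1 can be illustrated by an exhaustive (full-search) SAD minimization. Full search is only one possible matching technique and is assumed here; the text does not specify it. The names `block_sad` and `full_search`, the default search range, and the convention that a vector (dy, dx) points from the block at (by, bx) in P2 to the matching block at (by + dy, bx + dx) in P1 are all assumptions.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def block_sad(cur: Frame, ref: Frame, cy: int, cx: int, ry: int, rx: int, size: int) -> int:
    """Sum of absolute differences between a block of cur and a block of ref."""
    return sum(
        abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
        for i in range(size)
        for j in range(size)
    )


def full_search(cur: Frame, ref: Frame, by: int, bx: int,
                size: int = 16, search_range: int = 7) -> Tuple[Tuple[int, int], int]:
    """Return the displacement (dy, dx) that minimizes the SAD, and that SAD."""
    height, width = len(ref), len(ref[0])
    best: Optional[Tuple[int, Tuple[int, int]]] = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = by + dy, bx + dx
            # Skip candidates that fall partly outside the reference picture.
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = block_sad(cur, ref, by, bx, ry, rx, size)
            if best is None or cost < best[0]:  # keep the first minimum found
                best = (cost, (dy, dx))
    if best is None:
        raise ValueError("no candidate block lies inside the reference picture")
    return best[1], best[0]
```

Full search is simple and finds the true SAD minimum within the window, at the cost of evaluating every candidate; faster search patterns trade that guarantee for speed.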
The choice between bidirectional prediction and forward prediction is made within each block of the B-picture and depends on where MVb points. The part MB of the B-block AB for which MVb points inside Prec is bidirectionally predicted, and the prediction for this part of the B-block is:
MB(i,j) = [A1M1(i,j) + A2M2(i,j)]/2, where i and j are the spatial coordinates of the pixels.
The part AM of the B-block AB for which MVb points outside Prec is forward predicted, and the prediction for this part of the B-block AB is:
AM(i,j) = K1A1(i,j).
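The per-pixel split between the bidirectionally predicted part MB and the forward-predicted part AM can be sketched as below. Assumptions, none of which come from the text: integer-pel vectors (H.263 actually uses half-pel vectors with interpolation), rounding of the average as (a + b + 1) // 2, vectors given as (dy, dx), reference positions that stay inside P1 and P2, and the hypothetical function name `predict_b_block`.

```python
from typing import List, Tuple

Frame = List[List[int]]
Vec = Tuple[int, int]  # (dy, dx), integer-pel (assumption)


def predict_b_block(p1: Frame, p2: Frame, by: int, bx: int,
                    mvf: Vec, mvb: Vec, size: int = 16) -> Frame:
    """Predict the B-block at (by, bx): average where MVb lands inside Prec, else forward."""
    pred: Frame = []
    for y in range(by, by + size):
        row: List[int] = []
        for x in range(bx, bx + size):
            fwd = p1[y + mvf[0]][x + mvf[1]]  # forward sample from P1
            ry, rx = y + mvb[0], x + mvb[1]   # backward position in P2
            # Prec is the co-located macroblock of P2: same corner and size as the B-block.
            inside_prec = by <= ry < by + size and bx <= rx < bx + size
            if inside_prec:
                row.append((fwd + p2[ry][rx] + 1) // 2)  # bidirectional part (MB)
            else:
                row.append(fwd)                         # forward-only part (AM)
        pred.append(row)
    return pred
```

With a uniform P1 of value 10, a uniform P2 of value 20, a 4×4 block at (4, 4), MVf = (0, 1) and MVb = (0, -2), the first two columns of each row point outside Prec and stay at 10, while the last two are averaged to 15.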
An improved method of encoding a PB-frame according to the PB-frames mode is described in European Patent Application EP 0 782 343 A2. It discloses a predictive method of coding the blocks in the bidirectionally predicted frame, which introduces a delta motion vector added to or subtracted from, respectively, the derived forward and backward motion vectors. The described method may be relevant when the motion in a sequence of pictures is non-linear, but it is unsuitable for a sequence of pictures containing scene cuts. Indeed, when there is a scene cut between a previous P-frame and the B-part of a PB-frame, both bidirectional and forward prediction give erroneous coding. Besides, the implementation of the delta vector is costly in terms of CPU load and may result in unnecessarily expensive and complicated calculations.
It is an object of the invention to improve the efficiency of existing coding methods while decreasing the CPU load, and more precisely to provide an efficient strategy or method that makes the most suitable choice among prediction modes for the coding of a given macroblock of a B-frame.
Thus, the choice of the prediction mode for the encoding of each B-block comprises, for each B-block in turn, the steps of:
deriving the sum of absolute differences between the B-block and a block whose pixel values are the means of the pixel values of the Pb-block and of the Pf-block or If-block, hereafter referred to as SADbidir;
deriving the sum of absolute differences between the B-block and the P-block in the second frame of the PB-frame with same location as the B-block, hereafter referred to as SADb;
when SADb is lower than SADbidir, making the choice of predictively encoding the B-block based on said P-block with same location as the B-block (backward prediction);
when SADb is greater than SADbidir:
deriving the difference between said motion vector and said predictor of the P-block in the P-frame of said PB-frame with same location as the B-block;
when the obtained difference is lower than a predetermined threshold, making the choice of predictively encoding the B-block based on both the P-blocks of the second frame of said PB-frame and the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame (bidirectional prediction);
when the obtained difference is greater than the predetermined threshold, deriving the minimum of the sum of absolute differences for the B-block based on the I-blocks in the previous I-frame or on the P-blocks in the previous PB-frame, and making the choice of predictively encoding the B-block based on the I-blocks in the previous I-frame or the P-blocks in the previous PB-frame (forward prediction).
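The decision steps above can be condensed into the following sketch. It is a hypothetical illustration, not the claimed implementation: the function names, the L1 norm used to measure the difference between the motion vector and its predictor, the rounding of the mean block, the handling of ties, and the `forward_candidates` list (standing in for the blocks of the previous I- or P-picture examined by the forward search) are all assumptions. The sketch selects backward prediction when SADb does not exceed SADbidir, the reading consistent with the stated aim of using backward prediction when there is no motion.

```python
from typing import List, Optional, Sequence, Tuple

Block = List[List[int]]


def sad(a: Block, b: Block) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def mean_block(a: Block, b: Block) -> Block:
    """Pixel-wise mean of two blocks, rounded half up (assumed rounding)."""
    return [[(p + q + 1) // 2 for p, q in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def choose_mode(b_block: Block, pb_block: Block, pf_block: Block,
                colocated_p_block: Block, mv: Tuple[int, int],
                predictor: Tuple[int, int], threshold: int,
                forward_candidates: Sequence[Block]) -> Tuple[str, Optional[int]]:
    """Return the prediction mode and, for forward mode, the best candidate index."""
    sad_bidir = sad(b_block, mean_block(pb_block, pf_block))  # SADbidir
    sad_b = sad(b_block, colocated_p_block)                   # SADb
    if sad_b <= sad_bidir:  # ties go to backward (assumption)
        return "backward", None
    # Motion coherence: L1 distance between the P-block vector and its predictor.
    mv_diff = abs(mv[0] - predictor[0]) + abs(mv[1] - predictor[1])
    if mv_diff < threshold:
        return "bidirectional", None
    # Incoherent motion: keep the forward candidate with the minimum SAD.
    if not forward_candidates:
        raise ValueError("forward search needs at least one candidate block")
    best = min(range(len(forward_candidates)),
               key=lambda k: sad(b_block, forward_candidates[k]))
    return "forward", best
```

Because every SAD here is computed on original pixels, no bidirectional prediction has to be reconstructed before the mode is chosen.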
The claimed method gives, for the coding of a B-block, a strategy for choosing the prediction mode among the forward, backward and bidirectional modes. The choice is based upon SAD (Sum of Absolute Differences) calculations and motion vector coherence. The strategy relies on a specific order of comparison of the SAD values for the three prediction modes and on the introduction of motion coherence. Furthermore, the choice of the prediction mode for the encoding of a B-frame is made before any P- or B-picture is encoded. Because the proposed strategy is performed on original pictures, the SAD calculations, particularly the calculation of SADbidir, do not require a prior bidirectional prediction of the B-frame, which would be CPU consuming. The main advantage of the proposed method is that it is not biased toward bidirectional prediction and allows backward prediction to be performed when there is no motion. The method thus leads to a suitable choice of prediction mode for a given block of a B-frame.
In a preferred embodiment of the invention, a method according to the invention may be carried out by a system of wired electronic circuits that perform the various steps of the proposed method. The method may also be partly performed by means of a set of instructions stored in a computer-readable medium.