Various video coding standards exist today. These include International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Recommendation H.263 and the International Organization for Standardization (ISO) Moving Picture Experts Group (MPEG) standards MPEG-1, MPEG-2 and MPEG-4. These video coding standards are based on the use of motion compensated prediction and prediction error coding. Motion compensated prediction is performed by analyzing and coding motion between successive frames in a video sequence and reconstructing image blocks using the motion information. The image blocks are reconstructed using motion interpolation filters that generate image (pixel) values at the required pixel and sub-pixel positions. The basic principles of motion compensated prediction and image reconstruction using interpolation filters are described in greater detail in the following paragraphs.
Digital video sequences, like ordinary motion pictures recorded on film, comprise a sequence of still images, often referred to as “frames”. The illusion of motion is created by displaying the frames one after the other at a relatively fast rate, typically 15 to 30 frames per second. Because of the relatively fast frame rate, the image content of consecutive frames tends to be quite similar, and thus consecutive frames contain a considerable amount of redundant information.
Each frame of a digital video sequence comprises an array of image pixels. In a commonly used digital video format, known as the Quarter Common Interchange Format (QCIF), a frame comprises an array of 176×144 pixels, and thus each frame has 25,344 pixels. Each pixel of the frame is represented by a certain number of bits, which carry information about the luminance and/or color content (chrominance) of the region of the image corresponding to the pixel. Commonly, a so-called YUV color model is used to represent the luminance and chrominance content of an image. The luminance, or Y, component represents the intensity (brightness) of the image, while the color content of the image is represented by two chrominance components, labelled U and V.
Color models based on a luminance/chrominance representation of image content provide certain advantages compared with color models that are based on a representation involving primary colors (that is Red, Green and Blue, RGB). The human visual system is more sensitive to intensity variations than it is to color variations, and YUV color models exploit this property by using a lower spatial resolution for the chrominance components (U, V) than for the luminance component (Y). In this way, the amount of information needed to code the color information in an image can be reduced with only a minor reduction in image quality.
The lower spatial resolution of the chrominance components is usually attained by spatial sub-sampling. Typically, a block of 16×16 image pixels is coded by one block of 16×16 values representing luminance information, and the two chrominance components are each represented by one block of 8×8 values representing an area of the image equivalent to that of the 16×16 array of luminance values. The chrominance components are thus spatially sub-sampled by a factor of 2 in the horizontal and vertical directions. The resulting assembly of one 16×16 luminance block and two 8×8 chrominance blocks is commonly referred to as a YUV macroblock, or macroblock, for short.
A QCIF image comprises 11×9 macroblocks. If the luminance blocks and chrominance blocks are represented with 8 bit resolution (that is by numbers in the range 0 to 255), the total number of bits required per macroblock is (16×16×8)+2×(8×8×8)=3072 bits. Thus, the number of bits needed to represent a video frame in QCIF format, using 8 bit number resolution per component, is 99×3072=304,128 bits. Therefore, the amount of data required to transmit, record or display a video sequence comprising a series of such QCIF format frames at a rate of 30 frames per second is more than 9 Mbps (million bits per second). This data rate is impractical for use in video recording, transmission and display applications because of the very large storage capacity, transmission channel capacity and hardware performance required. For this reason video coding standards, such as those mentioned above, have been developed in order to reduce the amount of information required to represent and transmit video data while retaining an acceptable image quality.
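The arithmetic above can be checked with a short script. This is only a sketch that restates the figures given in the text; the function and constant names are chosen here for illustration.

```python
# Raw (uncompressed) bit budget for QCIF video, using the figures in the text.
WIDTH, HEIGHT = 176, 144      # QCIF luminance resolution in pixels
MB_SIZE = 16                  # a macroblock covers 16x16 luminance samples
BITS_PER_SAMPLE = 8           # 8-bit resolution per component
FRAME_RATE = 30               # frames per second


def macroblocks_per_frame(width, height, mb_size=MB_SIZE):
    """Number of macroblocks in a frame whose sides are multiples of mb_size."""
    return (width // mb_size) * (height // mb_size)


def bits_per_macroblock(mb_size=MB_SIZE, bits=BITS_PER_SAMPLE):
    """One luminance block plus two chrominance blocks sub-sampled by 2 in each direction."""
    luma = mb_size * mb_size * bits
    chroma = 2 * (mb_size // 2) * (mb_size // 2) * bits
    return luma + chroma


def raw_bit_rate(width, height, frame_rate):
    """Bits per second needed to carry the frames without any compression."""
    return macroblocks_per_frame(width, height) * bits_per_macroblock() * frame_rate


if __name__ == "__main__":
    print(macroblocks_per_frame(WIDTH, HEIGHT))      # 99
    print(bits_per_macroblock())                     # 3072
    print(raw_bit_rate(WIDTH, HEIGHT, FRAME_RATE))   # 9123840
```

The final figure, 9,123,840 bits per second, is the "more than 9 Mbps" quoted above.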
Each of the previously mentioned video coding standards is tailored for application in video recording or transmission systems having different characteristics. For example, the ISO MPEG-1 standard is designed specifically for use in situations where the available data bandwidth is up to about 1.5 Mbits/s. The MPEG-2 video coding standard is primarily applicable to digital storage media and video broadcast and communication with available data bandwidths of up to about 10 Mbits/s. ITU-T recommendation H.263 is intended for use in systems where the available bandwidth is generally much lower. It is particularly suitable for use in situations where video data is to be transmitted in real-time over a fixed line network such as an ISDN (Integrated Services Digital Network) or a conventional PSTN (Public Switched Telephone Network), where the available data transmission bandwidth is typically in the order of 64 kbits/s. In mobile videotelephony, where transmission takes place at least in part over a radio communications link, the available bandwidth can be as low as 20 kbits/s.
Although the various video coding standards currently in existence are tailored for use in different situations, the mechanisms they employ in order to reduce the amount of information to be transmitted have many features in common. In particular, they all work in such a way as to reduce the amount of redundant and perceptually irrelevant information in a video sequence to be transmitted. There are basically three types of redundancy in video sequences: spatial, temporal and spectral redundancy. Spatial redundancy is the term used to describe correlation between neighboring pixels within an individual frame of a sequence. Temporal redundancy expresses the fact that the objects appearing in one frame of a sequence are likely to appear in subsequent frames. Spectral redundancy refers to the correlation between different color components of the same image.
Sufficiently efficient compression cannot usually be achieved by simply reducing the various forms of redundancy in a given sequence of images. Thus, most current video encoders also reduce the quality of those parts of the video sequence which are subjectively the least important. In addition, the redundancy of the compressed video bit-stream is itself reduced by means of efficient loss-less encoding. Typically, this is achieved using entropy coding.
Motion compensated prediction is a form of temporal redundancy reduction in which the content of some (often many) frames in a video sequence is “predicted” from other frames in the sequence by tracing the motion of objects or regions of an image between frames. Frames that are compressed using motion compensated prediction are typically referred to as INTER-coded or P-frames, whereas frames that are compressed without using motion compensated prediction are called INTRA-coded or I-frames. A predicted (motion-compensated, INTER-coded) image is rarely precise enough to represent the image content with sufficient quality, and therefore a spatially compressed prediction error (PE) frame is also associated with each INTER frame. Many video compression schemes can also make use of bi-directionally predicted frames, which are commonly referred to as B-pictures or B-frames. B-pictures are inserted between reference or so-called “anchor” picture pairs (I or P frames) and are predicted from either one or both of the anchor pictures.
The different types of frame that occur in a typical compressed video sequence are illustrated in FIG. 3 of the accompanying drawings. As can be seen from the figure, the sequence starts with an INTRA or I frame 30. In FIG. 3, arrows 33 denote the “forward” prediction process by which P-frames 34 are formed. The bi-directional prediction process by which B-frames 36 are formed is denoted by arrows 31a and 31b, respectively.
A schematic diagram of a generic video coding system using motion compensated prediction is shown in FIGS. 1 and 2. FIG. 1 illustrates an encoder 10 employing motion compensated prediction and FIG. 2 illustrates a corresponding decoder 20. The encoder 10 shown in FIG. 1 comprises a Motion Field Estimation block 11, a Motion Field Coding block 12, a Motion Compensated Prediction block 13, a Prediction Error Coding block 14, a Prediction Error Decoding block 15, a Multiplexing block 16, a Frame Memory 17, and an adder 19. The decoder 20 comprises a Motion Compensated Prediction block 21, a Prediction Error Decoding block 22, a Demultiplexing block 23 and a Frame Memory 24.
The operating principle of video coders employing motion compensated prediction is to minimize the amount of information in a prediction error frame En(x,y), which is the difference between a current frame In(x,y) being coded and a prediction frame Pn(x,y). The prediction error frame is thus defined as follows:
$$E_n(x,y) = I_n(x,y) - P_n(x,y). \qquad (1)$$
The prediction frame Pn(x,y) is built using pixel values of a reference frame Rn(x,y), which is generally one of the previously coded and transmitted frames, for example, the frame immediately preceding the current frame, and is available from the Frame Memory 17 of the encoder 10. More specifically, the prediction frame Pn(x,y) is constructed by finding “prediction pixels” in the reference frame Rn(x,y) which correspond substantially with pixels in the current frame. Motion information, describing the relationship (e.g. relative location, rotation, scale etc.) between pixels in the current frame and their corresponding prediction pixels in the reference frame is derived and the prediction frame is constructed by moving the prediction pixels according to the motion information. In this way, the prediction frame is constructed as an approximate representation of the current frame, using pixel values in the reference frame. The prediction error frame referred to above therefore represents the difference between the approximate representation of the current frame provided by the prediction frame and the current frame itself. The basic advantage provided by video encoders that use motion compensated prediction arises from the fact that a comparatively compact description of the current frame can be obtained by the motion information required to form its prediction, together with the associated prediction error information in the prediction error frame.
Due to the large number of pixels in a frame, it is generally not efficient to transmit separate motion information for each pixel to the decoder. Instead, in most video coding schemes, the current frame is divided into larger image segments Sk, and motion information relating to the segments is transmitted to the decoder. For example, motion information is typically provided for each macroblock of a frame and the same motion information is then used for all pixels within the macroblock. In some video coding standards, such as ITU-T recommendation H.26L, currently under development, a macroblock can be divided into smaller blocks, each smaller block being provided with its own motion information.
The motion information usually takes the form of motion vectors [Δx(x,y),Δy(x,y)]. The pair of numbers Δx(x,y) and Δy(x,y) represents the horizontal and vertical displacements of a pixel (x,y) in the current frame In(x,y) with respect to a pixel in the reference frame Rn(x,y). The motion vectors [Δx(x,y),Δy(x,y)] are calculated in the Motion Field Estimation block 11 and the set of motion vectors of the current frame [Δx(·),Δy(·)] is referred to as the motion vector field.
Typically, the location of a macroblock in a current video frame is specified by the (x,y) co-ordinate of its upper left-hand corner. Thus, in a video coding scheme in which motion information is associated with each macroblock of a frame, each motion vector describes the horizontal and vertical displacement Δx(x,y) and Δy(x,y) of a pixel representing the upper left-hand corner of a macroblock in the current frame In(x,y) with respect to a pixel in the upper left-hand corner of a substantially corresponding block of prediction pixels in the reference frame Rn(x,y) (as shown in FIG. 4b).
Motion estimation is a computationally intensive task. Given a reference frame Rn(x,y) and, for example, a square macroblock comprising N×N pixels in a current frame (as shown in FIG. 4a), the objective of motion estimation is to find an N×N pixel block in the reference frame that matches the characteristics of the macroblock in the current picture according to some criterion. This criterion can be, for example, a sum of absolute differences (SAD) between the pixels of the macroblock in the current frame and the block of pixels in the reference frame with which it is compared. This process is known generally as “block matching”. It should be noted that, in general, the geometry of the block to be matched and that in the reference frame do not have to be the same, as real-world objects can undergo scale changes, as well as rotation and warping. However, in current international video coding standards, such as those referred to above, only a translational motion model is used (see below) and thus fixed rectangular geometry is sufficient.
Ideally, in order to achieve the best chance of finding a match, the whole of the reference frame should be searched. However, this is impractical as it imposes too high a computational burden on the video encoder. Instead, the search region is generally restricted to a region [−p,p] around the original location of the macroblock in the current frame, as shown in FIG. 4c. 
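The block-matching search described above can be sketched as follows. This is a minimal illustration, assuming frames stored as lists of rows (frame[y][x]) and an exhaustive search of the [−p,p] window; the function names are hypothetical, and practical encoders typically use faster search strategies. The displacement convention follows Equation 5: a vector (dx, dy) means that the block at (cx, cy) in the current frame is matched against the block at (cx+dx, cy+dy) in the reference frame.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (cx, cy) and the n x n block of `ref` at (rx, ry)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, cx, cy, n, p):
    """Try every displacement (dx, dy) with -p <= dx, dy <= p.

    Returns ((dx, dy), best_sad). Candidate blocks that would extend
    outside the reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best
```

The search evaluates up to (2p+1)² candidate positions per block, each costing N×N absolute differences, which is why the window is kept small in practice.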
In order to reduce further the amount of motion information to be transmitted from the encoder 10 to the decoder 20, the motion vector field is coded in the Motion Field Coding block 12 of the encoder 10, by representing it with a motion model. In this process, the motion vectors of image segments are re-expressed using certain predetermined functions or, in other words, the motion vector field is represented with a model. Almost all currently used motion vector field models are additive motion models, complying with the following general formula:
$$\Delta x(x,y) = \sum_{i=0}^{N-1} a_i f_i(x,y) \qquad (2)$$
$$\Delta y(x,y) = \sum_{i=0}^{M-1} b_i g_i(x,y) \qquad (3)$$
where ai and bi are motion coefficients. The motion coefficients are transmitted to the decoder 20 (information stream 2 in FIGS. 1 and 2). Functions fi and gi are motion field basis functions. They are known both to the encoder and decoder. An approximate motion vector field (Δ̃x(x,y), Δ̃y(x,y)) can be constructed using the coefficients and the basis functions. As the basis functions are known to (that is, stored in) both the encoder 10 and the decoder 20, only the motion coefficients need to be transmitted to the decoder, thus reducing the amount of information required to represent the motion information of the frame.
The simplest motion model is the translational motion model which requires only two coefficients to describe the motion vectors of each segment. The values of motion vectors are given by:
$$\Delta x(x,y) = a_0, \qquad \Delta y(x,y) = b_0 \qquad (4)$$
This is the model used in ITU-T recommendation H.263 and ISO standards MPEG-1, MPEG-2, MPEG-4 to describe the motion of 16×16 and 8×8 pixel blocks. Systems which use a translational motion model typically perform motion estimation at full pixel resolution or some integer fraction of full pixel resolution, for example at half or one quarter pixel resolution.
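To make the additive model of Equations 2 and 3 concrete, the sketch below evaluates a motion vector from a list of coefficients and a list of basis functions. The translational model of Equation 4 is the special case with a single constant basis function per component. The affine basis is an assumption added here purely for illustration and is not taken from the text; all names are hypothetical.

```python
def motion_vector(x, y, a, b, f, g):
    """Evaluate Equations 2 and 3 at pixel (x, y).

    a, b: motion coefficients (the values sent to the decoder).
    f, g: basis functions known to both encoder and decoder.
    """
    dx = sum(ai * fi(x, y) for ai, fi in zip(a, f))
    dy = sum(bi * gi(x, y) for bi, gi in zip(b, g))
    return dx, dy


# Translational model (Equation 4): one constant basis function, so the
# vector is (a0, b0) everywhere in the segment.
CONSTANT = [lambda x, y: 1.0]

# Hypothetical affine basis (not from the text): the vector varies linearly
# with position, which can describe effects such as scaling or rotation.
AFFINE = [lambda x, y: 1.0, lambda x, y: float(x), lambda x, y: float(y)]

if __name__ == "__main__":
    print(motion_vector(5, 9, [2.0], [-1.5], CONSTANT, CONSTANT))   # (2.0, -1.5)
    print(motion_vector(4, 8, [1.0, 0.5, 0.0], [0.0, 0.0, 0.25], AFFINE, AFFINE))   # (3.0, 2.0)
```

With the translational model only two numbers per segment are transmitted, whereas the affine example would need six.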
The prediction frame Pn(x,y) is constructed in the Motion Compensated Prediction block 13 of the encoder 10, and is given by:
$$P_n(x,y) = R_n[x + \tilde{\Delta}x(x,y),\; y + \tilde{\Delta}y(x,y)] \qquad (5)$$
In the Prediction Error Coding block 14, the prediction error frame En(x,y) is typically compressed by representing it as a finite series (transform) of some 2-dimensional functions. For example, a 2-dimensional Discrete Cosine Transform (DCT) can be used. The transform coefficients are quantized and entropy (for example Huffman) coded before they are transmitted to the decoder (information stream 1 in FIGS. 1 and 2). Because of the error introduced by quantization, this operation usually produces some degradation (loss of information) in the prediction error frame En(x,y). To compensate for this degradation, the encoder 10 also comprises a Prediction Error Decoding block 15, where a decoded prediction error frame Ẽn(x,y) is constructed using the transform coefficients. This locally decoded prediction error frame is added to the prediction frame Pn(x,y) by adder 19 and the resulting decoded current frame Ĩn(x,y) is stored in the Frame Memory 17 for further use as the next reference frame Rn+1(x,y).
The information stream 2 carrying information about the motion vectors is combined with information about the prediction error in multiplexer 16 and an information stream 3 containing typically at least those two types of information is sent to the decoder 20.
The operation of a corresponding video decoder 20 will now be described.
The Frame Memory 24 of the decoder 20 stores a previously reconstructed reference frame Rn(x,y). The prediction frame Pn(x,y) is constructed in the Motion Compensated Prediction block 21 of the decoder 20 according to Equation 5, using received motion coefficient information and pixel values of the previously reconstructed reference frame Rn(x,y). The transmitted transform coefficients of the prediction error frame En(x,y) are used in the Prediction Error Decoding block 22 to construct the decoded prediction error frame Ẽn(x,y). The pixels of the decoded current frame Ĩn(x,y) are then reconstructed by adding the prediction frame Pn(x,y) and the decoded prediction error frame Ẽn(x,y):
$$\tilde{I}_n(x,y) = P_n(x,y) + \tilde{E}_n(x,y) = R_n[x + \tilde{\Delta}x(x,y),\; y + \tilde{\Delta}y(x,y)] + \tilde{E}_n(x,y). \qquad (6)$$
This decoded current frame may be stored in the Frame Memory 24 as the next reference frame Rn+1(x,y).
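The encoder and decoder loops described above can be summarised in a small sketch. Several simplifications are assumptions made here for brevity: a single translational motion vector is applied to the whole frame, a plain uniform quantizer stands in for the transform, quantization and entropy coding stages, and samples referenced outside the reference frame are clamped to the border. The function names are hypothetical. The point illustrated is that the encoder reconstructs the frame from the quantized data exactly as the decoder does, so both hold the same reference frame.

```python
def predict(ref, mv):
    """Equation 5 for one translational vector (dx, dy) applied to every pixel."""
    dx, dy = mv
    h, w = len(ref), len(ref[0])
    return [[ref[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]


def quantize(err, step):
    """Lossy step: map each prediction error sample to an integer level."""
    return [[round(e / step) for e in row] for row in err]


def dequantize(levels, step):
    """Decoded prediction error frame built from the transmitted levels."""
    return [[q * step for q in row] for row in levels]


def reconstruct(ref, mv, levels, step):
    """Decoder side (Equation 6): prediction plus decoded prediction error."""
    pred = predict(ref, mv)
    dec_err = dequantize(levels, step)
    return [[p + e for p, e in zip(pr, er)] for pr, er in zip(pred, dec_err)]


def encode(cur, ref, mv, step):
    """Encoder side: form the prediction error (Equation 1), quantize it, then
    decode it locally so the stored reference matches the decoder's."""
    pred = predict(ref, mv)
    err = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]
    levels = quantize(err, step)
    next_reference = reconstruct(ref, mv, levels, step)
    return levels, next_reference
```

If the encoder instead stored the original current frame as its next reference, its predictions would be formed from data the decoder never has, and the mismatch would accumulate from frame to frame.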
In the description of motion compensated encoding and decoding of digital video presented above, the motion vector [Δx(x,y),Δy(x,y)] describing the motion of a macroblock in the current frame with respect to the reference frame Rn(x,y) can point to any of the pixels in the reference frame. This means that motion between frames of a digital video sequence can only be represented at a resolution determined by the image pixels in the frame (so-called full pixel resolution). Real motion, however, has arbitrary precision, and thus the system described above can only provide approximate modelling of the motion between successive frames of a digital video sequence. Typically, modelling of motion between video frames with full pixel resolution is not sufficiently accurate to allow efficient minimization of the prediction error (PE) information associated with each macroblock or frame. Therefore, to enable more accurate modelling of real motion and to help reduce the amount of PE information that must be transmitted from encoder to decoder, many video coding standards allow motion vectors to point “in between” image pixels. In other words, the motion vectors can have “sub-pixel” resolution. Allowing motion vectors to have sub-pixel resolution adds to the complexity of the encoding and decoding operations that must be performed, so it is still advantageous to limit the degree of spatial resolution a motion vector may have. Thus, video coding standards, such as those previously mentioned, typically only allow motion vectors to have full-, half- or quarter-pixel resolution.
Motion estimation with sub-pixel resolution can be implemented as a two-stage process, as illustrated in an exemplary fashion in FIG. 5, for a generic video coding scheme in which motion vectors may have full- or half-pixel resolution. In the first stage, a motion vector having full-pixel resolution is determined using an appropriate motion estimation scheme, such as the block-matching process described above. The resulting motion vector, having full-pixel resolution, is shown in FIG. 5.
In the second stage, the motion vector determined in the first stage is refined to obtain the desired half-pixel resolution. In the example illustrated in FIG. 5, this is done by forming eight new search blocks of 16×16 pixels, the location of the top-left corner of each block being marked with an X in FIG. 5. These locations are denoted as [Δx+m/2, Δy+n/2], where m and n can take the values −1, 0 and +1, but cannot be zero at the same time. As only the pixel values of original image pixels are known, the values (for example, luminance and/or chrominance values) of the sub-pixels residing at half-pixel locations are estimated for each of the eight new search blocks, using some form of interpolation scheme.
Having interpolated the values of the sub-pixels at half-pixel resolution, each of the eight search blocks is compared with the macroblock whose motion vector is being sought. As in the block matching process performed in order to determine the motion vector with full pixel resolution, the macroblock is compared with each of the eight search blocks according to some criterion, for example a SAD. As a result of the comparisons, a minimum SAD value will generally be obtained. Depending on the nature of the motion in the video sequence, this minimum value may correspond to the location specified by the original motion vector (having full-pixel resolution), or it may correspond to a location having a half-pixel resolution. Thus, it is possible to determine whether a motion vector should point to a full-pixel or sub-pixel location, and if sub-pixel resolution is appropriate, to determine the correct sub-pixel resolution motion vector.
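The second, half-pixel stage can be sketched as below. The text leaves the interpolation scheme open, so bilinear averaging of the neighbouring full pixels is used here purely as an assumption; the function names and the choice to express vectors in half-pixel units are likewise illustrative. The caller is assumed to keep the block, plus a one-pixel margin, inside the reference frame.

```python
def sample_half(ref, x2, y2):
    """Value of `ref` at position (x2/2, y2/2), given in half-pixel units.

    Bilinear averaging (an assumption of this sketch): a full-pixel position
    returns the pixel itself, a half-pixel position the mean of its neighbours.
    """
    x0, y0 = x2 // 2, y2 // 2
    x1, y1 = x0 + (x2 % 2), y0 + (y2 % 2)
    return (ref[y0][x0] + ref[y0][x1] + ref[y1][x0] + ref[y1][x1]) / 4.0


def sad_half(cur, ref, cx, cy, mvx2, mvy2, n):
    """SAD between the n x n block of `cur` at (cx, cy) and the block of `ref`
    displaced by (mvx2/2, mvy2/2) pixels."""
    total = 0.0
    for j in range(n):
        for i in range(n):
            pred = sample_half(ref, 2 * (cx + i) + mvx2, 2 * (cy + j) + mvy2)
            total += abs(cur[cy + j][cx + i] - pred)
    return total


def refine_half_pel(cur, ref, cx, cy, n, full_mv):
    """Compare the full-pixel vector with its eight half-pixel neighbours.

    Returns (vector in half-pixel units, SAD). The full-pixel candidate is
    tested first, so it is kept when a half-pixel candidate merely ties.
    """
    dx, dy = full_mv
    best = None
    for n_off in (0, -1, 1):
        for m_off in (0, -1, 1):
            mv2 = (2 * dx + m_off, 2 * dy + n_off)
            cost = sad_half(cur, ref, cx, cy, mv2[0], mv2[1], n)
            if best is None or cost < best[1]:
                best = (mv2, cost)
    return best
```

For example, a returned vector of (3, 0) in half-pixel units corresponds to a displacement of 1.5 pixels horizontally and none vertically.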
In practice, the estimation of a sub-pixel value in the reference frame is performed by interpolating the value of the sub-pixel from surrounding pixel values. In general, interpolation of a sub-pixel value F(x,y) situated at a non-integer location (x,y)=(n+Δx, m+Δy), can be formulated as a two-dimensional operation, represented mathematically as:
$$F(x,y) = \sum_{k=-K}^{K-1} \sum_{l=-L}^{L-1} f(k+K,\, l+L)\, F(n+k,\, m+l) \qquad (7)$$
where f(k,l) are filter coefficients and n and m are obtained by truncating x and y, respectively, to integer values. Typically, the filter coefficients are dependent on the x and y values and the interpolation filters are usually so-called “separable filters”, in which case sub-pixel value F(x,y) can be calculated as follows:
$$F(x,y) = \sum_{k=-K}^{K-1} f(k+K) \sum_{l=-K}^{K-1} f(l+K)\, F(n+k,\, m+l) \qquad (8)$$
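The relationship between the two-dimensional form (Equation 7) and the separable form (Equation 8) can be sketched as follows. The code follows the index ranges of the equations literally, with k and l running from −K to K−1, and assumes frames stored as lists of rows so that F(n+k, m+l) is frame[m + l][n + k]. The function names are hypothetical, and any tap values passed in are the caller's choice; no particular standard's filter is implied.

```python
def interpolate_2d(frame, n, m, coeffs):
    """Equation 7 with a coefficient table coeffs[k + K][l + L] of size 2K x 2L."""
    K = len(coeffs) // 2
    L = len(coeffs[0]) // 2
    total = 0.0
    for k in range(-K, K):
        for l in range(-L, L):
            total += coeffs[k + K][l + L] * frame[m + l][n + k]
    return total


def interpolate_separable(frame, n, m, taps):
    """Equation 8 with a one-dimensional filter of 2K taps used in both directions."""
    K = len(taps) // 2
    total = 0.0
    for k in range(-K, K):
        inner = 0.0
        for l in range(-K, K):
            inner += taps[l + K] * frame[m + l][n + k]   # vertical pass for column n + k
        total += taps[k + K] * inner                     # horizontal combination
    return total


def outer_product(taps):
    """2-D coefficient table equivalent to applying `taps` separably."""
    return [[tk * tl for tl in taps] for tk in taps]
```

A separable filter with 2K taps is described by 2K coefficients rather than the 4K² of the general two-dimensional table, and the inner sums can be shared between neighbouring sub-pixel positions, which is one reason separable filters are common.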
The motion vectors are calculated in the encoder. Once the corresponding motion coefficients are transmitted to the decoder, it is a straightforward matter to interpolate the required sub-pixels using an interpolation method identical to that used in the encoder. In this way, a frame following a reference frame in the Frame Memory 24 can be reconstructed from the reference frame and the transmitted motion vectors.
Conventionally, the interpolation filters used in video encoders and decoders employ fixed filter coefficient values and the same filter (i.e., the same type of filter with the same filter coefficient values) is used for all frames of a video sequence being coded. The same filter is further used for all video sequences, irrespective of their nature and how they were acquired (captured). Wedi (“Adaptive Interpolation Filter for Motion Compensated Hybrid Video Coding,” Picture Coding Symposium (PCS 2001), Seoul, Korea, April 2001), proposes the use of interpolation filters with adaptive filter coefficient values, in order to compensate for certain shortcomings in the video coding process. In particular, Wedi describes how aliasing in the image acquisition process, the finite resolution of allowed motion vectors and the limited validity of the translational motion model introduce additional prediction errors. Aliasing in a video image arises due to the use of non-ideal low-pass filters (and consequent non-fulfilment of the Nyquist Sampling Theorem) in the image acquisition process. Aliasing disturbs motion compensated prediction within the video sequence and gives rise to an additional prediction error component. The finite precision of the allowed motion vectors (e.g., full-pixel, one-half pixel, or one-quarter pixel) and the ability of the translational motion model to represent only horizontal and vertical translational movement between successive video frames also give rise to additional prediction error contributions. Wedi further proposes that an improvement in coding efficiency can be achieved by adapting the filter coefficient values of an interpolation filter to compensate for the additional prediction errors introduced by aliasing, finite motion vector precision and limited validity of the translational motion model.
More generally, it should be appreciated that since the nature and characteristics of the motion vary in a video sequence, the optimal interpolation filter varies as a function of time and image location. Wedi presents an example in which an interpolation filter with dynamically adaptive filter coefficient values is integrated into the H.26L video codec, more specifically, the version of that codec defined by Test Model (TML) 4. TML-4 of H.26L used a one-quarter-pixel motion vector resolution and a Wiener-type interpolation filter with six symmetric filter coefficients (6-tap filter). The example presented in Wedi proposes adapting the filter coefficients of the interpolation filter on a frame-by-frame basis, differentially coding the filter coefficients and transmitting them to the decoder as side information to the main video data. A proposal based on this approach was made to include the use of interpolation filters with dynamically adaptive filter coefficient values in Test Model 8 of the H.26L video codec. This proposal is presented in the ITU Telecommunication Standardization Sector documents entitled “Adaptive Interpolation Filter for H.26L”, Study Group 16, Question 6, Video Coding Experts Group (VCEG), document VCEG-N28, September 2001, and “More Results on Adaptive Interpolation Filter for H.26L”, Study Group 16, Question 6, Video Coding Experts Group (VCEG), document VCEG-016r1, November 2001.
The use of dynamically adaptive interpolation filters raises an important issue relating to the coding efficiency of the encoded video data stream and also has an effect on the error resilience of the encoded video data. The issue of coding efficiency can be understood in a straightforward manner. In a video coding system that employs an interpolation filter having fixed filter coefficient values, there is no need to include any information relating to the filter coefficient values in the encoded video data bit-stream. The filter coefficient values can simply be recorded in the video encoder and video decoder. In other words, in a video coding system implemented according to a particular video coding standard that employs fixed interpolation filters, the coefficient values are pre-programmed into both encoder and decoder according to the specifications of the standard. However, if dynamically adaptive filter coefficients are allowed, it becomes necessary to transmit information relating to the coefficient values. As the filter coefficients are periodically updated (e.g. on a frame-by-frame basis), this necessarily adds to the amount of information to be sent from the video encoder to the decoder and has a deleterious effect on coding efficiency. In low bit-rate video coding applications, any increase in the amount of information to be transmitted is generally undesirable.
Thus, in order to optimally model and compensate motion, an efficient representation of the dynamic interpolation filters is needed.
Regarding error resilience, it should be appreciated that the way in which information about the coefficients of a dynamically variable interpolation filter is transmitted from encoder to decoder may affect the susceptibility of the video data to transmission errors. More specifically, in a video coding system that employs dynamically adaptive interpolation filters, correct reconstruction of a frame of a video sequence at the decoder relies on correct reception and decoding of the filter coefficient values. If the information relating to the coefficient values is subject to error during its transmission from encoder to decoder, corruption of the reconstructed video data is likely. Three ways of coding the filter coefficients are known from the prior art. The first is to entropy code the filter coefficient values separately. The second is to entropy code the filter coefficient values differentially with respect to the filter coefficients of already decoded filters (as proposed in Wedi). The third is to define a set of filters and code the index of the selected filter.
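The three prior art approaches can be compared with a rough bit count, as in the following Python sketch. It is an illustration under stated assumptions rather than the coding actually used in any of the cited documents: a signed order-0 Exp-Golomb code stands in for the entropy coder, the index is sent with a fixed-length code, and the coefficient values and the function names are hypothetical.

```python
def signed_exp_golomb_bits(value):
    """Bit length of the signed order-0 Exp-Golomb codeword for an integer."""
    # Map signed values to non-negative code numbers: 0, 1, -1, 2, -2, ...
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * (code_num + 1).bit_length() - 1


def bits_separate(coeffs):
    """First approach: code every coefficient value on its own."""
    return sum(signed_exp_golomb_bits(c) for c in coeffs)


def bits_differential(coeffs, previous):
    """Second approach: code the difference from an already decoded filter."""
    return sum(signed_exp_golomb_bits(c - p) for c, p in zip(coeffs, previous))


def bits_index(num_filters):
    """Third approach: fixed-length index into a predefined set of filters."""
    return max(1, (num_filters - 1).bit_length())


if __name__ == "__main__":
    previous = (1, -5, 20, 20, -5, 1)   # hypothetical filter of the last frame
    current = (1, -6, 21, 21, -6, 1)    # hypothetical filter of this frame
    print("separate:    ", bits_separate(current), "bits")
    print("differential:", bits_differential(current, previous), "bits")
    print("index (of 8):", bits_index(8), "bits")
```

With these example values the separate coding costs 42 bits, the differential coding 14 bits and the index 3 bits per filter, which reflects the general ordering of the three approaches in terms of side information.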
The prior art solutions that could be used for coding interpolation filter coefficients, as mentioned above, all have problems associated with them in different usage scenarios. The first method, in which the interpolation filter coefficients are coded separately, offers inferior coding performance, since it does not utilise any a priori information (i.e., information about previously coded interpolation filter coefficient values). This approach therefore requires an unduly large amount of information to be added to the encoded video bit-stream in order to describe the interpolation filter coefficient values. Differential coding of the coefficients, as proposed in Wedi, is efficient, but is unsuitable for environments in which transmission errors may occur, since correct decoding of the filter coefficients depends on correct decoding of earlier filter coefficients. As previously described, if the encoded video bit-stream is subject to error during its transmission from encoder to decoder, corruption of the video data reconstructed at the decoder is likely to occur. The third prior art solution, which uses a predefined set of filters, provides only limited alternatives and thus degrades the coding performance. In other words, this option cannot achieve the full advantages of using interpolation filters with dynamically adaptive filter coefficient values, as set out in Wedi.
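The error propagation problem of differential coding can be made concrete with a small simulation. The Python sketch below is a simplified model, assuming integer coefficients, one filter per frame, and a single corrupted residual value; the filter values and function names are hypothetical and entropy coding is omitted.

```python
def encode_differential(filters, initial):
    """Code each frame's filter as the difference from the previous filter."""
    residuals, prev = [], initial
    for f in filters:
        residuals.append([c - p for c, p in zip(f, prev)])
        prev = f
    return residuals


def decode_differential(residuals, initial):
    """Rebuild the filters by accumulating the received differences."""
    filters, prev = [], list(initial)
    for r in residuals:
        prev = [p + d for p, d in zip(prev, r)]
        filters.append(prev)
    return filters


def corrupted_frames(sent, received):
    """Indices of frames whose decoded filter differs from the one sent."""
    return [i for i, (s, r) in enumerate(zip(sent, received))
            if list(s) != list(r)]


if __name__ == "__main__":
    initial = [1, -5, 20, 20, -5, 1]
    filters = [[1, -5, 20, 20, -5, 1], [1, -6, 21, 21, -6, 1],
               [2, -6, 20, 20, -6, 2], [1, -5, 20, 20, -5, 1]]

    # Differential coding: one damaged residual in frame 1.
    damaged = [list(r) for r in encode_differential(filters, initial)]
    damaged[1][2] += 4
    print(corrupted_frames(filters, decode_differential(damaged, initial)))

    # Separate coding: the same damage stays confined to frame 1.
    received = [list(f) for f in filters]
    received[1][2] += 4
    print(corrupted_frames(filters, received))
```

In this model a single error in frame 1 corrupts the filters of frames 1, 2 and 3 under differential coding, whereas with separately coded coefficients only the filter of frame 1 is affected.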
Thus, it should be appreciated that there is a need for a method of coding the coefficient values of adaptive interpolation filters that is efficient and does not degrade the error resilience of the encoded video bit-stream.