The term “communicating terminal” is understood to mean any equipment able to receive data coding a video and able to decode this data for the purpose of displaying a video. The terminal may be an item of wireless equipment (for example, a mobile telephone, a personal digital assistant, etc.) and/or a wired apparatus that can be connected to a network (such as a computer, a television equipped with a digital decoder, etc.). However, the invention is particularly advantageous in applications in wireless equipment for which the use of a limited bandwidth and low power consumption are key factors.
The term “video” is understood to mean a sequence of moving images, as may be produced by a video source, such as a digital video camera or digital image synthesis software, and then subjected to any kind of processing before being displayed or stored on a data medium. Each image comprises a number of pixels (a pixel is the smallest element of an image) which depends on the resolution of the digital video source. The resolution is thus expressed as the number of pixels (width by height) of the images.
In a typical video application, the information needed to define the pixels of an entire image is coded and transmitted in a data entity called a frame (at least in the case of a progressive display system).
Second-generation telecommunication systems, such as GSM (Global System for Mobile communications), allow wireless transmission of voice-coded data. However, the data processing capabilities of second-generation systems are limited. This is why third-generation systems such as UMTS (Universal Mobile Telecommunication System) will provide high-data-rate services, through which high-quality still or moving images will be able to be transmitted, especially for allowing access to the Internet. Thanks to its multimedia functionalities, the UMTS market will offer new opportunities for content and application providers, operators and manufacturers.
In this context, the quality of the videos displayed on the screen of the telecommunication terminals represents a major challenge for manufacturers. In multimedia applications (videoconferencing, video telephony, video clips, etc.), two principal aspects influence the displayed video quality:
on the one hand, the image resolution, that is to say the number of pixels which are coded and transmitted in one frame; and
on the other hand, the frame rate (for a progressive display system), that is to say the number of images coded and transmitted per unit of time.
UMTS should offer a data transfer rate of up to 384 kbit/s (kilobits per second). With such a bandwidth, QCIF (Quarter Common Intermediate Format) images can be transmitted only at a frame rate of 15 ips (images per second).
QCIF is a format specified by Recommendation H.261 of the ITU-T (which is the video component of the ITU-T H.320 standard) relating to video coding for audiovisual services, such as videoconferencing. The size of a QCIF image is equal to 144×176 pixels (i.e. 144 lines by 176 columns), the recommended frame rate being 30 ips.
Consequently, the frame rate of images output by a digital video source producing QCIF images will be divided by two, on the transmitter side, in order to go from 30 ips to 15 ips. This is easily obtained by removing every other image. This operation corresponds to undersampling of the video.
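As an illustration, the undersampling described above can be sketched as follows. This is a minimal Python sketch, not the actual terminal implementation; representing the video as a simple list of frames is an assumption made purely for illustration.

```python
# Temporal undersampling by a factor of 2: every other image of the
# source sequence is dropped before transmission, halving the frame rate
# (e.g. from 30 ips to 15 ips for QCIF images).

def undersample(frames, factor=2):
    """Keep one frame out of every `factor` frames of the sequence."""
    return frames[::factor]

# Example: a sequence indexed by frame number at 30 ips.
original = list(range(10))        # frames 0..9
reduced = undersample(original)   # frames 0, 2, 4, 6, 8 -> 15 ips
```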
Of course, the quality of the video displayed decreases when the frame rate decreases. This is illustrated by the diagrams in FIGS. 1A and 1B. These figures represent, side by side, successive views of a display screen on which a video showing the fall of an object (a ball) onto the ground is displayed, with a non-zero angle to the vertical and from left to right with respect to the observer. In each figure, a rectangle to the left of the “=” sign corresponds to the display of an image. The horizontal arrow above the images of FIG. 1A indicates the order in which the images are displayed on the screen. In each figure, the rectangle Res on the right of the “=” sign represents diagrammatically the display of an image resulting from the superposition of the successive images whose display is illustrated to the left of this sign. This resulting image is used to illustrate the visual effect reproduced when displaying the video (taking persistence of vision into account).
FIG. 1A shows, side by side, views of the display screen on which the images of the video at times t, such that t=T0, t=T0+T, t=T0+2T, t=T0+3T, t=T0+4T, t=T0+5T, etc. respectively, are displayed. These images are, for example, QCIF images produced by a video source having a frame rate of 30 ips and are displayed at this rate, that is to say T=33.33 ms (milliseconds). Hereafter, these images will be called original images.
FIG. 1B shows the same views of the screen as FIG. 1A when the video undergoes, on the transmitter side, undersampling, causing the frame rate to drop to 15 ips (i.e. the frame rate is reduced by a factor of 2). Thus, as shown, only every other image is displayed, at times T0, T0+2T, T0+4T, etc. It follows that the resulting image Res is affected by a flicker effect.
To alleviate this drawback, the missing original images must be replaced with other images, which are generated on the receiver side, in particular from the original images transmitted. Advantageously, the frame rate is increased by a factor of two by generating images that are each intended to be inserted between two consecutive original images in the video to be displayed. This operation corresponds to oversampling of the transmitted video.
Thus, it is known to use a frame repetition algorithm, which allows each image to be displayed twice in succession. In other words, this algorithm allows the same original image to be displayed at times T0 and T0+T, then the next original image at times T0+2T and T0+3T, then the next original image at times T0+4T and T0+5T, etc. The effect of this algorithm is illustrated by the diagram in FIG. 1C.
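Frame repetition can be sketched in the same way. Again, this is only a hedged Python illustration under the assumption that frames are list elements indexed by display time.

```python
# Frame repetition: each received original image is displayed `factor`
# times in succession, restoring the display rate without generating
# any new image content.

def frame_repetition(received, factor=2):
    """Repeat each received frame `factor` times in display order."""
    out = []
    for frame in received:
        out.extend([frame] * factor)
    return out

# Example: original images received at times T0, T0+2T, T0+4T.
received = [0, 2, 4]
displayed = frame_repetition(received)   # [0, 0, 2, 2, 4, 4]
```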
Frame repetition is easy to implement but does have, nevertheless, the drawback of introducing a jerky motion effect when displaying the video, as illustrated by the resulting image Res in FIG. 1C. This is why frame repetition is satisfactory only for images that are stationary or moving very little.
It is also known to use an image interpolation algorithm. Such an algorithm generates interpolated images that approximate as closely as possible the missing original images, these being displayed at the times T0+T, T0+3T, T0+5T, etc. For this purpose, an interpolation function is used. This function is such that, at the aforementioned times t, the following equation holds for each pixel:

In(t)=F(Or(t−T),Or(t+T))  (1)
where In(t) denotes the interpolated image displayed at time t;
Or(t−T) denotes the original image displayed at time t−T;
Or(t+T) denotes the original image displayed at time t+T; and
F denotes the interpolation function.
A conventional example of an interpolation function is the static mean function, denoted Fstat hereafter. This function is such that:
Fstat(Or(t−T),Or(t+T))=(1/2)×(Or(t−T)+Or(t+T))  (2)
Stated differently, the value of each pixel of the interpolated image In(t) is equal to the mean of the values of the corresponding pixels of the original images Or(t−T) and Or(t+T). The performance of such a function is good provided that the motion of the elements in the video is slight. By contrast, if the motion is considerable, the correlation between the original images Or(t−T) and Or(t+T) may be insufficient to obtain a good interpolation result.
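Equation (2) can be illustrated with a short Python sketch in which, purely as an assumption for illustration, an image is represented as a flat list of pixel values:

```python
# Static mean interpolation, equation (2): each pixel of the interpolated
# image is the average of the co-located pixels of Or(t-T) and Or(t+T).

def fstat(or_prev, or_next):
    """Return the pixel-wise mean of the two original images."""
    return [(a + b) / 2 for a, b in zip(or_prev, or_next)]

# Example with three pixel values per image (illustrative data).
or_prev = [10, 20, 30]              # pixels of Or(t - T)
or_next = [30, 40, 50]              # pixels of Or(t + T)
interp = fstat(or_prev, or_next)    # [20.0, 30.0, 40.0]
```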
This is why it has already been proposed to use an algorithm for image interpolation with motion compensation, based on motion vectors which are received in coded form with the data of the original images transmitted. Reference may be made, for example, to the article by Y. K. Chen, A. Vetro, H. Sun and S. Y. Kung, “Frame-Rate Up-Conversion Using Transmitted True Motion Vectors”, IEEE Workshop on Multimedia Signal Processing, 1998. In this case, the interpolation function produces the value of the pixels of the interpolated image In(t) from the value of the pixels of one or both of the original images Or(t−T) and Or(t+T) received and decoded and also from the motion vectors received and decoded.
The effect of such an interpolation algorithm with motion compensation is illustrated by the diagram in FIG. 1D.
The drawback of the algorithm described in the aforementioned article is that it depends significantly on the video coding standard (for example, the MPEG-4 standard) as regards the definition and the mode of estimating the motion vectors. Another drawback of such an algorithm is that it applies only to videos produced according to a coding standard that is based on estimation and/or compensation of the motion in the image and by means of which motion vectors are generated. However, not all coding standards are based on such a principle. For example, wavelet coding does not provide for such a motion estimation.
Alternatively, it would be conceivable to use an image interpolation technique with motion compensation which comprises an estimation of the motion vectors only from the original images as received and decoded. This estimation may comprise the full searching for each pixel block of a first decoded original image in a second decoded original image or else in part of the latter, called a search window (a technique called full search).
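The full-search technique mentioned above can be sketched as follows. This is a hedged illustration, not an actual terminal implementation: images are represented as lists of rows of pixel values, and the block size, search-window radius and sum-of-absolute-differences (SAD) cost are illustrative assumptions.

```python
# Full search: for each block of the first decoded image, every candidate
# position inside a search window of the second decoded image is tested,
# and the displacement minimising the SAD is kept as the motion vector.

def sad(img, x0, y0, ref, x1, y1, bs):
    """Sum of absolute differences between two bs x bs pixel blocks."""
    return sum(abs(img[y0 + j][x0 + i] - ref[y1 + j][x1 + i])
               for j in range(bs) for i in range(bs))

def full_search(cur, ref, x0, y0, bs=4, radius=2):
    """Return the motion vector (dx, dy) for the block of `cur` whose
    top-left corner is (x0, y0), searched exhaustively in a window of
    `ref` of the given radius."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x1, y1 = x0 + dx, y0 + dy
            # Only candidate blocks fully inside the reference image.
            if 0 <= x1 <= w - bs and 0 <= y1 <= h - bs:
                cost = sad(cur, x0, y0, ref, x1, y1, bs)
                if cost < best_cost:
                    best_cost, best = cost, (dx, dy)
    return best
```

The double loop over the window is what makes the method costly: the number of SAD evaluations grows with the square of the window radius, for every block of every image.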
However, this method of estimation requires a large number of calculations, which is why it is rarely applied to portable video devices. The calculations are performed by a processor of the portable video device or telecommunication terminal, whose power consumption increases with the number of calculations to be performed. Since the terminal is supplied by an autonomous power source such as a rechargeable battery, the power consumption must be limited in order not to reduce the autonomy of the equipment.
Accordingly, what is needed is a method and a device that overcome the shortcomings of the prior art and provide an estimation of motion vectors that requires fewer calculations.