Transmission of moving pictures in real time is employed in several applications, e.g., video conferencing, net meetings, TV broadcasting and video telephony.
However, representing moving pictures requires a large amount of information, as digital video is typically described by representing each pixel in a picture with 8 bits (1 byte). Such uncompressed video data results in large bit volumes and cannot be transferred over conventional communication networks and transmission lines in real time due to limited bandwidth.
Thus, real-time video transmission requires a large degree of data compression. Data compression may, however, compromise picture quality. Therefore, great efforts have been made to develop compression techniques that allow real-time transmission of high-quality video over bandwidth-limited data connections.
The most common video coding methods are described in the MPEG* and H.26* standards, all of which use block-based prediction from previously encoded and decoded pictures.
The video data undergoes four main processes before transmission, namely prediction, transformation, quantization and entropy coding.
The prediction process significantly reduces the number of bits required for each picture in a video sequence to be transferred. It exploits the similarity between parts of the sequence and other parts of the sequence. Since the predictor part is known to both encoder and decoder, only the difference has to be transferred. This difference typically requires much less capacity for its representation. The prediction is mainly based on picture content from previously reconstructed pictures, where the location of the content is defined by motion vectors.
In a typical video sequence, the content of a present block M would be similar to a corresponding block in a previously decoded picture. If no changes have occurred since the previously decoded picture, the content of M would be equal to a block of the same location in the previously decoded picture. In other cases, an object in the picture may have moved so that the content of M is more similar to a block of a different location in the previously decoded picture. Such movements are represented by motion vectors (V). As an example, a motion vector of (3, 4) means that the content of M has moved 3 pixels to the left and 4 pixels upwards since the previously decoded picture.
A motion vector associated with a block is determined by executing a motion search. The search is carried out by consecutively comparing the content of the block with blocks in previous pictures of different spatial offsets. The offset relative to the present block associated with the comparison block having the best match compared with the present block, is determined to be the associated motion vector.
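The full search described above can be sketched as follows. This is a minimal illustration, not an optimized encoder routine: block size, search radius and the use of the sum of absolute differences (SAD) as the matching criterion are assumptions for the example.

```python
import numpy as np

def motion_search(current, reference, block_y, block_x, block=8, radius=4):
    """Full-search block matching: compare the current block against
    reference blocks at every spatial offset within the search radius,
    and return the offset (motion vector) with the lowest SAD."""
    cur = current[block_y:block_y + block, block_x:block_x + block].astype(int)
    best_sad, best_vec = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = block_y + dy, block_x + dx
            # Skip candidate blocks that fall outside the reference picture.
            if y < 0 or x < 0 or y + block > reference.shape[0] or x + block > reference.shape[1]:
                continue
            cand = reference[y:y + block, x:x + block].astype(int)
            sad = int(np.abs(cur - cand).sum())  # sum of absolute differences
            if best_sad is None or sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec, best_sad
```

If the current block is an exact copy of a displaced reference block, the search returns that displacement with a SAD of zero; otherwise it returns the best available match.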
In recent video coding standards, the same concept is extended so that motion vectors can also take ½-pixel values. A vector component of 5.5 then implies that the motion is midway between pixels 5 and 6. More specifically, the prediction is obtained by taking the average of the pixel representing a motion of 5 and the pixel representing a motion of 6. This is called a 2-tap filter due to the operation on 2 pixels to obtain the prediction of a pixel in between. Motion vectors of this kind are often referred to as having fractional pixel resolution, or as fractional motion vectors. All filter operations can be defined by an impulse response. The operation of averaging 2 pixels can be expressed with an impulse response of (½, ½). Similarly, averaging over 4 pixels implies an impulse response of (¼, ¼, ¼, ¼).
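The 2-tap averaging above can be sketched for one row of integer pixels. The rounded integer average `(a + b + 1) // 2` is an assumption for the example, chosen because integer arithmetic with rounding is typical in video codecs.

```python
def half_pixel_row(pixels):
    """Interleave half-pixel samples between integer pixels using the
    2-tap averaging filter with impulse response (1/2, 1/2)."""
    out = []
    for a, b in zip(pixels, pixels[1:]):
        out.append(a)                  # integer-pixel position
        out.append((a + b + 1) // 2)   # half-pixel position: rounded average
    out.append(pixels[-1])
    return out

# half_pixel_row([10, 20, 30]) -> [10, 15, 20, 25, 30]
```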
In H.264/AVC, coding methods have improved both in terms of motion resolution and the number of pixels used for each interpolation. The methods use motion compensated prediction with up to ¼ and even ⅛ pixel accuracy. An example of integer and fractional pixel positions is indicated below (for simplicity, interpolations are only shown between A, E, U and Y):
    A″  E′  A  b  c  d  E  A′  E″
                f  g  h  i  j
                k  l  m  n  o
                p  q  r  s  t
                U  v  w  x  Y
The positions A, E, U and Y indicate integer pixel positions, and A″, E′, A′ and E″ indicate additional integer positions on the A-E line. The positions c, k, m, o and w indicate half-pixel positions. The interpolated values in these positions may be obtained by, e.g., using a 6-tap filter with impulse response (1/32, −5/32, 20/32, 20/32, −5/32, 1/32) operating on integer pixel values. As an example, c is then calculated by the following expression:

c = (1/32)·A″ − (5/32)·E′ + (20/32)·A + (20/32)·E − (5/32)·A′ + (1/32)·E″
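In integer arithmetic, the division by 32 is typically implemented as a right shift with rounding, followed by clipping to the 8-bit sample range. A minimal sketch of the 6-tap computation for c, with the rounding offset of 16 and the clipping as assumptions in the style of integer-arithmetic codec implementations:

```python
def six_tap_half_pixel(a2, a1, a, e, e1, e2):
    """Half-pixel interpolation with the 6-tap filter
    (1, -5, 20, 20, -5, 1) / 32, here written with the division
    implemented as a rounded right shift and the result clipped
    to the 8-bit range [0, 255].

    Arguments correspond to the positions A'', E', A, E, A', E''."""
    val = (a2 - 5 * a1 + 20 * a + 20 * e - 5 * e1 + e2 + 16) >> 5
    return max(0, min(255, val))

# On a flat area all six samples are equal and the filter is transparent:
# six_tap_half_pixel(100, 100, 100, 100, 100, 100) -> 100
```

The large center taps (20/32) weight the two nearest integer pixels most heavily, while the small negative outer taps sharpen the interpolated value compared with plain averaging.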
The filter is operated horizontally or vertically as appropriate.
When a frame of video is encoded into an H.264/AVC bit stream, one of the last steps is usually the half-pixel interpolation filter, which prepares for the above-mentioned motion search when coding future frames. This step is one of the most computationally demanding tasks in the encoding process and involves filtering the entire frame. As picture resolution increases, this requires a considerable amount of processor capacity and introduces too much delay, especially if the encoding process is implemented on general-purpose shared processors, e.g., processors in personal computers.