1. Field of the Invention
The present disclosure relates to advances in areas including video systems, computer program products, and methods, and in particular to video compression/decompression in digital video systems, software-enabled devices and methods.
2. Description of the Related Art
The “background” description provided herein is for the purpose of generally presenting the context of the disclosure. Insight provided by the present inventors, to the extent it is described in this background section, as well as aspects of the description which may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art.
Transmission of moving pictures in real-time is employed in numerous applications such as video conferencing, “net meetings”, television (TV) broadcasting and video telephony.
However, representing moving pictures in digital form involves a large amount of information, because each picture element (pixel) in a picture (or image frame) is represented with 8 bits (1 byte). Uncompressed video data therefore aggregates to very large bit quantities, and transmitting it in real time over conventional communication networks, whose bandwidth is limited, demands a large bandwidth allocation.
Due to significant redundancy in images between successive frames, data compression is freely applied in real-time video transmission applications. Data compression may, however, compromise picture quality, and so persistent efforts continue to be made to develop data compression techniques allowing real-time transmission of high-quality video over bandwidth-limited resources.
In video compression systems, an objective is to represent the video information with as little “capacity” as possible, where capacity is usually measured in bits, either as a constant value or as bits per time unit. By minimizing the number of bits that need to be transmitted, the amount of communication resources needed to support the real-time transmission of video data is also reduced.
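As a rough illustration of the bit quantities involved, the following sketch computes the raw bit rate of an uncompressed sequence at 8 bits per pixel. The picture size (352×288) and frame rate (30 frames per second) are assumptions chosen only for the example and are not taken from the disclosure; the function name `raw_bitrate` is likewise hypothetical.

```python
def raw_bitrate(width, height, fps, bits_per_pixel=8):
    """Bits per second needed to send every pixel of every frame uncompressed."""
    return width * height * bits_per_pixel * fps


# Assumed example format: 352x288 pixels at 30 frames per second.
bits_per_second = raw_bitrate(352, 288, 30)
print(bits_per_second)          # 24330240 bits per second
print(bits_per_second / 1e6)    # about 24.3 Mbit/s
```

Even this small picture format needs roughly 24 Mbit/s when sent uncompressed, which is what motivates the compression techniques discussed below.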
The most common video coding methods are described in the MPEG* (e.g., MPEG2 and MPEG4) and H.26* (e.g., H.263 and H.264) standards. According to these standards, the video data is subjected to four main processes before transmission, namely prediction, transformation, quantization and entropy coding.
The prediction process significantly reduces the number of bits required for each frame in a video sequence to be transferred. It takes advantage of the similarity of parts of the sequence with other parts of the sequence. A decoder that decodes the bit stream has side information to assist in the decoding process. This side information is known to both encoder and decoder and so only the difference has to be transferred. This difference typically requires much less capacity for its representation than the full image. The motion estimation aspect of the prediction is mainly based on picture content from previously reconstructed pictures where the location of the content is defined by motion vectors. The prediction process is typically performed on square block sizes (e.g. 16×16 pixels), although the size of the blocks may vary.
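The idea that only the difference has to be transferred can be sketched as follows. This is a minimal illustration that ignores the transformation, quantization and entropy coding stages, so the reconstruction here is exact; the helper names `residual` and `reconstruct` and the sample values are hypothetical.

```python
def residual(cur_block, pred_block):
    """Encoder side: difference between the current block and its prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(cur_block, pred_block)]


def reconstruct(pred_block, res_block):
    """Decoder side: add the received difference back onto the same prediction."""
    return [[p + r for p, r in zip(pred_row, res_row)]
            for pred_row, res_row in zip(pred_block, res_block)]


cur = [[52, 55], [61, 59]]     # block in the picture being coded
pred = [[50, 55], [60, 60]]    # prediction available to encoder and decoder
res = residual(cur, pred)      # small values: [[2, 0], [1, -1]]
decoded = reconstruct(pred, res)
```

When the prediction is good, the residual values are small and cluster around zero, which is why they typically need far fewer bits than the full block.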
In a typical video sequence, the content of a present block “M” would be similar to a corresponding block in a previously decoded picture. If no changes have occurred since the previously decoded picture (i.e., an image in a new frame is the same as the image in the past frame), the content of M would be equal to the block at the same location in the previously decoded picture. In other cases, an object in the picture may have moved between frames so that the content of M is more similar to a block at a different location in the previously decoded picture. Such movements are represented by a motion vector (V). As an example, a motion vector of (3;4) means that the content of M has moved 3 pixels to the left and 4 pixels upwards since the previously decoded picture. For improved accuracy, the vector components may also be fractional, which requires interpolation between the pixels.
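A minimal sketch of fetching a prediction block with an integer motion vector is given below. It assumes image coordinates with x increasing to the right and y increasing downwards, so that content which has moved 3 pixels left and 4 pixels up is found 3 columns to the right and 4 rows further down in the previously decoded picture. The function name `predict_block` is hypothetical, and no bounds checking is done.

```python
def predict_block(ref, x, y, size, mv):
    """Return the size x size prediction for the block whose top-left corner
    is (x, y), taken from the previously decoded picture `ref`."""
    dx, dy = mv
    # Content that moved dx pixels left and dy pixels up since the reference
    # picture was located dx columns right and dy rows down in that picture.
    return [row[x + dx:x + dx + size] for row in ref[y + dy:y + dy + size]]


# Tiny 8x8 reference picture whose pixel value encodes its position (10*row + col).
ref = [[10 * r + c for c in range(8)] for r in range(8)]
pred = predict_block(ref, x=1, y=1, size=2, mv=(3, 4))
print(pred)   # [[54, 55], [64, 65]]
```

With a motion vector of (0;0) the same function returns the block at the same location, which corresponds to the case where nothing has changed between frames.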
In H.262, H.263, MPEG1 and MPEG2 the same concept is extended so that motion vectors can also take ½ pixel values. A vector component of 5.5 then implies that the motion is relative to the midway point between 5 and 6 pixels. More specifically, the prediction is obtained by taking the average of the pixel representing a motion of 5 and the pixel representing a motion of 6. This prediction is conventionally said to be performed with a 2-tap filter, because it operates on 2 pixels to obtain the prediction of a pixel between the two. Since filter operations can be defined by an impulse response, the operation of averaging 2 pixels can be expressed with an impulse response of (½, ½). Similarly, averaging over 4 pixels implies an impulse response of (¼, ¼, ¼, ¼).
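The two-pixel averaging can be sketched in one dimension as follows. Positions are given in half-pixel units, so a vector component of 5.5 becomes the integer 11. Rounding the average to the nearest integer is an assumption made for this illustration, and the function name `sample_half_pel` is hypothetical.

```python
def sample_half_pel(row, pos_half):
    """Sample `row` at a position expressed in half-pixel units (11 means 5.5)."""
    index, frac = divmod(pos_half, 2)
    if frac == 0:
        return row[index]                 # integer position: no filtering
    # 2-tap filter with weights (1/2, 1/2); the +1 rounds to nearest (assumed).
    return (row[index] + row[index + 1] + 1) >> 1


row = [0, 10, 20, 30, 40, 50, 61, 70]
print(sample_half_pel(row, 10))   # 50, the pixel at position 5
print(sample_half_pel(row, 11))   # 56, the average of positions 5 and 6
```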
The purpose of the averaging is to define a motion of the picture content with an accuracy of ½ pixel, which provides improved coding efficiency as compared with encoders that only operate on integer pixels.
In MPEG4 and H.264/AVC, coding methods have improved both in terms of motion resolution and in the number of pixels used for each interpolation. The methods use motion compensated prediction with ¼ pixel accuracy. Even ⅛ pixel accuracy is defined, but not included in any profile.
The integer- and fractional pixel positions are indicated in FIG. 1 (for simplicity, interpolations are only shown between pixels A and E). The positions A E U Y indicate integer pixel positions, and A″, E′, A′ and E″ indicate additional integer positions on the A-E line. c k m o w indicate half pixel positions. The interpolated values in these positions are obtained by using a 6-tap filter with impulse response (1/32, −5/32, 20/32, 20/32, −5/32, 1/32) operating on integer pixel values. As an example, c is calculated by the following expression, which represents a filter:
c = 1/32·A″ − 5/32·E′ + 20/32·A + 20/32·E − 5/32·A′ + 1/32·E″
The filter is operated horizontally or vertically as appropriate. Further, to obtain the value for m, the filter is not operated on integer values, but on already interpolated values in the other direction. The remaining positions are obtained by averaging respective integer- and half neighbor pixel positions:
b=(A+c)/2, d=(c+E)/2, f=(A+k)/2, g=(c+k)/2, h=(c+m)/2, i=(c+o)/2, j=(E+o)/2,
l=(k+m)/2, n=(m+o)/2, p=(U+k)/2, q=(k+w)/2, r=(m+w)/2, s=(w+o)/2, t=(Y+o)/2.
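The 6-tap expression for c can be sketched with integer arithmetic as shown below. The six samples are passed in the same order as the terms of the expression. Rounding by adding 16 before dividing by 32 and clipping the result to the 8-bit range 0 to 255 are assumptions of this sketch, and the function name `half_pel_6tap` is hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)   # numerators of the impulse response; denominator is 32


def half_pel_6tap(samples):
    """Interpolate the half-pixel value at the centre of six integer samples."""
    if len(samples) != 6:
        raise ValueError("the 6-tap filter needs exactly six samples")
    acc = sum(tap * s for tap, s in zip(TAPS, samples))
    value = (acc + 16) >> 5          # divide by 32 with rounding (assumed)
    return max(0, min(255, value))   # clip to 8 bits (assumed)


print(half_pel_6tap([100, 100, 100, 100, 100, 100]))   # 100: flat areas are preserved
print(half_pel_6tap([0, 0, 0, 255, 255, 255]))         # 128: midpoint of a step edge
print(half_pel_6tap([0, 0, 255, 255, 0, 0]))           # 255: overshoot is clipped
```

Because the taps sum to 32, a flat region is reproduced unchanged, while the negative taps sharpen edges and make the clipping step necessary.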
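The quarter-pixel averages listed above can be tabulated as pairs of neighbor positions and evaluated in one pass, as in the following sketch. The sample values for the integer and half-pixel positions are made up for the example, rounding to the nearest integer is an assumption, and the names `QUARTER_PAIRS` and `quarter_pels` are hypothetical.

```python
# Each quarter-pixel position is the average of the two named neighbor positions.
QUARTER_PAIRS = {
    "b": ("A", "c"), "d": ("c", "E"), "f": ("A", "k"), "g": ("c", "k"),
    "h": ("c", "m"), "i": ("c", "o"), "j": ("E", "o"), "l": ("k", "m"),
    "n": ("m", "o"), "p": ("U", "k"), "q": ("k", "w"), "r": ("m", "w"),
    "s": ("w", "o"), "t": ("Y", "o"),
}


def quarter_pels(known):
    """Compute all quarter-pixel values from integer and half-pixel values."""
    return {name: (known[p] + known[q] + 1) >> 1      # rounding is assumed
            for name, (p, q) in QUARTER_PAIRS.items()}


# Made-up values: integer positions A, E, U, Y and half positions c, k, m, o, w.
known = {"A": 100, "E": 120, "U": 80, "Y": 140,
         "c": 110, "k": 90, "m": 105, "o": 130, "w": 112}
quarters = quarter_pels(known)
print(quarters["b"], quarters["d"], quarters["t"])   # 105 115 135
```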
FIG. 2 is a flowchart of a conventional motion estimation process involving sub-pixel interpolation. The process begins at step S1 where a multi-tap filter is used to interpolate a half-pixel position between two integer pixels. The interpolated half-pixels are stored in a buffer memory in step S2. The process then proceeds to step S3 where quarter-pixel interpolation is performed by averaging respective integer- and half-pixel positions. Subsequently the interpolated quarter pixels are stored in step S4. Then in step S5 a query is made regarding whether all the pixels have been analyzed. If the response to the query in step S5 is negative, the process returns to step S1, where other pixels in a candidate block are analyzed and steps S1 through S4 are performed again. If the response is affirmative, the process proceeds to step S6 where a best pixel or sub-pixel is selected. In this context a “best pixel or sub-pixel” is one that yields the lowest cost (measured in bits). Once the best pixel or sub-pixel identifying the motion vector from one frame to the next has been selected in step S6, the process proceeds to step S7 where the calculated and stored interpolated half- and quarter-pixel values are overwritten by those of the next frame. Subsequently the process stops.
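The flow of FIG. 2 can be sketched in one dimension as follows. This is a simplified illustration under several assumptions: a single row of pixels stands in for a picture, the sum of absolute differences (SAD) stands in for the cost in bits, edge samples are repeated at the row boundaries, and rounding and 8-bit clipping are applied. All function names are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)   # 6-tap half-pixel filter, denominator 32


def build_quarter_pel_row(ref):
    """Steps S1 to S4: interpolate and store every half and quarter position.
    Index 4*i holds integer pixel i; 4*i+2 is the half position after it."""
    n = len(ref)
    up = [0] * (4 * (n - 1) + 1)
    for i in range(n):
        up[4 * i] = ref[i]
    for i in range(n - 1):                      # S1/S2: half pixels
        window = [ref[max(0, min(n - 1, i + k))] for k in range(-2, 4)]
        acc = sum(t * s for t, s in zip(TAPS, window))
        up[4 * i + 2] = max(0, min(255, (acc + 16) >> 5))
    for j in range(1, len(up), 2):              # S3/S4: quarter pixels
        up[j] = (up[j - 1] + up[j + 1] + 1) >> 1
    return up


def best_vector(cur, ref, x0, search_range_q):
    """Steps S5 and S6: test every candidate offset (in quarter-pixel units)
    around integer position x0 and keep the one with the lowest cost."""
    up = build_quarter_pel_row(ref)
    best = None
    for mv_q in range(-search_range_q, search_range_q + 1):
        start = 4 * x0 + mv_q
        end = start + 4 * (len(cur) - 1)
        if start < 0 or end >= len(up):
            continue                            # candidate falls outside the row
        cost = sum(abs(c - up[start + 4 * k]) for k, c in enumerate(cur))
        if best is None or cost < best[0]:
            best = (cost, mv_q)
    return best                                 # (cost, offset in quarter pixels)


ref = [10 * i for i in range(16)]               # reference row: a linear ramp
cur = [65, 75, 85, 95]                          # ramp content sampled 1.5 pixels after x0 = 5
print(best_vector(cur, ref, x0=5, search_range_q=8))   # (0, 6), i.e. 6/4 = 1.5 pixels
```

Note that the whole interpolated row is computed and stored before any candidate is compared, mirroring the store steps S2 and S4 of the flowchart; the buffer is simply discarded when the function returns, which plays the role of step S7.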