High-efficiency video coding (HEVC) is a block-based hybrid spatial and temporal predictive coding scheme. Similar to other video coding standards, such as Moving Picture Experts Group (MPEG)-1, MPEG-2, and MPEG-4, HEVC supports intra-pictures, such as I pictures, and inter-pictures, such as B pictures. In HEVC, P and B pictures are consolidated into a general B picture that can be used as a reference picture.
An intra-picture is coded without referring to any other pictures. Thus, only spatial prediction is allowed for a coding unit (CU)/prediction unit (PU) inside an intra-picture. An inter-picture, however, supports both intra- and inter-prediction. A CU/PU in an inter-picture may be coded using either spatial or temporal prediction. Temporal predictive coding may reference pictures that were previously coded.
Temporal motion prediction is an effective method to increase the coding efficiency and provides high compression. HEVC uses a translational model for motion prediction. According to the translational model, a prediction signal for a given block in a current picture is generated from a corresponding block in a reference picture. The coordinates of the reference block are given by a motion vector that describes the translational motion along the horizontal (x) and vertical (y) directions; the motion vector components are added to or subtracted from the coordinates of the current block. A decoder needs the motion vector to decode the compressed video.
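By way of a non-limiting illustration, the translational model described above may be sketched as follows for integer-pel motion vectors. The function name `predict_block`, its parameters, and the clamping of coordinates at the picture border are assumptions made for this sketch and are not taken from the HEVC specification.

```python
def predict_block(reference, x, y, mv_x, mv_y, width, height):
    """Return the width x height block that an integer-pel motion vector points to.

    reference is a list of rows of sample values, (x, y) is the top-left
    corner of the current block, and (mv_x, mv_y) is the motion vector.
    Coordinates are clamped to the picture so that vectors pointing past the
    border reuse the nearest edge sample (an assumption of this sketch).
    """
    rows = len(reference)
    cols = len(reference[0])
    block = []
    for dy in range(height):
        # Vertical coordinate of the reference sample: current y plus motion.
        ref_y = min(max(y + mv_y + dy, 0), rows - 1)
        row = []
        for dx in range(width):
            # Horizontal coordinate of the reference sample.
            ref_x = min(max(x + mv_x + dx, 0), cols - 1)
            row.append(reference[ref_y][ref_x])
        block.append(row)
    return block


# 4x4 reference picture whose sample value encodes its position (10 * row + col).
reference = [[10 * r + c for c in range(4)] for r in range(4)]
prediction = predict_block(reference, x=0, y=0, mv_x=1, mv_y=2, width=2, height=2)
```

In this example the 2x2 block at the origin of the current picture is predicted from the reference block displaced by one pixel horizontally and two pixels vertically.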
The pixels in the reference frame are used as the prediction. In one example, the motion may be captured in integer pixels. However, not all objects move by a whole number of pixels (a pixel is also referred to as a pel). For example, because object motion is unrelated to the sampling grid, the motion is sometimes better described as sub-pel (fractional) motion than as full-pel motion. Thus, HEVC allows for motion vectors with sub-pel accuracy.
In order to estimate and compensate for sub-pel displacements, the image signal at these sub-pel positions is generated by an interpolation process. In HEVC, sub-pel interpolation is performed using finite impulse response (FIR) filters. Generally, the filter may have 8 taps to determine the sub-pel values for sub-pel positions, such as half-pel and quarter-pel positions. The taps of an interpolation filter weight the integer pixels with coefficient values to generate the sub-pel signals. Different coefficients may produce different compression performance in terms of signal distortion and noise.
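As one illustrative sketch of such a filter, the following shows an 8-tap FIR filter generating a half-pel value along a single row of integer pixels. The coefficient values are an assumption chosen for illustration (they correspond to the half-pel luma filter of the published HEVC specification, which this text does not itself cite), and the single-stage rounding, edge clamping, and the name `interpolate_half_pel` are simplifications of this sketch rather than the normative process.

```python
# Assumed coefficients for illustration; they sum to 64.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)
FILTER_SHIFT = 6  # normalisation shift, because 2**6 == 64


def interpolate_half_pel(samples, i, bit_depth=8):
    """Estimate the sample halfway between samples[i] and samples[i + 1]."""
    last = len(samples) - 1
    acc = 0
    for k, tap in enumerate(HALF_PEL_TAPS):
        # The taps weight integer positions i-3 .. i+4; clamp at the row ends.
        pos = min(max(i - 3 + k, 0), last)
        acc += tap * samples[pos]
    # Round to nearest, normalise, and clip to the valid sample range.
    value = (acc + (1 << (FILTER_SHIFT - 1))) >> FILTER_SHIFT
    return min(max(value, 0), (1 << bit_depth) - 1)


flat_row = [100] * 10
ramp_row = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
flat_value = interpolate_half_pel(flat_row, 4)   # constant signal is preserved
ramp_value = interpolate_half_pel(ramp_row, 4)   # midpoint between 40 and 50
```

A constant row is reproduced unchanged, and on a linear ramp the filter returns the midpoint of the two neighboring integer pixels, which is the behavior expected of a half-pel interpolator.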
In one example, the coefficients for the filter are fixed and applicable to compression of all sequences. In another example, the filter choice may vary from sequence to sequence, within a sequence, from picture to picture, from reference to reference, or within a picture, from PU to PU. This is referred to as an adaptive interpolation filter (AIF). In both the fixed and adaptive interpolation filter schemes, the phase offset spacing of sub-pel pixel values is uniform. For example, the offsets may correspond to quarter, half, and three-quarter pixel offsets. FIG. 1 shows an example of a quarter-pel resolution. As shown, sub-pel pixels between a phase offset 0 and phase offset 1 (the integer pixels) are at the ¼, ½, and ¾ phase offsets.
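The uniform quarter-pel spacing may be illustrated by splitting a motion vector component into an integer-pel displacement and a phase offset. The sketch below assumes that the component is stored as a signed integer in quarter-pel units; that storage format and the name `split_motion_vector` are assumptions of this example.

```python
from fractions import Fraction

MV_FRACTION_BITS = 2  # assumed storage: quarter-pel units, 2**2 phases per pixel


def split_motion_vector(mv_quarter_pel):
    """Split one motion vector component into (integer offset, phase offset)."""
    # Arithmetic right shift floors toward negative infinity, so the phase
    # offset is always one of 0, 1/4, 1/2, or 3/4, even for negative vectors.
    integer_offset = mv_quarter_pel >> MV_FRACTION_BITS
    phase_index = mv_quarter_pel & ((1 << MV_FRACTION_BITS) - 1)
    return integer_offset, Fraction(phase_index, 1 << MV_FRACTION_BITS)


full_pel = split_motion_vector(8)     # 8/4 = 2 pixels, phase offset 0
quarter = split_motion_vector(9)      # 9/4 = 2 pixels + 1/4
half = split_motion_vector(6)         # 6/4 = 1 pixel + 1/2
negative = split_motion_vector(-3)    # -3/4 = -1 pixel + 1/4
```

The integer offset selects the integer pixels used as filter inputs, and the phase offset selects which set of interpolation coefficients is applied.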