High-efficiency video coding (HEVC) is a block-based hybrid spatial and temporal predictive coding scheme. Similar to other video coding standards, such as Moving Picture Experts Group (MPEG)-1, MPEG-2, and MPEG-4, HEVC supports intra-pictures, such as I pictures, and inter-pictures, such as B pictures. In HEVC, P and B pictures are consolidated into a general B picture that can be used as a reference picture.
An intra-picture is coded without referring to any other pictures. Thus, only spatial prediction is allowed for a coding unit (CU)/prediction unit (PU) inside an intra-picture. An inter-picture, however, supports both intra- and inter-prediction. A CU/PU in an inter-picture may be coded using either spatial or temporal prediction. Temporal predictive coding may reference blocks that were previously coded.
Temporal motion prediction is an effective method for increasing coding efficiency and provides high compression. HEVC uses a translational model for motion prediction. According to the translational model, a prediction signal for a given block in a current picture is generated from a corresponding block in a reference picture. The coordinates of the reference block are given by a motion vector that describes the translational motion along the horizontal (x) and vertical (y) directions; its components are added to, or subtracted from, the coordinates of the current block. A decoder needs the motion vector to decode the compressed video.
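The translational model can be illustrated with a minimal sketch for a full-pel motion vector. The function name `predict_block`, the list-of-rows picture layout, and the sample values are assumptions made for illustration only; they are not taken from HEVC or from any codec implementation.

```python
def predict_block(reference, x, y, mv_x, mv_y, width, height):
    """Return the width x height block of `reference` located at the
    current block position (x, y) displaced by the motion vector
    (mv_x, mv_y). Full-pel displacement only (illustrative assumption)."""
    ref_x = x + mv_x  # horizontal coordinate of the reference block
    ref_y = y + mv_y  # vertical coordinate of the reference block
    if (ref_x < 0 or ref_y < 0
            or ref_y + height > len(reference)
            or ref_x + width > len(reference[0])):
        raise ValueError("reference block falls outside the picture")
    # The pixels of the reference block serve as the prediction signal.
    return [row[ref_x:ref_x + width]
            for row in reference[ref_y:ref_y + height]]


# Hypothetical 4x4 reference picture (rows of pixel values).
reference = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]

# Predict a 2x2 block at (0, 0) with motion vector (+1, +2).
prediction = predict_block(reference, x=0, y=0, mv_x=1, mv_y=2,
                           width=2, height=2)
print(prediction)
```

In this sketch the motion vector is simply added to the block coordinates; a real codec would also handle references that extend beyond the picture boundary rather than raising an error.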
The pixels in the reference block are used as the prediction. In one example, the motion may be captured in integer pixels. However, not all objects move by an integer number of pixels (a pixel is also referred to as a pel). Because object motion is unrelated to the sampling grid, the motion is sometimes better described as sub-pel (fractional) motion than as full-pel motion. Thus, HEVC allows for motion vectors with sub-pel accuracy.
To estimate and compensate for sub-pel displacements, the image signal at these sub-pel positions is generated by an interpolation process. In HEVC, sub-pel interpolation is performed using finite impulse response (FIR) filters. Generally, the filter may have 8 taps to determine the sub-pel values for sub-pel positions, such as half-pel and quarter-pel positions. The taps of an interpolation filter weight the integer pixels with coefficient values to generate the sub-pel signals. Different coefficients may produce different compression performance in terms of signal distortion and noise.
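As a rough sketch of how the taps weight the integer pixels, the following computes one half-pel sample along a single row. The coefficient set is the 8-tap set commonly cited for the HEVC luma half-sample position and is used here only as an illustrative assumption; the function name, the edge-clamping rule, the rounding offset, and the 8-bit clipping are likewise assumptions, not details given in the text above.

```python
# Assumed 8-tap half-pel coefficients (sum to 64); other coefficient
# sets are possible and would give different results.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def interpolate_half_pel(row, i, taps=HALF_PEL_TAPS):
    """Estimate the sample halfway between row[i] and row[i + 1].

    Each tap weights one integer pixel; four pixels lie to the left of
    the half-pel position and four to the right."""
    shift = 6                    # taps sum to 64 == 1 << 6
    offset = 1 << (shift - 1)    # rounding offset
    acc = 0
    for k, coeff in enumerate(taps):
        pos = i - 3 + k
        # Clamp to the row boundaries (simple edge padding, assumed).
        pos = min(max(pos, 0), len(row) - 1)
        acc += coeff * row[pos]
    value = (acc + offset) >> shift
    return min(max(value, 0), 255)  # clip to the 8-bit sample range


# Hypothetical row of integer pixels forming a linear ramp.
row = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
half = interpolate_half_pel(row, 4)  # position between 40 and 50
print(half)  # 45
```

On a linear ramp the filter reproduces the midpoint of the two neighboring pixels, which is a quick sanity check that the taps are normalized correctly.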
HEVC uses a specific interpolation filter for motion estimation for each reference block based on the choice of sub-pel position for that reference block. In bi-prediction, two reference blocks may be used to predict a current block. One reference block is found in list 0 and the other reference block is found in list 1. If the choice of the sub-pel position is a half-pel shift for list 0, then a half-pel interpolation filter is applied to the reference block in list 0. Likewise, if the choice of the sub-pel position is a quarter-pel shift for list 1, then a quarter-pel interpolation filter is applied to the reference block in list 1. The same half-pel interpolation filter is applied to the reference block in list 0 even if the sub-pel position for list 1 changes, such as to a half-pel shift.
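The per-list selection described above can be sketched as a simple table lookup. The table `FILTERS`, the position labels, the function `select_filters`, and the coefficient values are hypothetical and chosen only for illustration; the point of the sketch is that the filter for each list depends solely on that list's own sub-pel position.

```python
# Assumed mapping from sub-pel position to 8-tap coefficients.
# The coefficient values are illustrative, not taken from the text.
FILTERS = {
    "full": (0, 0, 0, 64, 0, 0, 0, 0),
    "quarter": (-1, 4, -10, 58, 17, -5, 1, 0),
    "half": (-1, 4, -11, 40, 40, -11, 4, -1),
}


def select_filters(subpel_list0, subpel_list1):
    """Pick the interpolation filter for each reference block.

    The choice for list 0 ignores the sub-pel position of list 1,
    and vice versa."""
    return FILTERS[subpel_list0], FILTERS[subpel_list1]


# Half-pel shift in list 0, quarter-pel shift in list 1.
filter0_a, filter1_a = select_filters("half", "quarter")

# Changing list 1 to a half-pel shift leaves the list 0 filter unchanged.
filter0_b, filter1_b = select_filters("half", "half")

print(filter0_a == filter0_b)  # True
```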