High-efficiency video coding (HEVC) is a block-based hybrid spatial and temporal predictive coding scheme. Similar to other video coding standards, such as Moving Picture Experts Group (MPEG)-1, MPEG-2, and MPEG-4, HEVC supports intra-pictures, such as I pictures, and inter-pictures, such as B pictures. In HEVC, P and B pictures are consolidated into a general B picture that can be used as a reference picture.
An intra-picture is coded without referring to any other pictures. Thus, only spatial prediction is allowed for a coding unit (CU)/prediction unit (PU) inside an intra-picture. An inter-picture, however, supports both intra- and inter-prediction. A CU/PU in an inter-picture may be either spatially or temporally predictive coded. Temporal predictive coding may reference pictures that were previously coded.
Temporal motion prediction is an effective method to increase the coding efficiency and provides high compression. HEVC uses a translational model for motion prediction. According to the translational model, a prediction signal for a given block in a current picture is generated from a corresponding block in a reference picture. The coordinates of the reference block are given by a motion vector that describes the translational motion along horizontal (x) and vertical (y) directions that would be added/subtracted to/from the coordinates of the current block. A decoder needs the motion vector to decode the compressed video.
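As an illustration only, the translational model described above may be sketched for a full-pel motion vector as follows. The function name predict_block, the list-of-rows picture layout, and the choice to raise an error when the reference block leaves the picture are assumptions made for this sketch and are not taken from HEVC, where picture borders are typically handled by padding.

```python
def predict_block(reference, x, y, width, height, mv_x, mv_y):
    """Return the prediction for the block at (x, y) under a translational model.

    reference is a list of rows of sample values (hypothetical layout).
    mv_x and mv_y are full-pel displacements added to the block coordinates.
    """
    ref_x = x + mv_x
    ref_y = y + mv_y
    # Assumption for this sketch: reject blocks that leave the picture.
    if (ref_x < 0 or ref_y < 0
            or ref_y + height > len(reference)
            or ref_x + width > len(reference[0])):
        raise ValueError("reference block falls outside the picture")
    return [row[ref_x:ref_x + width] for row in reference[ref_y:ref_y + height]]


# A 6x6 reference picture whose sample at row r, column c is 10*r + c.
reference = [[10 * r + c for c in range(6)] for r in range(6)]

# Current block at (x=2, y=2), size 2x2, motion vector (-1, +1).
prediction = predict_block(reference, x=2, y=2, width=2, height=2, mv_x=-1, mv_y=1)
print(prediction)  # [[31, 32], [41, 42]]
```

The decoder repeats the same lookup with the transmitted motion vector, which is why the motion vector must be available at the decoder.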
The pixels in the reference frame are used as the prediction. In one example, the motion may be captured in integer pixels. However, not all objects move in steps of integer pixels (also referred to as pels). Because object motion is unrelated to the sampling grid, the true displacement often falls at a sub-pixel (fractional) position rather than at a full-pel one. Thus, HEVC allows for motion vectors with sub-pixel accuracy.
In order to estimate and compensate sub-pixel displacements, the image signal on these sub-pixel positions is generated by an interpolation process. In HEVC, sub-pixel interpolation is performed using finite impulse response (FIR) filters. Generally, the filter may have 8 taps to determine the sub-pixel values for sub-pixel positions, such as half-pel and quarter-pel positions. The taps of an interpolation filter weight the integer pixels with coefficient values to generate the sub-pixel signals. Different coefficients may produce different compression performance in signal distortion and noise.
FIG. 1 depicts positions of half-pel and fractional-pel (e.g., quarter-pel) pixels between full-pel pixels along a pixel line within an image according to one embodiment. For example, the pixel line may be along a row or column on an image. Multiple interpolation calculations may be made along different rows and columns of an image. Full-pel pixels are represented by integer pixels and are shown in FIG. 1 as pixels L3, L2, L1, L0, R0, R1, R2, and R3. H is a half-pel pixel between full-pel pixels L0 and R0. FL is a sub-pixel pixel (fractional-pel pixel) between full-pel pixels L0 and H and FR is a sub-pixel pixel between half-pel pixel H and full-pel pixel R0.
The fractional-pel and half-pel pixels may be interpolated using the values of spatial neighboring full-pel pixels. For example, the half-pel pixel H may be interpolated using the values of full-pel pixels L3, L2, L1, L0, R0, R1, R2, and R3. Different coefficients may also be used to weight the values of the neighboring pixels and provide different characteristics of filtering.
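As a minimal sketch of such an interpolation, the half-pel pixel H may be computed as a weighted sum of the eight neighboring full-pel pixels. The coefficient set below is the 8-tap half-sample luma filter commonly cited for HEVC and is used here as an assumption; the function name, the 8-bit default, and the single-stage rounding are simplifications for illustration and do not reproduce the intermediate precision of an actual codec.

```python
# Assumed coefficients (commonly cited HEVC half-sample luma filter); they sum to 64.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)


def interpolate_half_pel(l3, l2, l1, l0, r0, r1, r2, r3, bit_depth=8):
    """Interpolate the half-pel pixel H between L0 and R0 from 8 full-pel pixels."""
    samples = (l3, l2, l1, l0, r0, r1, r2, r3)
    # Each tap weights one full-pel pixel.
    acc = sum(c * s for c, s in zip(HALF_PEL_TAPS, samples))
    # Normalize by 64 with rounding (add half of 64, then shift by 6 bits).
    value = (acc + 32) >> 6
    # Clip to the valid sample range, since negative taps can overshoot.
    max_value = (1 << bit_depth) - 1
    return min(max(value, 0), max_value)


print(interpolate_half_pel(100, 100, 100, 100, 100, 100, 100, 100))  # 100
print(interpolate_half_pel(0, 0, 0, 0, 255, 255, 255, 255))          # 128
print(interpolate_half_pel(10, 20, 30, 40, 50, 60, 70, 80))          # 45
```

A flat region is reproduced unchanged, and a linear ramp yields the midpoint between L0 and R0, which is the behavior expected of a half-pel filter. Substituting a different coefficient set changes the frequency response and therefore the filtering characteristics.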
A uniform sub-pixel spacing may be used. For example, sub-pixel phase offsets are allowed that correspond to quarter, half and three quarter pixel offsets. FIG. 2 is an example of a fixed, uniform, four position sub-pixel motion vector grid, FIG. 3 is an example of a fixed, uniform, eight position sub-pixel motion vector grid, and FIG. 4 is an example of a fixed, uniform, sixteen position sub-pixel motion vector grid. In these three examples, L0 and R0 are the integer pixels and the pixels between L0 and R0 are fractional-pixels.
A motion vector (MV) is a two-dimensional vector (MVX, MVY) used for inter prediction; it provides an offset from the coordinates in the decoded picture to the coordinates in a reference picture. The motion vector components may be represented by integer numbers even though their accuracy is at quarter-pel resolution. That is, a component of the motion vector (either MVX or MVY) is classified by its remainder when divided by 4: a remainder of “0” indicates an integer-pel motion vector component, a remainder of “1” a quarter-pel motion vector component, a remainder of “2” a half-pel motion vector component, and a remainder of “3” a three-quarter-pel motion vector component.
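The remainder rule above may be sketched as follows. The names PHASE_NAMES and split_mv_component are hypothetical. The sketch also assumes floor-based division for negative components, so that the remainder always lies in 0 to 3; the text above does not state which convention applies to negative values.

```python
PHASE_NAMES = {
    0: "integer-pel",
    1: "quarter-pel",
    2: "half-pel",
    3: "three-quarter-pel",
}


def split_mv_component(component):
    """Split a quarter-pel MV component into (full-pel part, sub-pel phase).

    Assumption: floor-based division, so component == 4 * full_pel + phase
    holds for negative values as well, with phase in 0..3.
    """
    full_pel = component >> 2  # floor division by 4
    phase = component & 3      # remainder in 0..3
    return full_pel, phase


for mv in (8, 9, 6, -1):
    full_pel, phase = split_mv_component(mv)
    print(mv, full_pel, PHASE_NAMES[phase])
# 8 2 integer-pel
# 9 2 quarter-pel
# 6 1 half-pel
# -1 -1 three-quarter-pel
```

Under this convention a component of -1 is read as one full pel to the left plus a three-quarter-pel phase, which equals a displacement of minus one quarter pel.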
Motion vectors are predictively coded with predictors chosen from motion vectors of spatial neighboring blocks and/or temporal collocated blocks. The motion vectors of these spatial neighboring blocks and temporal collocated blocks may point to different reference pictures that have a different temporal distance from the reference picture of a current block. Motion vector scaling is therefore used so that the motion vectors of the spatial neighboring blocks and temporal collocated blocks point to the reference picture of the current block. The scaling uses the differences in temporal distance.
On a uniform motion vector grid, scaling of the motion vector may be very close to scaling of the corresponding motion offset. For example, the motion vector scaling is performed according to temporal distance between the current picture and the reference pictures. Given a current block in a current picture, the motion vector scaling theoretically could be performed as:
MVPscaled = (TDref × MVP) / TDP  (1)
where MVP is the motion vector predictor for the current block, TDref is the temporal distance between the current picture and the reference picture for the current block, and TDP is the temporal distance between the picture where the motion vector predictor MVP resides and the reference picture that MVP points to.
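Equation (1) may be sketched with exact rational arithmetic, applied to both components of a motion vector predictor. The function name scale_mv_predictor and the tuple representation of the predictor are assumptions made for illustration; no rounding to the motion vector grid is performed here.

```python
from fractions import Fraction


def scale_mv_predictor(mvp, td_ref, td_p):
    """Apply equation (1), MVPscaled = (TDref x MVP) / TDP, to (MVX, MVY).

    Returns exact rationals, so the result may fall between grid positions.
    """
    if td_p == 0:
        raise ValueError("td_p must be non-zero")
    mvx, mvy = mvp
    return Fraction(td_ref * mvx, td_p), Fraction(td_ref * mvy, td_p)


print(scale_mv_predictor((1, -2), td_ref=4, td_p=1))  # (Fraction(4, 1), Fraction(-8, 1))
print(scale_mv_predictor((3, 5), td_ref=1, td_p=2))   # (Fraction(3, 2), Fraction(5, 2))
```

The second call shows that the exact result need not be an integer number of grid units, which is the situation in which an approximation to the grid becomes necessary.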
If infinite precision is allowed for motion vectors MVP and MVPscaled, the above equation is accurate. However, if the precision is only at quarter-pel, a good approximation is necessary. In one example, a motion vector component has a value of 1 on a four position sub-pixel motion vector grid, and temporal distances TDref and TDP are equal to 4 and 1, respectively. By using the scaling equation (1), the motion vector component of value 1 is scaled to:
MVPscaled = (TDref × MVP) / TDP = (4 × 1) / 1 = 4
On a four position sub-pixel motion vector grid, a motion vector component of value 4 means a motion offset of 1 pel. On a uniform four position sub-pixel motion vector grid (FIG. 2), a motion vector component of value 1 represents a motion offset component of ¼ pel. Using the same scaling equation, the motion offset component of ¼ pel is scaled to:
MVPscaled = (TDref × MVP) / TDP = (4 × (¼)) / 1 = 1 (pel)
As seen, for this example, scaling of the motion vector component exactly matches scaling of the motion offset, as both give a motion offset of 1 pel. However, the problem with a uniform distribution of sub-sample positions is that it may not be optimal for a given set of filter restrictions, such as the number of taps or the power spectral density of the reference block.
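The worked example above may be checked with the following sketch, which scales a motion vector component on a uniform grid and compares the result with scaling of the corresponding motion offset. All function names are hypothetical, and the rounding rule (nearest grid unit, ties away from zero) is an assumption chosen for illustration; it is not stated in the text above.

```python
import math
from fractions import Fraction


def round_to_grid(value):
    """Round an exact value to the nearest integer grid unit (assumed rule:
    ties away from zero)."""
    sign = -1 if value < 0 else 1
    return sign * math.floor(abs(value) + Fraction(1, 2))


def scale_component(mv, td_ref, td_p):
    """Scale an MV component per equation (1), then round to the grid."""
    return round_to_grid(Fraction(td_ref * mv, td_p))


def to_offset(mv, positions=4):
    """Motion offset in pels of an MV component on a uniform grid."""
    return Fraction(mv, positions)


def scale_offset(offset, td_ref, td_p):
    """Scale a motion offset (in pels) exactly, without rounding."""
    return offset * Fraction(td_ref, td_p)


# Example from the text: value 1, TDref = 4, TDP = 1, four position grid.
scaled_mv = scale_component(1, td_ref=4, td_p=1)
print(scaled_mv, to_offset(scaled_mv))             # 4 1
print(scale_offset(to_offset(1), td_ref=4, td_p=1))  # 1

# A case that needs approximation: the exact offset is 1/16 pel,
# which a four position grid cannot represent.
print(scale_component(1, td_ref=1, td_p=4))          # 0
print(scale_offset(to_offset(1), td_ref=1, td_p=4))  # 1/16
```

In the first case both routes give a motion offset of 1 pel. In the second case the scaled component rounds to 0 under the assumed rule, so the represented offset differs from the exact scaled offset by 1/16 pel.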