In modern communication systems, a video signal may be sent from one terminal to another over a medium such as a wired and/or wireless network, often a packet-based network such as the Internet. For example, the video may be part of a VoIP (voice over Internet Protocol) call conducted from a VoIP client application executed on a user terminal such as a desktop or laptop computer, tablet or smartphone.
Typically the frames of the video are encoded by an encoder at the transmitting terminal in order to compress them for transmission over the network. The encoding for a given frame may comprise intra frame encoding whereby blocks are encoded relative to other blocks in the same frame. In this case a target block is encoded in terms of a difference (the residual) between that block and a neighbouring block. Alternatively the encoding for some frames may comprise inter frame encoding whereby blocks in the target frame are encoded relative to corresponding portions in a preceding frame, typically based on motion prediction. In this case a target block is encoded in terms of a motion vector identifying an offset between the block and the corresponding portion from which it is to be predicted, and a difference (the residual) between the block and the corresponding portion from which it is predicted. A corresponding decoder at the receiver decodes the frames of the received video signal based on the appropriate type of prediction, in order to decompress them for output to a screen at the decoder side.
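By way of illustration only, the inter frame case described above can be sketched as follows. The function names, the `(dx, dy)` ordering of the motion vector and the use of plain nested lists for frames are assumptions made for this sketch. It works at whole-pixel offsets and keeps the residual lossless, whereas a practical encoder would additionally transform, quantize and entropy code the residual.

```python
def predict_block(ref, top, left, mv, size):
    """Return the size x size portion of ref offset from (top, left) by mv = (dx, dy)."""
    dx, dy = mv
    return [[ref[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]


def encode_block(cur, ref, top, left, mv, size):
    """Encode the target block as a motion vector plus a residual."""
    pred = predict_block(ref, top, left, mv, size)
    residual = [[cur[top + r][left + c] - pred[r][c] for c in range(size)]
                for r in range(size)]
    return mv, residual


def decode_block(ref, top, left, mv, residual, size):
    """Rebuild the target block from the same prediction plus the residual."""
    pred = predict_block(ref, top, left, mv, size)
    return [[pred[r][c] + residual[r][c] for c in range(size)]
            for r in range(size)]
```

Because the decoder forms the same prediction from the same reference frame, adding the residual recovers the target block exactly in this lossless sketch.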
When encoding (compressing) a video, motion vectors are used to generate the inter frame prediction for the current frame. The encoder first searches a previously encoded frame for the block (the reference block) that best matches the current block (the target block), and signals the displacement between the reference block and the target block to the decoder as part of the encoded bitstream. The displacement is typically represented as a pair of horizontal (x) and vertical (y) components, and is referred to as the motion vector.
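The search described above may, purely as an example, be realised as an exhaustive search over whole-pixel offsets. The sketch below assumes the sum of absolute differences (SAD) as the matching criterion and a square search window; neither is specified in the text, and the names `sad`, `motion_search` and `search_range` are hypothetical. Practical encoders commonly use faster search patterns and other cost measures.

```python
def sad(cur, ref, top, left, dx, dy, size):
    """Sum of absolute differences between the target block and a candidate portion."""
    return sum(abs(cur[top + r][left + c] - ref[top + dy + r][left + dx + c])
               for r in range(size) for c in range(size))


def motion_search(cur, ref, top, left, size, search_range):
    """Return ((dx, dy), cost) for the best whole-pixel match within the window."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall partly outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(cur, ref, top, left, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The returned `(dx, dy)` pair is the displacement that would be signalled to the decoder; here a negative `dx` is taken to mean that the reference portion lies to the left of the target block, which is a convention chosen for this sketch.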
The reference “block” is not in fact constrained to being at an actual block position in the reference frame, i.e. it is not restricted to the same grid as the target blocks, but rather is a correspondingly-sized portion of the reference frame offset relative to the target block's position by the motion vector. According to present standards, motion vectors are represented at fractional pixel resolution. For instance, in the H.264 standard each motion vector is represented at ¼ pixel resolution. So by way of example, if a 16×16 block in the current frame is to be predicted from another 16×16 block in the previous frame that is 1 pixel to the left of the position of the target block, then the motion vector is (4,0), the components being expressed in units of ¼ pixel. Or if the target block is to be predicted from a reference block that is only, say, ¾ of a pixel to the left of the target block, the motion vector is (3,0). A reference block at a fractional pixel position does not actually exist per se, but rather is generated by interpolation between pixels of the reference frame. Sub-pixel motion vectors can achieve significant gains in compression efficiency.
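The arithmetic of the ¼ pixel representation can be illustrated with the following sketch. It converts an offset in pixels into quarter-pixel units, splits a component into its whole-pixel and fractional parts, and interpolates a sample at a fractional position. All names are hypothetical, and the bilinear interpolation is a simplification assumed for brevity: H.264 itself derives half-pixel luma samples with a 6-tap filter and quarter-pixel samples by averaging. The sketch works with magnitudes and non-negative positions inside the frame and does not fix a sign convention.

```python
from fractions import Fraction

QUARTER_PEL = 4  # motion vector units per pixel at 1/4 pixel resolution


def to_quarter_pel(offset_pixels):
    """Convert an offset in pixels (e.g. Fraction(3, 4)) into quarter-pixel units."""
    units = Fraction(offset_pixels) * QUARTER_PEL
    if units.denominator != 1:
        raise ValueError("offset is not a multiple of 1/4 pixel")
    return int(units)


def split_mv_component(units):
    """Split a quarter-pel component into (whole pixels, fractional quarters 0..3)."""
    return units >> 2, units & 3


def sample(ref, x_units, y_units):
    """Bilinearly interpolate ref at a position given in quarter-pixel units.

    Assumes the position lies inside the frame (non-negative coordinates).
    """
    x0, fx = split_mv_component(x_units)
    y0, fy = split_mv_component(y_units)
    x1 = min(x0 + 1, len(ref[0]) - 1)  # clamp at the right edge
    y1 = min(y0 + 1, len(ref) - 1)     # clamp at the bottom edge
    top = ref[y0][x0] * (4 - fx) + ref[y0][x1] * fx
    bottom = ref[y1][x0] * (4 - fx) + ref[y1][x1] * fx
    return (top * (4 - fy) + bottom * fy) / 16
```

With these definitions, an offset of 1 pixel corresponds to 4 units and an offset of ¾ of a pixel to 3 units, matching the two examples above in magnitude. A value sampled at 3 units lies three quarters of the way between two neighbouring pixels of the reference frame.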