A digital video is composed of frames, each of which is a snapshot of a scene at one instant in time. Transmitting a video frame by frame involves a large amount of data and therefore takes a long time.
Because neighboring frames are likely to be snapshots of the same scene containing moving objects, they share many similarities. If the decoder can reconstruct a frame from its neighboring frames without that frame itself being transmitted, less data needs to be sent.
To reconstruct a frame from its neighboring frames, what is generally required is the difference between the frame and its neighbors. In other words, what is required is the motion of the content captured in the frame. This motion may include translation, zooming, rotation, and other transformations. Such a difference, or such a motion, is represented by motion vectors. The process of determining it is known as motion estimation, and reconstruction based on the estimated motion is known as motion-compensated prediction.
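As an illustration, the sketch below uses full-search block matching, one common way to perform motion estimation; it is not necessarily the method this text has in mind. It assumes translational motion only, frames stored as lists of rows of pixel values, and the sum of absolute differences (SAD) as the matching cost. The function names are hypothetical. The motion vector (dx, dy) points from a block in the current frame to its best match in the reference frame, and the residual is the difference that would still have to be sent.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the size x size block at (bx, by)
    in cur and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def estimate_motion(cur, ref, bx, by, size, search):
    """Full search: try every integer displacement within +/- search pixels
    and return the (dx, dy) with the lowest SAD (first one wins on ties)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + size > w or by + dy + size > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def compensate(ref, bx, by, dx, dy, size):
    """Motion-compensated prediction: copy the displaced block out of ref."""
    return [row[bx + dx: bx + dx + size] for row in ref[by + dy: by + dy + size]]


def residual(cur, pred, bx, by):
    """Difference between the actual block and its prediction."""
    return [[cur[by + j][bx + i] - pred[j][i] for i in range(len(pred[0]))]
            for j in range(len(pred))]


# Usage: the content moves one pixel to the left between ref and cur, so each
# pixel of cur is found one pixel to the right in ref, giving the vector (1, 0).
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[10 * y + x + 1 for x in range(8)] for y in range(8)]
mv = estimate_motion(cur, ref, 2, 2, 4, 2)
pred = compensate(ref, 2, 2, mv[0], mv[1], 4)
print(mv, residual(cur, pred, 2, 2))
```

Here the prediction matches the block exactly, so the residual is all zeros and only the motion vector would need to be transmitted for this block.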
The basic element of a frame is the pixel, and motion is commonly estimated in units of whole pixels; for example, an object in the scene moves one pixel to the left. However, the true motion is likely to involve fractions of a pixel, so subpixel motion estimation is used to refine the estimate to subpixel accuracy.
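One way to picture subpixel motion estimation is as a coarse-to-fine search: find the best integer vector, then test half-pixel and quarter-pixel offsets around it, reading reference pixels at fractional positions by interpolation. The sketch below assumes bilinear interpolation for simplicity (standardized codecs use their own, longer interpolation filters), SAD as the cost, and a 1, 1/2, 1/4 pixel step schedule; all function names are hypothetical.

```python
import math


def sample(frame, x, y):
    """Bilinearly interpolate frame (a list of rows) at fractional (x, y)."""
    h, w = len(frame), len(frame[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    # Clamp the second neighbour at the border; its weight is 0 there anyway.
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * frame[y0][x0] + fx * (1 - fy) * frame[y0][x1]
            + (1 - fx) * fy * frame[y1][x0] + fx * fy * frame[y1][x1])


def sad(cur, ref, bx, by, size, dx, dy):
    """SAD between the current block and the reference block displaced by a
    possibly fractional (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - sample(ref, bx + i + dx, by + j + dy))
               for j in range(size) for i in range(size))


def grid_search(cur, ref, bx, by, size, center, step, n):
    """Test displacements center + k * step for k in -n..n on each axis and
    keep the lowest-SAD one (first one wins on ties)."""
    h, w = len(ref), len(ref[0])
    best = None
    for ky in range(-n, n + 1):
        for kx in range(-n, n + 1):
            dx, dy = center[0] + kx * step, center[1] + ky * step
            # Skip displacements that would sample outside the reference frame.
            if bx + dx < 0 or by + dy < 0 or bx + dx + size - 1 > w - 1 or by + dy + size - 1 > h - 1:
                continue
            cost = sad(cur, ref, bx, by, size, dx, dy)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best[1]


def estimate_subpixel(cur, ref, bx, by, size, search=2):
    """Integer full search, then half-pixel and quarter-pixel refinement."""
    mv = grid_search(cur, ref, bx, by, size, (0, 0), 1, search)
    mv = grid_search(cur, ref, bx, by, size, mv, 0.5, 1)
    mv = grid_search(cur, ref, bx, by, size, mv, 0.25, 1)
    return mv


# Usage: the content is shifted by a quarter pixel, which an integer-only
# search cannot represent; the refinement recovers (0.25, 0).
ref = [[10 * y + x for x in range(8)] for y in range(8)]
cur = [[10 * y + x + 0.25 for x in range(8)] for y in range(8)]
print(estimate_subpixel(cur, ref, 2, 2, 4))
```

Each refinement stage only searches a small neighbourhood of the previous result, which keeps the number of interpolated candidates low compared with searching every quarter-pixel position in the full range.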