Video encoders compress sequences of video pictures, or frames, by reducing spatial and temporal redundancies. This is done by performing prediction processes in the spatial and/or temporal domains. If the prediction process uses only information in the current picture, it is referred to as intra-prediction, and the picture being encoded is called an I-picture. By contrast, if the prediction process uses correlations between different pictures, it is referred to as inter-prediction. Most encoders support two types of inter-prediction, called P (predicted) prediction and B (bidirectional) prediction. The main difference is that P-prediction predicts the value of a current block from only one prediction block, while B-prediction allows interpolation-based prediction of a current block from two previously encoded blocks.
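The difference between P-prediction and B-prediction can be illustrated with a minimal sketch. It assumes that the reference blocks have already been located (no motion search is shown) and that B-prediction interpolates by an equal-weight rounded average; both are simplifying assumptions, and the function names are hypothetical rather than taken from any particular encoder.

```python
from typing import List

Block = List[List[int]]  # rows of pixel values


def p_predict(reference: Block) -> Block:
    """P-prediction: the prediction is a single reference block."""
    return [row[:] for row in reference]


def b_predict(ref0: Block, ref1: Block) -> Block:
    """B-prediction: interpolate two previously encoded blocks.

    Assumption: equal-weight average with rounding; real encoders may
    use other weights.
    """
    return [
        [(a + b + 1) // 2 for a, b in zip(row0, row1)]
        for row0, row1 in zip(ref0, ref1)
    ]


def residual(current: Block, prediction: Block) -> Block:
    """Difference that remains to be coded after prediction."""
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current, prediction)
    ]


def sad(block: Block) -> int:
    """Sum of absolute values, a simple measure of residual energy."""
    return sum(abs(v) for row in block for v in row)


current = [[10, 12], [14, 16]]
ref0 = [[8, 10], [12, 14]]    # e.g. a block from an earlier picture
ref1 = [[12, 14], [16, 18]]   # e.g. a block from a later picture

print(sad(residual(current, p_predict(ref0))))        # 8
print(sad(residual(current, b_predict(ref0, ref1))))  # 0
```

In this toy case the current block lies exactly midway between the two references, so the interpolated prediction leaves no residual while the single-block prediction does.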
A macroblock (MB) is a block of 16×16 pixels. All MBs in an I-picture are intra-predicted, while MBs in a P-picture may be either P-inter or intra-predicted (whichever is more efficient). Finally, MBs in a B-picture may be B-inter, P-inter, or intra-predicted.
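The rules above amount to a lookup from picture type to the set of permitted macroblock prediction modes, followed by a choice of the cheapest permitted mode. The following sketch assumes that a cost per mode is already available from elsewhere (the cost metric is not specified here) and that pictures whose dimensions are not multiples of 16 are padded up to whole macroblocks. All names are hypothetical.

```python
from enum import Enum
from typing import Dict

MB_SIZE = 16  # a macroblock is 16x16 pixels


class PictureType(Enum):
    I = "I"
    P = "P"
    B = "B"


class MbMode(Enum):
    INTRA = "intra"
    P_INTER = "P-inter"
    B_INTER = "B-inter"


# Prediction modes permitted for macroblocks in each picture type.
ALLOWED_MODES = {
    PictureType.I: {MbMode.INTRA},
    PictureType.P: {MbMode.INTRA, MbMode.P_INTER},
    PictureType.B: {MbMode.INTRA, MbMode.P_INTER, MbMode.B_INTER},
}


def macroblock_count(width: int, height: int) -> int:
    """Number of macroblocks covering a picture.

    Assumption: partial macroblocks at the edges are rounded up.
    """
    cols = (width + MB_SIZE - 1) // MB_SIZE
    rows = (height + MB_SIZE - 1) // MB_SIZE
    return cols * rows


def choose_mode(picture_type: PictureType, costs: Dict[MbMode, float]) -> MbMode:
    """Pick the cheapest mode that the picture type permits."""
    allowed = ALLOWED_MODES[picture_type]
    candidates = {mode: cost for mode, cost in costs.items() if mode in allowed}
    return min(candidates, key=candidates.get)


# Hypothetical costs for one macroblock (lower is better).
costs = {MbMode.INTRA: 120.0, MbMode.P_INTER: 90.0, MbMode.B_INTER: 40.0}

print(choose_mode(PictureType.I, costs).value)  # intra
print(choose_mode(PictureType.P, costs).value)  # P-inter
print(choose_mode(PictureType.B, costs).value)  # B-inter
print(macroblock_count(1920, 1080))             # 8160
```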
In video compression, a group of pictures (GOP) is a group of successive pictures within a coded video stream, and its structure specifies the order in which intra- and inter-pictures are arranged. Each coded video stream consists of successive GOPs.
Pictures are encoded based on prediction structures. A prediction structure describes which pictures in a GOP are used to encode a given picture in the GOP and the type of each prediction: I, P, or B. Existing encoding methods use a fixed prediction structure that does not take the nature of the picture content into account. This can result in suboptimal encoding.
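A prediction structure can be represented as a list that records, for each picture in the GOP, its type and the pictures it is predicted from. The sketch below builds one possible fixed structure: an I-picture first, a P-picture at regular intervals, and B-pictures in between. The pattern, the `PictureSpec` type, and the function names are illustrative assumptions and do not describe any specific encoder. Because B-prediction relies on previously encoded blocks, the sketch also derives a coding order in which every reference precedes the pictures that use it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PictureSpec:
    index: int             # display position within the GOP
    ptype: str             # "I", "P", or "B"
    refs: Tuple[int, ...]  # GOP pictures used to predict this one


def fixed_structure(gop_size: int, b_between: int) -> List[PictureSpec]:
    """Build a fixed I/P/B pattern that ignores picture content."""
    step = b_between + 1
    specs = []
    for i in range(gop_size):
        prev_anchor = (i // step) * step
        next_anchor = prev_anchor + step
        if i == 0:
            specs.append(PictureSpec(i, "I", ()))
        elif i % step == 0:
            specs.append(PictureSpec(i, "P", (i - step,)))
        elif next_anchor < gop_size:
            specs.append(PictureSpec(i, "B", (prev_anchor, next_anchor)))
        else:
            # No later anchor in this GOP: fall back to P-prediction.
            specs.append(PictureSpec(i, "P", (prev_anchor,)))
    return specs


def coding_order(specs: List[PictureSpec]) -> List[int]:
    """Order pictures so that every reference is encoded first."""
    done, order, pending = set(), [], list(specs)
    while pending:
        for spec in pending:
            if all(ref in done for ref in spec.refs):
                order.append(spec.index)
                done.add(spec.index)
                pending.remove(spec)
                break
        else:
            raise ValueError("prediction structure has unresolved references")
    return order


gop = fixed_structure(gop_size=7, b_between=2)
print("".join(spec.ptype for spec in gop))  # IBBPBBP
print(coding_order(gop))                    # [0, 3, 1, 2, 6, 4, 5]
```

The same pattern is produced for every GOP regardless of what the pictures contain, which is the limitation noted above.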