Advanced video encoding often uses a three-dimensional transform with one temporal and two spatial dimensions. Prior to encoding, consecutive video frames are usually divided into groups of pictures (GOPs), similar to the GOP structure used in MPEG, with the number of frames per group being either constant or flexible. The frames of each group are then analyzed, wavelets being a known analysis technique. The wavelet technique is an iterative method for breaking a signal or a series of values into spectral components by taking averages and differences of values. It is thereby possible to view the series of values at different resolutions, which correspond to frequencies, or subbands, of the spectrum.
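The averaging and differencing described above can be illustrated with a minimal sketch of the simplest wavelet, the Haar wavelet. The function names `haar_step` and `haar_decompose` are hypothetical, and the unnormalised scaling by one half and the power-of-two length restriction are assumptions made for brevity; practical codecs typically use longer filters and normalised coefficients.

```python
def haar_step(values):
    """One analysis level: pairwise averages (low band) and
    pairwise half-differences (high band)."""
    averages = [(values[i] + values[i + 1]) / 2
                for i in range(0, len(values), 2)]
    differences = [(values[i] - values[i + 1]) / 2
                   for i in range(0, len(values), 2)]
    return averages, differences


def haar_decompose(values):
    """Repeat haar_step on the averages until a single value remains.

    Returns the overall average and the list of difference subbands,
    finest resolution first. Assumption: len(values) is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    low = list(values)
    subbands = []
    while len(low) > 1:
        low, high = haar_step(low)
        subbands.append(high)
    return low[0], subbands
```

For the series 9, 7, 3, 5, `haar_decompose([9, 7, 3, 5])` returns `(6.0, [[1.0, -1.0], [2.0]])`: the overall average, the fine-resolution differences, and the coarse-resolution difference.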
The mentioned three-dimensional transform is known as a 2D+t subband/wavelet transform along motion trajectories. Such a transform is commonly implemented using a Motion Compensated Temporal Filter (MCTF), which filters pairs of video frames and produces, for each filtered pair, a temporal low frame, a temporal high frame and a motion field, i.e. a set of motion vectors, between the two frames. Many pixels in one frame can thereby be predicted from pixels of the other frame and the associated motion vectors, while the remaining pixels, which cannot be predicted, are called “unconnected” and must be separately encoded and transmitted. A decoder generates predicted frames based on previous frames, motion vectors and received data referring to the unconnected pixels.
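The filtering of one frame pair may be sketched as follows. This is a simplified illustration under several assumptions that are not taken from the description above: each frame is reduced to a single row of pixels, the motion field is supplied as one integer displacement per pixel (or `None` where no match exists), and an unnormalised Haar lifting scheme is used. The names `mctf_forward`, `mctf_inverse` and `_reference` are hypothetical.

```python
def _reference(x, displacement, n):
    """Index in frame A that predicts pixel x of frame B, or None."""
    if displacement is None:
        return None
    ref = x + displacement
    return ref if 0 <= ref < n else None


def mctf_forward(frame_a, frame_b, motion):
    """Return (temporal low frame, temporal high frame, unconnected pixels)."""
    n = len(frame_a)
    low = [float(p) for p in frame_a]
    high = [0.0] * n
    unconnected = []   # pixels of frame B that cannot be predicted
    updated = set()    # pixels of frame A already updated once
    for x in range(n):
        ref = _reference(x, motion[x], n)
        if ref is None:
            high[x] = float(frame_b[x])   # kept as-is, coded separately
            unconnected.append(x)
            continue
        high[x] = float(frame_b[x] - frame_a[ref])    # predict step
        if ref not in updated:
            low[ref] = frame_a[ref] + high[x] / 2     # update step
            updated.add(ref)
    return low, high, unconnected


def mctf_inverse(low, high, motion):
    """Decoder side: rebuild both frames from low, high and motion."""
    n = len(low)
    frame_a = list(low)
    updated = set()
    for x in range(n):
        ref = _reference(x, motion[x], n)
        if ref is not None and ref not in updated:
            frame_a[ref] = low[ref] - high[x] / 2     # undo update step
            updated.add(ref)
    frame_b = []
    for x in range(n):
        ref = _reference(x, motion[x], n)
        frame_b.append(high[x] if ref is None else high[x] + frame_a[ref])
    return frame_a, frame_b
```

With `frame_a = [10, 20, 30, 40]`, `frame_b = [22, 30, 40, 99]` and `motion = [1, 1, 1, None]`, the last pixel of the second frame has no match and is reported as unconnected, and `mctf_inverse` recovers both input frames exactly from the low frame, the high frame and the motion field.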
The first step of the described MCTF procedure is the selection of the pairs of frames to be filtered, according to a predefined selection scheme. This is called the temporal decomposition of the GOP. Known temporal decomposition schemes consider temporally successive pairs of frames, assuming that such frames provide the highest similarity and therefore enable the most effective coding.
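A known scheme of this kind, pairing temporally successive frames, may be sketched as follows. Frames are represented by their indices within the GOP. Two points are assumptions made for illustration only: the temporal low frame of each pair is labelled with the index of the first frame of the pair, and a trailing unpaired frame is carried unchanged to the next level. The names `successive_pairs` and `temporal_decomposition` are hypothetical.

```python
def successive_pairs(frames):
    """Pair temporally adjacent entries: (f0, f1), (f2, f3), ...

    Returns the pairs and a list holding the trailing unpaired entry,
    if the number of entries is odd.
    """
    pairs = [(frames[i], frames[i + 1])
             for i in range(0, len(frames) - 1, 2)]
    leftover = [frames[-1]] if len(frames) % 2 else []
    return pairs, leftover


def temporal_decomposition(gop_size):
    """List the frame pairs filtered at each temporal level of a GOP."""
    levels = []
    current = list(range(gop_size))
    while len(current) > 1:
        pairs, leftover = successive_pairs(current)
        levels.append(pairs)
        # Assumption: each low frame inherits the first index of its pair.
        current = [first for first, _ in pairs] + leftover
    return levels
```

For a GOP of eight frames, `temporal_decomposition(8)` returns `[[(0, 1), (2, 3), (4, 5), (6, 7)], [(0, 2), (4, 6)], [(0, 4)]]`, i.e. four pairs at the first level, two at the second and one at the third.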