Prevalent among the techniques for compressing moving images are the methods of a “block-based” type, i.e., methods that operate on the blocks of pixels that make up a photograph or a frame of a video image. This set includes standards such as MPEG-1, MPEG-2, MPEG-4 and the more recent H.264.
The principles underlying the structure of the above techniques are:
reduction of temporal redundancy; and
reduction of spatial redundancy of image sequences (or photographs or frames) to be coded.
To achieve these two targets, the procedure of digital-image processing involves a lengthy step of estimating the motion of the individual blocks forming each frame. Extraction of the motion field yields one or more reference images, which are used to obtain a difference signal, i.e., the part of the information that is useful for coding the frame to be compressed.
The useful information is further optimized by passing from the space-time domain (luminance and chrominance of the pixels) to the frequency-time domain by means of the two-dimensional discrete cosine transform (DCT-2D), followed by quantization with a limited number of bits.
The final part of the encoder constructs the bit stream by manipulating the data sequence obtained, using entropy-coding techniques such as “classic” variable-length coding (VLC), for example Huffman coding, and the more advanced CAVLC (Context-Adaptive VLC) or CABAC (Context-Adaptive Binary Arithmetic Coding) of the H.264 standard.
Underlying all the techniques in question is the search for the motion field, carried out by means of a technique of so-called “block matching”. Irrespective of the type of procedure (exhaustive search, recursive search or pyramidal search, whether on a spatial basis or on a temporal basis), the encoder selects a “predictor” block from a group of candidate predictor blocks contained within a pre-defined search area. Coding of a macroblock of a P (predicted) type in the MPEG-2 technique is an example thereof.
We shall now refer to FIG. 1, which can be considered for all intents and purposes as a block diagram of an encoder system for signals corresponding to moving images.
A macroblock 10 to be coded (for example, a set of 16×16 pixels) is compared in a comparison block 15 with one or more predictor macroblocks 20 in order to choose the best from among the latter, and then use it as reference for the calculation of the difference signal 30. The problem of motion estimation is thus reduced to the problem of estimation of the best predictor, i.e., estimation of the macroblock 20 that minimizes the difference signal 30 computed between the macroblock to be coded 10 and the predictor macroblock 20 identified.
The term “macroblock” designates the minimum motion-compensation unit (linked to at least one motion vector), irrespective of the specific compression standard used. For clarity and simplicity of illustration, reference is made here, by way of example, to the MPEG-2 standard, where the macroblock unit, of a size of 16×16 pixels, carries the motion information and is in turn split into four blocks of 8×8 pixels, to each of which an 8×8-pixel DCT-2D is applied.
There exist in the literature a number of measurement indices universally adopted as reference for the choice of the best predictor block.
The most important indices are the following:
mean square error (MSE);
mean absolute error (MAE);
sum of absolute differences (SAD); and
sum of absolute differences with addition of a Hadamard transform (SATD).
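By way of illustration only, the four indices listed above can be sketched as follows. This is a minimal sketch, not the formulation of any particular standard: the function names (sad, mae, mse, satd4x4) are hypothetical, blocks are assumed to be lists of rows of integer samples, and the SATD is assumed to use an unnormalized 4×4 Hadamard matrix (practical encoders often apply a scaling factor).

```python
# Hypothetical helpers for illustration; blocks are lists of rows of integers.

def _diff(cur, ref):
    """Sample-by-sample difference between two equally sized blocks."""
    return [[c - r for c, r in zip(row_c, row_r)]
            for row_c, row_r in zip(cur, ref)]

def sad(cur, ref):
    """Sum of absolute differences."""
    return sum(abs(d) for row in _diff(cur, ref) for d in row)

def mae(cur, ref):
    """Mean absolute error: the SAD divided by the number of samples."""
    return sad(cur, ref) / (len(cur) * len(cur[0]))

def mse(cur, ref):
    """Mean square error."""
    total = sum(d * d for row in _diff(cur, ref) for d in row)
    return total / (len(cur) * len(cur[0]))

# 4x4 Hadamard matrix (assumption: no normalization factor is applied).
H4 = [[1,  1,  1,  1],
      [1,  1, -1, -1],
      [1, -1, -1,  1],
      [1, -1,  1, -1]]

def _matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def satd4x4(cur, ref):
    """Sum of absolute values of the Hadamard-transformed 4x4 difference."""
    d = _diff(cur, ref)
    h_t = [list(col) for col in zip(*H4)]      # transpose of H4
    t = _matmul(_matmul(H4, d), h_t)           # H4 * D * H4^T
    return sum(abs(v) for row in t for v in row)
```

For a 4×4 difference signal that is uniformly equal to 1, both sad and satd4x4 return 16, since the whole difference collapses into a single Hadamard coefficient.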
The latter two indices are used by the H.264 standard and are present in the reference code issued by the Joint Video Team (JVT). Said indices offer similar levels of performance, with the SATD index being more efficient in cases where the luminance has a spectral content concentrated at low frequencies.
Once the measurement index to be used has been defined, the technique for estimating the best predictor is conceptually simple: the measurement index is applied on the current block and on the candidate predictor block (SAD is assumed for simplicity), and the operation is repeated with all the possible candidate blocks indicated by the motion-estimation technique.
For example, with reference to FIG. 1, using an exhaustive search of the motion (full search), every possible match is tested between the current block 10 (to be coded) and all the blocks 20 contained within a pre-defined search window, which identifies a pre-defined area within the reference image or images contained in a purposely provided memory buffer 50. The number of possible cases also depends upon the precision adopted by the standard for the motion vectors: in MPEG-2 the sub-pixel estimation stops at half a pixel, whereas in H.264 it stops at one quarter of a pixel (and, with some particular modalities of weighted prediction, goes even beyond).
The choice of the best candidates is iterative: whenever a test returns a measurement that is better than the previous ones (for example, a lower value, ideally zero, for the SAD index) the block examined is chosen as “optimal” prediction of the current block.
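The iterative selection described above can be sketched as follows, purely by way of example. The sketch assumes integer-pixel precision only (no half-pixel or quarter-pixel refinement), SAD as the measurement index, and frames stored as lists of rows; the names full_search, extract_block and search_range are hypothetical and do not come from any standard.

```python
# Illustrative full search at integer-pixel precision (assumption: SAD index).

def sad(cur, ref):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(c - r)
               for row_c, row_r in zip(cur, ref)
               for c, r in zip(row_c, row_r))

def extract_block(frame, top, left, size):
    """Copy a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def full_search(cur_block, ref_frame, top, left, search_range):
    """Test every candidate within +/- search_range pixels of (top, left).

    Returns the motion vector (dy, dx) of the best predictor and its SAD.
    """
    size = len(cur_block)
    height, width = len(ref_frame), len(ref_frame[0])
    best_cost, best_mv = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(cur_block, extract_block(ref_frame, y, x, size))
            # Keep the candidate only if it improves on all previous tests.
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost
```

The number of tests grows with the square of search_range, which is why the exhaustive search is usually regarded as the most expensive of the search procedures mentioned above.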
Once again with reference to FIG. 1, once the predictor 20 has been identified, the difference signal 30 with the current macroblock 10 is calculated, after which, in a step 40, the DCT is applied to obtain the coefficients to be quantized and coded in the output bit stream 60.
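A minimal sketch of this last stage is given below, under simplifying assumptions: an orthonormal DCT computed directly from its definition, and a single uniform quantization step in place of the quantization matrices used by actual standards. The names encode_block and step are hypothetical, and entropy coding into the bit stream is not shown.

```python
import math

def dct_1d(v):
    """Orthonormal one-dimensional DCT-II of a list of samples."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        acc = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out

def dct_2d(block):
    """Two-dimensional DCT: transform the rows, then the columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]   # transpose back to row order

def residual(cur, pred):
    """Difference signal between the current block and its predictor."""
    return [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(cur, pred)]

def quantize(coeffs, step):
    """Uniform quantization (assumption: one step for all coefficients)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

def encode_block(cur, pred, step=16):
    """Difference signal, then DCT, then quantization."""
    return quantize(dct_2d(residual(cur, pred)), step)
```

With an 8×8 current block that differs from its predictor by a constant value of 8, the only non-zero quantized level is the DC term, which shows how a good predictor leaves few coefficients to be coded.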
The pairing of motion estimation and block matching is crucial: a valid motion estimator offers the smallest possible number of “best predictor candidates”, whilst an efficient block-matching technique enables extraction of the best predictor so as to minimize the number of DCT coefficients used for coding the difference signal.
In addition, there can be no absolute certainty as regards the best choice: the sequence of transform, quantization and VLC that would be needed to count the number of bits generated within a block is not carried out for each block-matching test, and the marked non-linearity of the entropy coding can lead to a high number of output bits even for just a few coefficients, if these occur rarely.
It is nonetheless highly likely that a smaller number of DCT coefficients to be coded translates into a smaller number of bits inserted in the final bit stream.
Indices such as SAD and SATD are simple to calculate (they contain only additions) and are easy to apply; in addition, they provide a good confidence interval on the block-matching measurement. However, said indices do not present a symmetrical confidence interval: whereas a very low SAD is certainly an index of effective resemblance between two blocks, a high SAD does not indicate with equal certainty that the data in question are entirely unrelated. Measurement indices of this type have the purpose of minimizing the error as a whole and do not take into account how the error is distributed at a spatial level in the difference signal.
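The last point can be illustrated with a small numerical sketch. Two hypothetical 4×4 difference signals are built with the same SAD: in one the error is spread uniformly, in the other it is concentrated in a single pixel. An unnormalized 4×4 Hadamard transform is used here as a simple stand-in for the DCT (an assumption made only to keep the arithmetic exact), and the helper names are illustrative.

```python
# Same SAD, different spatial distribution of the error.

H4 = [[1,  1,  1,  1],
      [1,  1, -1, -1],
      [1, -1, -1,  1],
      [1, -1,  1, -1]]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def hadamard_2d(diff):
    """H4 * diff * H4^T (stand-in for the DCT, no normalization)."""
    h_t = [list(col) for col in zip(*H4)]
    return matmul(matmul(H4, diff), h_t)

def sad_of(diff):
    """SAD computed directly on a difference signal."""
    return sum(abs(v) for row in diff for v in row)

def nonzero_coefficients(diff):
    """Number of non-zero transform coefficients of a difference signal."""
    return sum(1 for row in hadamard_2d(diff) for v in row if v != 0)

# Error of 1 on every pixel.
spread = [[1] * 4 for _ in range(4)]

# Error of 16 concentrated in the top-left pixel.
concentrated = [[0] * 4 for _ in range(4)]
concentrated[0][0] = 16

print(sad_of(spread), nonzero_coefficients(spread))
print(sad_of(concentrated), nonzero_coefficients(concentrated))
```

Both difference signals have a SAD of 16, yet the uniform error produces a single non-zero coefficient while the concentrated error produces sixteen, so the SAD alone does not predict how many coefficients will have to be coded.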