This invention relates to the field of video image processing, and, more particularly, to video coders compliant with the MPEG-2 standard.
The concept of motion estimation is that a set of pixels of a field of a picture may be placed in a position of the subsequent picture obtained by translating the preceding one. These transpositions of objects may expose to the video camera parts that were not visible before as well as changes of their shape, e.g., zooming.
The family of algorithms suitable to identify and associate these portions of images is generally referred to as motion estimation. Such an association permits calculation of the portion of a difference image by removing the redundant temporal information, which makes the subsequent process of compression by DCT, quantization and entropy coding more effective. A typical example of such a method is found in the MPEG-2 standard. A typical block diagram of a video MPEG-2 coder is depicted in FIG. 1. Such a system is made up of the following functional blocks.
Field ordinator. This block includes one or more field memories outputting the fields in the coding order required by the MPEG-2 standard. For example, if the input sequence is I B B P B B P etc., the output order will be I P B B P B B . . . . The intra-coded picture I is a field and/or a semifield containing temporal redundancy. The predicted picture P is a field and/or a semifield from which the temporal redundancy with respect to the preceding I or P picture (previously coded and decoded) has been removed. The bidirectionally predicted picture B is a field and/or a semifield whose temporal redundancy with respect to the preceding I and subsequent P (or preceding P and subsequent P) picture has been removed. In both cases, the I and P pictures must be considered as already coded and decoded. Each frame buffer in the format 4:2:0 occupies the following memory space:
Standard PAL
720 × 576 × 8 for the luminance (Y) = 3,317,760 bits
360 × 288 × 8 for the chrominance (U) = 829,440 bits
360 × 288 × 8 for the chrominance (V) = 829,440 bits
Total Y + U + V = 4,976,640 bits
Standard NTSC
720 × 480 × 8 for the luminance (Y) = 2,764,800 bits
360 × 240 × 8 for the chrominance (U) = 691,200 bits
360 × 240 × 8 for the chrominance (V) = 691,200 bits
Total Y + U + V = 4,147,200 bits
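The buffer sizes listed above follow from the 4:2:0 sampling structure, in which each chrominance component has half the luminance resolution both horizontally and vertically. A minimal Python sketch of the arithmetic, assuming 8 bits per sample (the function name is illustrative and not part of the original disclosure):

```python
def frame_buffer_bits(width, height, bits_per_sample=8):
    """Return (Y, U, V, total) sizes in bits for one 4:2:0 frame buffer."""
    y = width * height * bits_per_sample
    # In 4:2:0, U and V are subsampled by 2 horizontally and vertically.
    u = (width // 2) * (height // 2) * bits_per_sample
    v = u
    return y, u, v, y + u + v


pal = frame_buffer_bits(720, 576)
ntsc = frame_buffer_bits(720, 480)
print(pal)   # (3317760, 829440, 829440, 4976640)
print(ntsc)  # (2764800, 691200, 691200, 4147200)
```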
Motion Estimator. This block removes the temporal redundancy from the P and B pictures.
DCT. This block implements the discrete cosine transform according to the MPEG-2 standard. The I picture and the error pictures P and B are divided into 8*8 blocks of pixels Y, U, V on which the DCT transform is performed.
Quantizer Q. An 8*8 block resulting from the DCT transform is then divided by a quantizing matrix to reduce the magnitude of the DCT coefficients. In such a case, the information associated to the highest frequencies less visible to human sight tends to be removed. The result is reordered and sent to the successive block.
Variable Length Coding (VLC). The codification words output from the quantizer tend to contain a large number of null coefficients, followed by nonnull values. The null values preceding the first nonnull value are counted, and the count figure forms the first portion of a codification word. The second portion represents the nonnull coefficient. These paired values tend to assume values more probable than others. The most probable ones are coded with relatively short words composed of 2, 3 or 4 bits. The least probable ones are coded with longer words. Statistically, the number of output bits is less than in the case such methods are not implemented.
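The pairing of a zero count with the nonnull coefficient that follows it can be sketched as below. This is only an illustration of the run/level idea described above: the function name is hypothetical, the variable length code tables themselves are not reproduced, and the end-of-block signalling used by a real coder is omitted.

```python
def run_level_pairs(coefficients):
    """Convert a reordered list of quantized coefficients into (run, level) pairs.

    'run' is the number of null coefficients preceding each nonnull 'level'.
    Trailing nulls are not emitted in this sketch.
    """
    pairs = []
    run = 0
    for c in coefficients:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs


print(run_level_pairs([12, 0, 0, -3, 1, 0, 0, 0, 2, 0, 0]))
# [(0, 12), (2, -3), (0, 1), (3, 2)]
```

Each resulting pair would then be mapped to a code word, with shorter words assigned to the more probable pairs.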
Multiplexer and Buffer. Data generated by the variable length coder, the quantizing matrices, the motion vectors and other syntactic elements are assembled for constructing the final syntax specified by the MPEG-2 standard. The resulting bitstream is stored in a memory buffer, the limit size of which is defined by the MPEG-2 standard and cannot be exceeded. The quantizer block Q respects such a limit by making the division of the DCT 8*8 blocks dependent upon the filling level of such a memory buffer, and on the energy of the 8*8 source block taken upstream of the motion estimation and the DCT transform process.
Inverse Variable Length Coding (I-VLC). The variable length coding functions specified above are executed in an inverse order.
Inverse Quantization (IQ). The words output by the I-VLC block are reordered in the 8*8 block structure, which is multiplied by the same quantizing matrix that was used for its preceding coding.
Inverse DCT (I-DCT). The DCT transform function is inverted and applied to the 8*8 block output by the inverse quantization process. This permits passing from the domain of spatial frequencies to the pixel domain.
Motion Compensation and Storage. At the output of the I-DCT block, either of the following may be present: a decoded I picture or semipicture, which must be stored in a respective memory buffer for removing the temporal redundancy with respect to subsequent P and B pictures; or a decoded prediction error picture or semipicture P or B, which must be summed to the information removed previously during the motion estimation phase. In the case of a P picture, the resulting sum, stored in a dedicated memory buffer, is used during the motion estimation process for the successive P pictures and B pictures. These field memories are generally distinct from the field memories that are used for re-arranging the blocks.
Display Unit. This unit converts the pictures from the format 4:2:0 to the format 4:2:2, and generates the interlaced format for displaying the images. Arrangement of the functional blocks depicted in FIG. 1 into an architecture implementing the above-described coder is shown in FIG. 2. A distinctive feature is that the field ordinator block, the motion compensation and storage block for storing the already reconstructed P and I pictures, and the multiplexer and buffer block for storing the bitstream produced by the MPEG-2 coding are integrated in memory devices external to the integrated circuit of the core of the coder. The decoder accesses the memory devices through a single interface suitably managed by an integrated controller.
Moreover, the preprocessing block converts the received images from the format 4:2:2 to the format 4:2:0 by filtering and subsampling the chrominance. The post-processing block implements a reverse function during the decoding and displaying phase of the images.
The coding phase also uses the decoding for generating the reference pictures to make operative the motion estimation. For example, the first I picture is coded, then decoded, stored as described in the motion compensation and storage block, and used for calculating the prediction error that will be used to code the subsequent P and B pictures. The play-back phase of the data stream previously generated by the coding process uses only the inverse functional blocks I-VLC, I-Q, I-DCT, etc., never the direct functional blocks. From this point of view, it may be said that the coding and the decoding implemented for the subsequent displaying of the images are nonconcurrent processes within the integrated architecture.
A description of the exhaustive search motion estimator is provided in the following paragraphs. The P field or semifield is first addressed. Two fields of a picture are considered, and the same applies to the semifields: Q1 at the instant t, and the subsequent field Q2 at the instant t+(kp)*T. The constant kp is dependent on the number of B fields existing between the preceding I and the subsequent P, or between two Ps. T is the field period, which is 1/25 sec. for the PAL standard and 1/30 sec. for the NTSC standard. Q1 and Q2 are formed by luminance and chrominance components. The motion estimation is applied only to the most energetic, and therefore most information-rich, component, namely the luminance, which is representable as a matrix of N lines and M columns. Q1 and Q2 are divided into portions called macroblocks, each of R lines and S columns.
The results of the divisions N/R and M/S must be two integer numbers, but not necessarily equal to each other. MB2 (i,j) is a macroblock defined as the reference macroblock belonging to the field Q2 and whose first pixel, in the top left part thereof, is at the intersection between the i-th line and the j-th column. The pair (i,j) is characterized by the fact that i and j are integer multiples of R and S, respectively. FIG. 2b shows how the reference macroblock is positioned on the Q2 picture, while the horizontal dashed-line arrows indicate the scanning order used for identifying the various macroblocks on Q2. MB2 (i,j) is projected on the Q1 field to obtain MB1 (i,j). On Q1, a search window is defined having its center at (i,j) and composed of the macroblocks MBk[e,f], where k is the macroblock index. The k-th macroblock is identified by the coordinates (e,f), such that −p ≤ (e − i) ≤ +p and −q ≤ (f − j) ≤ +q. The indices e and f are integer numbers.
Each of the macroblocks is said to be a predictor of MB2 (i,j). For example, if p=32 and q=48, the number of predictors is (2p+1)*(2q+1)=6,305. For each predictor, the norm L1 with respect to the reference macroblock is calculated. Such a norm is equal to the sum of the absolute values of the differences between common pixels belonging to MB2 (i,j) and to MBk (e,f). R*S values contribute to each sum, the result of which is called distortion. Therefore, (2p+1)*(2q+1) values of distortion are obtained, among which the minimum value is chosen, thus identifying a prevailing position (e•,f•).
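The whole-pixel exhaustive search just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the hardware architecture of the prior art: fields are plain lists of rows, the function names are hypothetical, and candidates that would fall outside the field are simply skipped (the text does not say how borders are handled).

```python
def sad(field_a, ai, aj, field_b, bi, bj, R, S):
    """L1 norm (sum of absolute differences) between two R x S macroblocks."""
    total = 0
    for r in range(R):
        for s in range(S):
            total += abs(field_a[ai + r][aj + s] - field_b[bi + r][bj + s])
    return total


def full_search(field_q1, field_q2, i, j, R, S, p, q):
    """Return (distortion, e, f) of the prevailing whole-pixel predictor."""
    n_lines, n_cols = len(field_q1), len(field_q1[0])
    best = None
    for e in range(i - p, i + p + 1):
        for f in range(j - q, j + q + 1):
            # Assumption: candidates that fall outside Q1 are skipped.
            if e < 0 or f < 0 or e + R > n_lines or f + S > n_cols:
                continue
            d = sad(field_q2, i, j, field_q1, e, f, R, S)
            if best is None or d < best[0]:
                best = (d, e, f)
    return best


# Toy data: Q2 is Q1 translated one line down and two columns right.
field_q1 = [[r * 8 + c for c in range(8)] for r in range(8)]
field_q2 = [[field_q1[r - 1][c - 2] if r >= 1 and c >= 2 else 0
             for c in range(8)] for r in range(8)]
d, e, f = full_search(field_q1, field_q2, i=4, j=4, R=2, S=2, p=2, q=2)
motion_vector = (e - 4, f - 4)
print(d, motion_vector)  # 0 (-1, -2)
```

The nested loops make the cost of the method explicit: every one of the (2p+1)*(2q+1) candidates requires R*S pixel differences.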
The motion estimation process is not yet terminated because in the vicinity of the prevailing position, a grid of pixels is created for interpolating those that form Q1. For example, if Q1 is composed of:
. . .
p31 p32 p33 p34 p35 . . .
p41 p42 p43 p44 p45 . . .
. . .
After interpolation, the following is obtained:
p31 I1 p32 . . .
I2 I3 I4 . . .
p41 I5 p42 . . .
where I1=(p31+p32)/2
I2=(p31+p41)/2
I3=(p31+p32+p41+p42)/4
I4=(p32+p42)/2
I5=(p41+p42)/2
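The averaging formulas above can be applied to a whole field as in the following sketch. It is an illustration only: the function name is hypothetical, plain unrounded averages are used exactly as written in the formulas, and any integer rounding rule applied by an actual coder is not modeled.

```python
def half_pel_grid(field):
    """Interpolate a field (list of rows) onto a half-pixel grid.

    The output has (2N-1) x (2M-1) samples; positions with even line and
    even column indices hold the original pixels.
    """
    n, m = len(field), len(field[0])
    out = [[0.0] * (2 * m - 1) for _ in range(2 * n - 1)]
    for y in range(2 * n - 1):
        for x in range(2 * m - 1):
            r, c = y // 2, x // 2
            if y % 2 == 0 and x % 2 == 0:
                out[y][x] = float(field[r][c])              # original pixel
            elif y % 2 == 0:
                # Between two horizontal neighbours.
                out[y][x] = (field[r][c] + field[r][c + 1]) / 2
            elif x % 2 == 0:
                # Between two vertical neighbours.
                out[y][x] = (field[r][c] + field[r + 1][c]) / 2
            else:
                # Centre of four neighbours.
                out[y][x] = (field[r][c] + field[r][c + 1]
                             + field[r + 1][c] + field[r + 1][c + 1]) / 4
    return out


grid = half_pel_grid([[10, 20], [30, 40]])
print(grid)
# [[10.0, 15.0, 20.0], [20.0, 25.0, 30.0], [30.0, 35.0, 40.0]]
```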
The above noted algorithm is applied in the vicinity of the prevailing position by assuming, for example, p=q=1. In such a case, the number of predictors is equal to 8, and they are formed by pixels that are interpolated starting from the pixels of Q1. The predictor with minimum distortion with respect to MB2 (i,j) is then identified. The predictor most similar to MB2 (i,j) is thus identified by the coordinates of the prevailing predictor through the above noted two steps of the algorithm. The first step tests only whole positions while the second step tests the sub-pixel positions. The vector formed by the difference components between the position of the prevailing predictor and of MB2 (i,j) is defined as the motion vector, and describes how MB2 (i,j) derives from a translation of a macroblock similar to it in the preceding field. It should be noted that other measures may be used to establish whether two macroblocks are similar. For example, the sum of the squared values of the differences (norm L2) may be used. Moreover, the sub-pixel search window may be wider than that specified in the above example. All this further increases the complexity of the motion estimator.
In the example described above, the number of executed operations per pixel is equal to 6,305+8=6,313, wherein each operation includes a difference between two pixels, plus an absolute value, plus an accumulation of the result with that calculated for the preceding pair of pixels of the same macroblock. This means that to identify the optimum predictor, there is a need for 6,313*S*R parallel operators at the pixel frequency of 13.5 MHz. By assuming R=S=16, as defined by the MPEG-2 standard, the number of operations required is 6,313*16*16=1,616,128. Each operator may function on a time division basis on pixels that belong to different predictors. Therefore, if each of these operators operated at a frequency of 4*13.5=54 MHz, the number of operators required would be 1,616,128/4=404,032.
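The operation counts quoted above can be reproduced with a few lines of Python. The function and parameter names below are illustrative; the only inputs are the search window half-widths, the number of sub-pixel candidates, the macroblock size and the ratio between operator clock and pixel clock given in the text.

```python
def exhaustive_search_cost(window_sizes, subpel_predictors, R=16, S=16,
                           speedup=4):
    """Operation counts for an exhaustive-search estimator.

    window_sizes: list of (p, q) half-widths, one per searched field.
    subpel_predictors: total number of sub-pixel candidates tested.
    speedup: operator clock divided by pixel clock (54 / 13.5 = 4).
    Returns (operations per pixel, total operations, operators needed).
    """
    whole_pel = sum((2 * p + 1) * (2 * q + 1) for p, q in window_sizes)
    per_pixel = whole_pel + subpel_predictors
    operations = per_pixel * R * S
    return per_pixel, operations, operations // speedup


print(exhaustive_search_cost([(32, 48)], 8))
# (6313, 1616128, 404032)
```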
The B field or semifield is addressed next. Three picture fields are considered, and the same applies also to semifields: QPn−1 at the instant t, QBkB at the instant t+(kB)*T, and QPn at the instant t+(kP)*T, with kP and kB dependent on the number of B fields or semifields selected beforehand. T is the field period, which is 1/25 sec. for the PAL standard and 1/30 sec. for the NTSC standard. QPn−1, QBkB and QPn are formed by luminance and chrominance components. The motion estimation is applied only to the most energetic, and therefore most information-rich, component, namely the luminance, which is representable as a matrix of N lines and M columns. QPn−1, QBkB and QPn are divided into portions called macroblocks, each of R lines and S columns. The results of the divisions N/R and M/S must be two integer numbers, but not necessarily equal.
MB2 (i,j) is a macroblock defined as the reference macroblock belonging to the field Q2 and whose first pixel, in the top left part thereof, is at the intersection between the i-th line and the j-th column. The pair (i,j) is characterized by the fact that i and j are integer multiples of R and S, respectively. MB2 (i,j) is projected on the QPn−1 field to obtain MB1 (i,j), and on the QPn field to obtain MB3 (i,j).
On QPn−1, a search window is defined with its center at (i,j) and composed of the macroblocks MB1k[e,f]. On QPn, a similar search window is defined, whose dimension may be different but is in any case predefined, and which is made up of the macroblocks MB3k[e,f], where k is the macroblock index. The k-th macroblock on the QPn−1 field is identified by the coordinates (e,f), such that −p1 ≤ (e − i) ≤ +p1 and −q1 ≤ (f − j) ≤ +q1, while the k-th macroblock on the QPn field is identified by the coordinates (e,f), such that −p3 ≤ (e − i) ≤ +p3 and −q3 ≤ (f − j) ≤ +q3. The indices e and f are integer numbers.
Each of the macroblocks is said to be a predictor of MB2 (i,j). There are in this case two types of predictors for MB2 (i,j). The first type is obtained on the field (I or P) that temporally precedes the one containing the block to be estimated. These are referred to as forward predictors. The second type is obtained on the field (I or P) that temporally follows the one containing the block to be estimated. These are referred to as backward predictors. For example, if p1=16, q1=32, p2=8, q2=16, the number of predictors is (2p1+1)*(2q1+1)+(2p2+1)*(2q2+1)=2,706.
For each predictor, the norm L1 with respect to the reference macroblock is calculated. Such a norm is equal to the sum of the absolute values of the differences between common pixels belonging to MB2 (i,j) and to MB1k (e,f) or MB3k (e,f). R*S values contribute to each sum, the result of which is called distortion. Hence, (2p1+1)*(2q1+1) forward distortion values are obtained, among which the minimum value is chosen, identifying a prevailing position (eF•,fF•) on the field QPn−1, together with (2p2+1)*(2q2+1) backward distortion values, among which the minimum value is again selected, identifying a new prevailing position (eB•,fB•) on the QPn field.
The motion estimation process is not yet complete because, in the vicinity of the prevailing position, a grid of pixels is created to interpolate those that form QPn−1 and QPn. For example, if QPn−1 is
. . .
p31 p32 p33 p34 p35 . . .
p41 p42 p43 p44 p45 . . .
. . .
After interpolation, we have:
p31 I1 p32 . . .
I2 I3 I4 . . .
p41 I5 p42 . . .
where I1=(p31+p32)/2
I2=(p31+p41)/2
I3=(p31+p32+p41+p42)/4
I4=(p32+p42)/2
I5=(p41+p42)/2
The above noted algorithm is applied in the vicinity of the prevailing position by assuming, for example, p=q=1. In such a case, the number of predictors is equal to 8, and they are formed by pixels that are interpolated starting from the pixels of QPn−1. The predictor with minimum distortion with respect to MB2 (i,j) is now identified. The same procedure is followed for the QPn field. The predictor most similar to MB2 (i,j) on QPn−1 and on QPn is identified by the coordinates of the prevailing predictor through the above stated two steps of the algorithm, performed on each field. The first step tests only whole positions while the second tests the sub-pixel positions. At this point, the mean square errors of the two prevailing predictors (forward and backward) are calculated, that is, the sums of the squares of the pixel-by-pixel differences between MB2 (i,j) and the predictors at (eF•,fF•) and at (eB•,fB•).
Moreover, the mean square error between MB2 (i,j) and a theoretical macroblock obtained by linear interpolation of the two prevailing predictors is calculated. Among the three values thus obtained, the lowest is selected. MB2 (i,j) may therefore be estimated using only (eF•,fF•), only (eB•,fB•), or both, averaged.
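The three-way choice among the forward predictor, the backward predictor and their interpolation can be sketched as follows. This is a hedged illustration: the function names are hypothetical, blocks are lists of rows, and the interpolated macroblock is assumed to be the plain pixel-wise average of the two prevailing predictors.

```python
def mse(block_a, block_b):
    """Mean square error between two equally sized blocks (lists of rows)."""
    count = len(block_a) * len(block_a[0])
    total = sum((a - b) ** 2
                for row_a, row_b in zip(block_a, block_b)
                for a, b in zip(row_a, row_b))
    return total / count


def select_b_prediction(reference, forward, backward):
    """Return (mode, error) for the candidate with the lowest error."""
    # Assumption: the interpolated predictor is the pixel-wise average.
    averaged = [[(f + b) / 2 for f, b in zip(row_f, row_b)]
                for row_f, row_b in zip(forward, backward)]
    candidates = [
        ("forward", mse(reference, forward)),
        ("backward", mse(reference, backward)),
        ("interpolated", mse(reference, averaged)),
    ]
    return min(candidates, key=lambda item: item[1])


ref = [[10, 12], [14, 16]]
fwd = [[8, 10], [12, 14]]    # 2 below the reference everywhere
bwd = [[12, 14], [16, 18]]   # 2 above the reference everywhere
print(select_b_prediction(ref, fwd, bwd))  # ('interpolated', 0.0)
```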
The vectors formed by the difference components between the positions of the prevailing predictors and of MB2 (i,j) are defined as the motion vectors, and describe how MB2 (i,j) derives from a translation of a macroblock similar to it in the preceding and/or successive field. In the example described above, the number of operations carried out for each pixel is equal to 2,706+8*2=2,722, where each operation includes a difference between two pixels, plus an absolute value, plus an accumulation of the result with that calculated for the preceding pair of pixels of the same macroblock. This means that for identifying the optimum predictor, there is a need for 2,722*R*S parallel operators at the pixel frequency of 13.5 MHz. By assuming R=S=16, as defined by the MPEG-2 standard, the number of operations required is 2,722*16*16=696,832.
Each operator may function on a time division basis on pixels belonging to different predictors. Therefore, if each of them operated at a frequency of 4*13.5=54 MHz, the number of operators required would be 696,832/4=174,208. A high level block diagram of a known motion estimator based on an exhaustive search technique is depicted in FIG. 3, wherein the DEMUX block conveys the data coming from the field memory to the operators. In addition, the MIN block operates on the whole set of distortion values for calculating the minimum one.
An object of the present invention is to reduce the complexity of a motion estimator as used, for example, in an MPEG-2 video coder.
As an illustration of an efficient implementation of the method and architecture of the motion estimator of the present invention, a coder for the MPEG-2 standard will be taken into consideration. Using the motion estimator of the invention, it is possible, for example, to use only 6.5 operations per pixel to find the best predictor of the portion of a picture currently being subjected to motion estimation. This is for an SPML compressed video sequence of either PAL or NTSC type. In contrast, the best result that may be obtained with a motion estimator of the prior art would require execution of 569 operations per pixel. This is in addition to the drawback of requiring a more complex architecture.
The method of the invention implies a slight loss of quality of the reconstructed video images for the same compression ratio. Nevertheless, such a degradation of the images is practically undetectable to human sight because the artifacts are distributed in regions of the images having a substantial motion content, the details of which practically pass unnoticed by the viewer.
The following paragraphs provide a description of a hierarchical recursive motion estimator of the invention. The number of operations per pixel required by the coding process may be significantly reduced by reusing the vectors already calculated by the motion estimation process for macroblocks that are spatially and temporally in the vicinity of the current macroblock.
The method herein disclosed is based on the correlation that exists among motion vectors associated to macroblocks in a common position in temporally adjacent images. Moreover, the motion vectors also associated to macroblocks belonging to the same picture, spatially adjacent to the current one, may represent with small errors the motion of the current macroblock.
The process of motion estimation of the invention meets the following requirements. The integration of the required number of operators necessary for implementing the method of motion estimation, together with auxiliary structures such as memories for allowing the reuse of precalculated vectors, must be significantly less burdensome than that of motion estimators that do not include the method of the invention. The loss of quality of the reconstructed images for a given compression ratio must be practically negligible as compared to motion estimators that do not implement the method of the invention.
In the ensuing description of the method for motion estimation, reference is made to a number of whole fields equal to the distance M, imposed beforehand, between two subsequent P or I fields. A total number of fields equal to M+2 is therefore taken into consideration, according to the scheme of FIG. 2b. The temporal distance between two successive pictures is equal to one field period. In particular, let us assume that the first field, QPn−1, has already been submitted to motion estimation with respect to the preceding field (Q0), so that a motion field, with one vector per macroblock, is associated with it. The motion field is generated by using the same method as the first step, described below.
With respect to the first step, the prevailing predictor of the macroblock MBQB1 (i,j), belonging to the QB1 field, is searched on QPn−1, that is, the portion of QPn−1 that most resembles it. It is assumed that the method has already been applied to all the QB1 macroblocks preceding it in the scanning order, from left to right and from top to bottom. According to FIG. 2c, mv_MB5 (i, j+S) is the motion vector associated to the macroblock belonging to QPn−1 and identified by the coordinates (i, j+S). mv_MB6 (i+R, j) is the motion vector associated to the macroblock belonging to QPn−1 and identified by the coordinates (i+R, j). mv_MB3 (i, j−S) is the motion vector associated to the macroblock belonging to QB1 and identified by the coordinates (i, j−S). mv_MB4 (i−R, j) is the motion vector associated to the macroblock belonging to QB1 and identified by the coordinates (i−R, j).
Let us consider, by way of example, the use of the above vectors for identifying, during a first phase, four predictors starting from the projection of MBQB1 on QPn−1. The prevailing predictor is identified by using the norm L1 or the norm L2, etc. Generally, it is possible to use more than two predictors belonging to QPn−1, and also a different number from those belonging to QB1. The above noted example proved very effective in simulation. The norm associated to the prevailing predictor is thereafter compared with precalculated thresholds derived from statistical considerations. Such thresholds identify three subsets, each composed of F pairs of vectors. Each pair, for example, is composed of vectors having components equal in terms of absolute value, but opposite in sign. In the second step, such F pairs are summed to the vector that represents the prevailing predictor, thereby identifying a further 2*F predictors, among which there may also be sub-pixel positions.
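The two phases described above (selection among inherited vectors, then threshold-driven refinement) can be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the function name, the use of exactly two thresholds to select among three subsets, and the toy cost function and update vectors in the usage lines. The text does not disclose the actual threshold values or update sets at this point.

```python
def recursive_estimate(cost, candidates, thresholds, update_sets):
    """Two-phase candidate search.

    cost: function mapping a motion vector (dy, dx) to its norm (L1 or L2).
    candidates: vectors inherited from neighbouring macroblocks.
    thresholds: two increasing values (assumed) splitting the norm range.
    update_sets: three lists of F vectors; each vector v stands for the
                 pair (+v, -v), equal in magnitude and opposite in sign.
    """
    # Phase 1: prevailing predictor among the inherited vectors.
    centre = min(candidates, key=cost)
    best, best_cost = centre, cost(centre)

    # The norm of the prevailing predictor selects one of three subsets.
    if best_cost < thresholds[0]:
        updates = update_sets[0]
    elif best_cost < thresholds[1]:
        updates = update_sets[1]
    else:
        updates = update_sets[2]

    # Phase 2: test the 2*F vectors obtained by adding each +/- pair.
    for uy, ux in updates:
        for sign in (1, -1):
            trial = (centre[0] + sign * uy, centre[1] + sign * ux)
            trial_cost = cost(trial)
            if trial_cost < best_cost:
                best, best_cost = trial, trial_cost
    return best, best_cost


# Toy cost: distance from a "true" motion of (3, -1); purely illustrative.
toy_cost = lambda v: abs(v[0] - 3) + abs(v[1] + 1)
inherited = [(0, 0), (2, 0), (4, -4), (1, 1)]
sets = [[(0.5, 0), (0, 0.5), (0.5, 0.5)],
        [(1, 0), (0, 1), (1, -1)],
        [(2, 0), (0, 2), (2, -2)]]
print(recursive_estimate(toy_cost, inherited, (1, 8), sets))  # ((3, -1), 0)
```

With four inherited vectors and F=3 pairs, only 4+6=10 predictors are evaluated for the macroblock, against thousands in the exhaustive search.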
The prevailing predictor, in the sense of the norm, is the predictor of MBQB1 (i,j) on QPn−1. The difference between their homologous coordinates gives the associated motion vector. The norm is calculated starting from the result obtained by subsampling the macroblock according to a quincunx scheme, or by interpolating the pixels of QPn−1 for generating predictor macroblocks disposed in sub-pixel positions. The quincunx grid is obtained by eliminating one pixel out of every two from the macroblock according to the following scheme:
Source macroblock:
A1 A2 A3 A4 A5 A6 . . .
B1 B2 B3 B4 B5 B6 . . .
C1 C2 C3 C4 C5 C6 . . .
Subsampled macroblock:
A1    A3    A5
   B2    B4    B6
C1    C3    C5
In this way, the operations necessary for calculating the norm are reduced by 50% compared to the case of an exhaustive search technique of a known motion estimator. The method used for interpolating the pixels of QPn−1, thus generating the sub-pixels thereof, is the one used in the exhaustive search estimator of the prior art. The description above for QB1 also applies to the successive fields QB2 . . . QB(M−1) and QPn: the predictors of each are calculated with respect to the field that immediately precedes it in time, so that a motion estimator is obtained for each field of the partial sequence considered. The motion estimators must be stored in a suitable structure to enable the second step.
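The quincunx subsampling and the halved cost of the norm can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the coder's implementation: the function names `quincunx_positions` and `l1_norm_quincunx`, and the representation of a macroblock as a list of pixel rows, are hypothetical.

```python
def quincunx_positions(rows, cols):
    """Yield the (row, col) positions kept by the quincunx grid.

    Every other pixel is kept, and the pattern is shifted by one column
    on alternate rows (A1, A3, A5 / B2, B4, B6 / C1, C3, C5 ...).
    """
    for r in range(rows):
        for c in range(r % 2, cols, 2):
            yield r, c


def l1_norm_quincunx(block, predictor):
    """L1 norm (sum of absolute differences) over the quincunx positions only.

    block and predictor are lists of rows of equal size (assumed layout).
    """
    rows, cols = len(block), len(block[0])
    return sum(abs(block[r][c] - predictor[r][c])
               for r, c in quincunx_positions(rows, cols))
```

For a 16×16 macroblock the sum runs over 128 of the 256 pixels, which is where the 50% reduction in operations comes from.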
For the second step, the QPn field (type P) is coded, and this requires a spreading of its macroblocks with respect to the QPn−1 field positioned at a temporal distance equal to M field periods. To perform this estimation, consider the MBPn (i,j) block belonging to QPn, where i and j give the position of the top-left pixel of the macroblock relative to the top-left corner of the field it belongs to. It is assumed that all the preceding QPn macroblocks have already been submitted to such a process according to the scanning order.
Referring to FIG. 2d, consider the two blocks immediately to the left of (coordinates (i, j−S)) and above (coordinates (i−R, j)) the block to be estimated, MBPn (i,j). Both belong to QPn and have already been submitted to motion estimation. They are therefore associated with two motion vectors, which identify two spatial predictor macroblocks on QPn−1. Moreover, consider the field that immediately precedes the current one in time: QB(M−1) has already been submitted to motion estimation with respect to its own previous field, QB(M−2), so each of its macroblocks has an associated translation vector. A portion of such vectors, properly scaled in terms of the temporal distance between QPn−1 and QPn, may be used to identify predictors for the new MBPn (i,j). These are referred to as temporal predictors, and they are positioned on QPn−1.
In particular, the positions identified by the motion vectors associated with the macroblocks indicated in the figure by T1 and T2 are tested. If the temporal distance to estimate is one field period, only the vectors associated with T1, having coordinates (i, j+S) and (i+R, j), are used. Otherwise, those indicated by T2, whose coordinates are (i+R, j+2*S), (i+2*R, j+S), (i+2*R, j−S) and (i+R, j−2*S), are also considered. The number of temporal predictors may differ from the number indicated; this choice is based on the best experimental results.
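The selection of the temporal predictor positions can be written out as a small Python sketch. The function name `temporal_predictor_positions` is hypothetical, and R and S are assumed here to be the vertical and horizontal offsets between neighboring macroblocks, as the coordinate offsets in the text suggest.

```python
def temporal_predictor_positions(i, j, R, S, field_distance):
    """Return the macroblock positions whose stored motion vectors are
    tested as temporal predictors for the macroblock at (i, j).

    field_distance is the temporal distance to estimate, in field periods.
    R and S are the macroblock offsets (an assumption of this sketch).
    """
    # T1 positions: always used.
    t1 = [(i, j + S), (i + R, j)]
    if field_distance == 1:
        return t1
    # T2 positions: added when the distance exceeds one field period.
    t2 = [(i + R, j + 2 * S), (i + 2 * R, j + S),
          (i + 2 * R, j - S), (i + R, j - 2 * S)]
    return t1 + t2
```

With these positions, two temporal predictors are tested at a distance of one field period and six otherwise.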
Among all the indicated predictors, only one is chosen using the criterion of the norm L1. This norm is then compared with precalculated thresholds derived from statistical considerations. These thresholds identify 3 sub-sets of pairs of vectors whose components are equal in absolute value but opposite in sign. The number of such pairs is F, where F is a function of the temporal distance to be covered by the estimation (F=F(T_dist)). In the second phase, such pairs are added to the vector that identifies the prevailing predictor, identifying 2*F further predictors, among which there may also be sub-pixel positions. The prevailing predictor, in the sense of the norm, is the predictor of MBPn (i,j) on QPn−1, and the difference between their common coordinates identifies the motion vector associated with it.
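The threshold-driven refinement can be sketched in Python as follows. Several details are assumptions made only for illustration: the function name `refinement_candidates`, the use of two thresholds to separate the three sub-sets, the ordering of the sub-sets by increasing norm, and the sample update vectors used in the usage note. The text does not give the actual threshold values or update vectors.

```python
def refinement_candidates(best_vector, best_norm, thresholds, update_sets):
    """Return the 2*F refined candidate vectors around the prevailing one.

    best_vector: (dx, dy) of the prevailing predictor.
    best_norm:   its L1 norm.
    thresholds:  (low, high), hypothetical precalculated values.
    update_sets: three lists of F (dx, dy) vectors; each vector is applied
                 with both signs, giving F symmetric pairs.
    """
    low, high = thresholds
    # Select one of the three sub-sets by comparing the norm with the thresholds.
    if best_norm < low:
        updates = update_sets[0]
    elif best_norm < high:
        updates = update_sets[1]
    else:
        updates = update_sets[2]
    vx, vy = best_vector
    candidates = []
    for dx, dy in updates:
        candidates.append((vx + dx, vy + dy))  # positive member of the pair
        candidates.append((vx - dx, vy - dy))  # negative member of the pair
    return candidates
```

For instance, with the hypothetical sets `[[(1, 0), (0, 1)], [(2, 0), (0, 2)], [(4, 0), (0, 4)]]`, thresholds `(100, 1000)`, a prevailing vector `(3, -2)` and a norm of 500, the second sub-set is selected and four candidates (2*F with F=2) are produced. The candidate with the lowest norm then becomes the final predictor.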
For example, the number of operations per pixel according to the above described method for the P fields is equal to:
This is followed by the estimation of the B fields. The estimate is carried out both with respect to the P or I field that temporally precedes the field to be estimated and with respect to the I or P field that follows it. For the estimation with respect to the preceding I or P field, the process is similar to that described above. For the estimation with respect to the successive P or I field, there are some differences in the use of the temporal predictors. In this case, the term identifies the motion vectors associated with the macroblocks located in the same positions as described above for the temporal predictors of the P fields, but belonging to the field that immediately follows, in time, the one to be estimated. Accordingly, they always move in the estimate direction. For example, to estimate QB(M−2) with respect to QPn, the vectors associated with the QB(M−1) field are used. The latter are calculated during the first algorithmic step.
It is necessary that such vectors be symmetrically overturned with respect to the origin, because they identify the position of a block belonging to a future field as compared to a previous field. It is also necessary to scale them properly as a function of the temporal distance between the current field and the field to be estimated. At this point, the best backward predictor is chosen among the two spatial predictors and the temporal ones (for example, 2 or 6). A certain number of pairs of small vectors symmetrical with respect to the origin are again chosen; this number is also a function of the temporal distance to cover. They are chosen within the predefined set by comparing the norm found with thresholds defined by statistical considerations. Such pairs of vectors, added to the prevailing one found above, identify new predictors, among which there may also be sub-pixel positions.
The prevailing predictor, in the sense of the norm, is the final backward predictor for the block subject to estimation.
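The overturning and scaling of a stored vector for backward estimation can be sketched in Python as follows. The function name `backward_temporal_predictor`, the linear scaling by the ratio of temporal distances, and the rounding to the nearest integer are assumptions of this sketch; the text only states that the vectors are mirrored through the origin and scaled as a function of the temporal distance.

```python
from fractions import Fraction


def backward_temporal_predictor(vector, source_distance, target_distance):
    """Mirror a stored motion vector through the origin and rescale it.

    vector:          (dx, dy) estimated over source_distance field periods.
    target_distance: field periods between the field to be estimated and
                     its backward reference.
    Linear scaling and rounding are assumptions of this sketch.
    """
    scale = Fraction(target_distance, source_distance)
    dx, dy = vector
    # Negating both components overturns the vector about the origin.
    return (round(-dx * scale), round(-dy * scale))
```

For example, a vector (4, −2) estimated over one field period becomes (−8, 4) when the backward distance is two field periods.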
Finally, for each macroblock, two predictors are identified: one on the I or P field that temporally precedes QB(M−2), and one on the successive I or P field. A third predictor is obtained by linear interpolation of the pixels belonging to the two predictors cited above. Out of the three predictors, one is chosen based on the norm L1. This is the final predictor, which is subtracted from the reference block (the block submitted to estimation) to obtain the prediction error.
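The final choice among the three predictors can be sketched in Python as follows. The names `l1_norm` and `choose_b_predictor` are hypothetical, blocks are assumed to be lists of rows of non-negative integer pixel values, and the interpolated predictor is assumed to be the pixel-wise average rounded half up.

```python
def l1_norm(a, b):
    """Sum of absolute pixel differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def choose_b_predictor(block, forward, backward):
    """Pick the forward, backward or interpolated predictor by L1 norm
    and return the chosen mode together with the prediction error block."""
    # Third predictor: pixel-wise average, rounded half up (assumed rounding).
    interpolated = [[(f + b + 1) // 2 for f, b in zip(rf, rb)]
                    for rf, rb in zip(forward, backward)]
    candidates = {"forward": forward,
                  "backward": backward,
                  "interpolated": interpolated}
    mode = min(candidates, key=lambda m: l1_norm(block, candidates[m]))
    chosen = candidates[mode]
    # Prediction error: reference block minus the final predictor.
    error = [[x - p for x, p in zip(rx, rp)] for rx, rp in zip(block, chosen)]
    return mode, error
```

When the reference block lies midway between the forward and backward predictors, the interpolated predictor gives the smallest norm and is the one selected.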
For example, the number of operations per pixel, according to the above described method, is equal to:
In these conditions, the performance in terms of signal/noise ratio obtained is equivalent to that of the known exhaustive search estimator (see FIG. 3), while the complexity of the hardware implementation is significantly reduced.