1. Field
The embodiments relate to motion estimation, and more particularly to a two-dimensional convolution engine also used for motion estimation.
2. Description of the Related Art
Motion estimation (ME) is typically the most computationally demanding part of video compression. Video post-processing, such as motion-compensated filtering and deinterlacing, also requires reliable ME. One of the most widely used algorithms for ME is block matching, by which rectangular windows, for example N×N blocks, are matched against a reference frame (or field). The matching criterion is usually the sum of absolute errors (SAE) for a particular displacement (m,n), defined as
$$\mathrm{SAE}(m,n) = \sum_{k_1}^{N} \sum_{k_2}^{N} \left| t(k_1,k_2) - w(k_1 - m,\, k_2 - n) \right|, \quad 0 \le (k_1,k_2) \le N$$
where t and w are the target and window (reference) frames respectively. Video encoders or processors typically have specialized accelerators that compute the SAE very quickly.
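By way of illustration only, the SAE for a single displacement may be sketched in software as follows. The function name `sae`, the zero-based indexing, and the `top` and `left` arguments (the position in the reference frame that is co-located with the target block) are assumptions made for this sketch and do not describe any particular accelerator.

```python
def sae(target, ref, top, left, m, n):
    """Sum of absolute errors for one displacement (m, n).

    target: N x N block (list of lists) taken from the target frame.
    ref:    reference (window) frame as a list of lists.
    top, left: assumed co-located position of the block in ref.
    The caller must ensure the displaced block lies inside ref.
    """
    size = len(target)
    total = 0
    for k1 in range(size):
        for k2 in range(size):
            # t(k1, k2) - w(k1 - m, k2 - n), offset to the block position
            total += abs(target[k1][k2] - ref[top + k1 - m][left + k2 - n])
    return total
```

The sketch uses only subtraction, absolute value, and accumulation per pixel, which is consistent with the criterion being well suited to dedicated hardware.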
If the search area is an LH×LV region, the engine finds the (m,n) displacement pair with minimum SAE within that region. Matching may also be performed with a mean-squared error criterion for the N×N block of pixels, defined as
$$\mathrm{MSE}(m,n) = \frac{1}{N^2} \sum_{k_1}^{N} \sum_{k_2}^{N} \left[ t(k_1,k_2) - w(k_1 - m,\, k_2 - n) \right]^2, \quad 0 \le (k_1,k_2) \le N$$
but this is more computationally complex due to the squaring operation.
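The exhaustive search over the region, with either matching criterion, may be sketched as follows. This is a minimal illustration under stated assumptions: the names `match_cost` and `full_search`, the `squared` flag, and the mapping of the search region onto symmetric limits `max_m` (vertical) and `max_n` (horizontal) are hypothetical, and displacements whose block would fall outside the reference frame are simply skipped.

```python
def match_cost(target, ref, top, left, m, n, squared=False):
    """Return the SAE (squared=False) or the MSE (squared=True) for (m, n)."""
    size = len(target)
    total = 0
    for k1 in range(size):
        for k2 in range(size):
            diff = target[k1][k2] - ref[top + k1 - m][left + k2 - n]
            # The MSE needs one multiplication per pixel; the SAE needs none.
            total += diff * diff if squared else abs(diff)
    return total / (size * size) if squared else total


def full_search(target, ref, top, left, max_m, max_n, squared=False):
    """Return (cost, m, n) with minimum cost over |m| <= max_m, |n| <= max_n.

    Returns None if no candidate block fits inside ref. On ties, the
    first displacement encountered is kept.
    """
    size = len(target)
    rows, cols = len(ref), len(ref[0])
    best = None
    for m in range(-max_m, max_m + 1):
        for n in range(-max_n, max_n + 1):
            r0, c0 = top - m, left - n  # top-left corner of displaced block
            if r0 < 0 or c0 < 0 or r0 + size > rows or c0 + size > cols:
                continue  # assumed policy: ignore out-of-frame candidates
            cost = match_cost(target, ref, top, left, m, n, squared)
            if best is None or cost < best[0]:
                best = (cost, m, n)
    return best
```

The two criteria differ only in the per-pixel operation inside the inner loop, which is where the additional cost of the squaring operation arises.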
Other stages of encoding or post-processing also require two-dimensional (2-D) convolution, for example spatial filtering for noise reduction or band-limiting prior to decimation. These filters also typically require dedicated hardware optimized for high-performance operation. The 2-D filter or convolver usually includes a bank of multipliers with filter coefficients and a memory buffer for data.
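For illustration, a direct-form 2-D convolution may be sketched in software as follows. The function name `convolve2d_valid` and the choice to compute only output samples for which the kernel lies entirely inside the image are assumptions of this sketch; the innermost multiply-accumulate corresponds loosely to the bank of multipliers, and the `image` rows stand in for the data buffer.

```python
def convolve2d_valid(image, kernel):
    """2-D convolution of image with kernel, 'valid' region only.

    image, kernel: lists of lists of numbers. The output has
    (rows - kh + 1) x (cols - kw + 1) samples.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_rows = len(image) - kh + 1
    out_cols = len(image[0]) - kw + 1
    out = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # The kernel is flipped in both axes (true convolution,
                    # not correlation): one multiply-accumulate per tap.
                    acc += kernel[i][j] * image[r + kh - 1 - i][c + kw - 1 - j]
            row.append(acc)
        out.append(row)
    return out
```

Each output sample requires kh×kw multiplications, which is one reason such filters are commonly implemented in dedicated hardware.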