Video encoder designs dedicate large die areas to motion search circuitry because motion search is critical to temporal compression and is required to achieve high compression ratios. Motion search is normally the most computationally intensive component of a video encoder and can occupy 50% or more of the die area of a video encoder integrated circuit. FIG. 1 shows two frames of an image sequence. New or current frame 10 is to be compressed. Compression is accomplished by first dividing the frame into blocks and then searching for features of each block in previous reference frame 11. The purpose of the search algorithm is to find the best matching block in the reference frame, that is, the block that has the same pixel features 15 as the block in current frame 12.
There are a number of algorithms used to search for the best matched image. The most common algorithm is known as the sum of absolute differences (SAD) algorithm. The instance of the SAD algorithm shown below is for a k×k pixel block at location (x, y) of the new subject frame A that is compared with the reference frame B. The comparison is done at a vector (r, s) from the block being searched on subject frame A. The SAD algorithm sums the absolute difference of each pixel pair.
$$\mathrm{SAD}(x, y, r, s) = \sum_{i=0}^{k-1} \sum_{j=0}^{k-1} \left| A(x+i,\; y+j) - B\big((x+r)+i,\; (y+s)+j\big) \right| \qquad (1)$$
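Equation (1) can be sketched directly in software. The following is a minimal illustration only, not the circuit described in this document; it assumes frames are stored as lists of rows (so the pixel A(x, y) is `A[y][x]`), and the function name `sad` and the small test frames are hypothetical.

```python
def sad(A, B, x, y, r, s, k):
    """Equation (1): sum of absolute differences between the k x k block of
    frame A at (x, y) and the block of frame B displaced by the vector (r, s).

    Assumption: frames are lists of rows, so the pixel A(x, y) is A[y][x].
    The caller must keep both blocks inside their frames.
    """
    total = 0
    for i in range(k):          # horizontal offset within the block
        for j in range(k):      # vertical offset within the block
            a = A[y + j][x + i]
            b = B[(y + s) + j][(x + r) + i]
            total += abs(a - b)
    return total


# Hypothetical 6 x 6 test frames: B is A shifted one pixel to the right.
A = [[10 * row + col for col in range(6)] for row in range(6)]
B = [[line[0]] + line[:-1] for line in A]

print(sad(A, B, 1, 1, 1, 0, 2))  # vector (1, 0) follows the shift: prints 0
print(sad(A, B, 1, 1, 0, 0, 2))  # zero vector leaves a residual: prints 4
```

In this toy case the vector (1, 0) tracks the shift exactly, so every pixel pair matches, while the zero vector leaves a difference of 1 at each of the four pixels.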
In a thorough search, the SAD algorithm is executed for each pixel or sub-pixel location in the area to be searched. In an example where the search covers ±16 pixels in each dimension at integer-pixel resolution, there are 33 candidate positions per axis, so the equation is evaluated 33 × 33 = 1089 times.
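The thorough search can be sketched as an exhaustive loop over every integer vector in the search area. This is an illustrative sketch under stated assumptions: the names `sad` and `full_search` are hypothetical, the frames are synthetic seeded random data rather than real video, and the whole search window is assumed to lie inside the reference frame.

```python
import random


def sad(cur, ref, x, y, r, s, k):
    """SAD of equation (1); frames are lists of rows (pixel (x, y) is f[y][x])."""
    return sum(
        abs(cur[y + j][x + i] - ref[y + s + j][x + r + i])
        for j in range(k)
        for i in range(k)
    )


def full_search(cur, ref, x, y, k, search_range):
    """Evaluate the SAD at every integer vector within +/- search_range.

    Assumption: the whole search window lies inside the reference frame.
    Returns (best SAD, best (r, s) vector, number of evaluations).
    """
    best_cost, best_vector, evaluated = None, None, 0
    for s in range(-search_range, search_range + 1):
        for r in range(-search_range, search_range + 1):
            cost = sad(cur, ref, x, y, r, s, k)
            evaluated += 1
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (r, s)
    return best_cost, best_vector, evaluated


# Hypothetical 48 x 48 frames of seeded random pixels; the current frame is
# the reference frame displaced by a known vector (with wrap-around).
SIZE = 48
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(SIZE)] for _ in range(SIZE)]
cur = [[ref[(y + 3) % SIZE][(x - 5) % SIZE] for x in range(SIZE)]
       for y in range(SIZE)]

print(full_search(cur, ref, 20, 20, 8, 16))  # prints (0, (-5, 3), 1089)
```

The evaluation count of 1089 matches the 33 × 33 figure for a ±16 pixel search, and each of those evaluations is itself a 64-term sum for an 8×8 block.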
FIG. 2 illustrates a graph of the results from the SAD algorithm for the search for a block on the subject frame 12 over a search area on the reference frame 13, using two frames of the well-known “foreman” video test sequence. The graph is inverted so that point of minimum error 20 appears at the highest point of the contour; this point identifies the location of the search area on the reference frame that best correlates with the subject block. The center of the search 22 identifies areas of the reference frame that are poorly correlated with the subject block.
A common approach for implementing the SAD equation is shown in FIGS. 3 and 4. FIG. 3 shows an inverter used to determine an absolute value of the difference between two pixels. The pixel values A In 30 and B In 31 are unsigned 8-bit values that need to be compared. The smaller of the two input values is inverted using Exor blocks 32 and 33, generating a 1's complement negative number. When this is added to the other number and the carry is disregarded, a positive or absolute result is generated. The logic diagram shown is simplified in two ways. First, rather than using a full 8-bit adder, a single carry out 34 is generated. Second, a 1's complement number is created rather than a 2's complement number. This is corrected later in the circuit by adding 1 to the result.
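The arithmetic behind this approach can be checked in software. The sketch below models only the number manipulation (invert the smaller value, add, drop the carry, add 1); it is not a gate-level model of FIG. 3, and the comparison `a >= b` is a stand-in for whatever select logic the circuit uses. The function name `abs_diff_8bit` is hypothetical.

```python
MASK = 0xFF  # unsigned 8-bit pixel values


def abs_diff_8bit(a, b):
    """|a - b| for unsigned 8-bit inputs, using inversion instead of subtraction."""
    # Stand-in for the circuit's select logic: pick which input to invert.
    if a >= b:
        larger, smaller = a, b
    else:
        larger, smaller = b, a
    inverted = smaller ^ MASK            # XOR with all ones: 1's complement
    raw = (larger + inverted) & MASK     # add and disregard the carry out
    return (raw + 1) & MASK              # +1 corrects 1's to 2's complement


print(abs_diff_8bit(200, 50))  # prints 150
print(abs_diff_8bit(50, 200))  # prints 150
print(abs_diff_8bit(77, 77))   # prints 0
```

Without the final +1, each result would be one less than the true difference (and 255 for equal inputs), which is why the correction has to be applied somewhere downstream.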
FIG. 4 shows the full 8×8 pixel SAD circuit. Inverter 40 (FIG. 3) is repeated 64 times, generating 128 8-bit values that are summed together. The 1's complement correction is added as a constant (reference numeral 42). Summing circuits 44 and 46 are implemented as adder trees to improve performance, generating two values that are finally summed together in module 48 to generate a single 14-bit output value (SAD 49).
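One way to see why the correction can be deferred to a single constant is the following arithmetic sketch. It is an interpretation, not a description of the actual wiring of FIG. 4: it assumes the constant is 64 (one +1 per pixel pair) and that the discarded carries are handled by keeping only the low 14 bits of the total. The function name `sad_8x8_deferred` is hypothetical.

```python
import random


def sad_8x8_deferred(a_pixels, b_pixels):
    """SAD of 64 pixel pairs using 1's complement values and one correction.

    Assumptions (not taken from the figure): the correction constant is 64
    and the result is truncated to 14 bits.
    """
    assert len(a_pixels) == 64 and len(b_pixels) == 64
    values = []
    for a, b in zip(a_pixels, b_pixels):
        larger, smaller = (a, b) if a >= b else (b, a)
        values.append(larger)            # passed through unchanged
        values.append(smaller ^ 0xFF)    # 1's complement of the smaller value
    # 128 eight-bit values plus the deferred +1 for each of the 64 pairs.
    total = sum(values) + 64
    # Each pair contributes an excess of 256; 64 * 256 = 2**14, so keeping
    # the low 14 bits removes all of the excess at once.
    return total & 0x3FFF


rng = random.Random(1)
a_block = [rng.randrange(256) for _ in range(64)]
b_block = [rng.randrange(256) for _ in range(64)]
expected = sum(abs(a - b) for a, b in zip(a_block, b_block))
print(sad_8x8_deferred(a_block, b_block) == expected)  # prints True
```

The truncation is safe because the largest possible 8×8 SAD is 64 × 255 = 16320, which is below 2^14 = 16384.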
An alternative equation used to determine the closest matching block is the sum of squared differences (SSD), shown below for an 8×8 block size.
$$\mathrm{SSD}(x, y, r, s) = \sum_{i=0}^{7} \sum_{j=0}^{7} \Big[ A(x+i,\; y+j) - B\big((x+r)+i,\; (y+s)+j\big) \Big]^{2} \qquad (2)$$
Instead of determining the absolute value of the difference between each pair of pixels, the SSD equation squares the difference, providing a positive number proportional to the power of the difference. This equation replaces the 64 inverter circuits 40 of FIG. 4 with 64 8-bit multipliers, each generating a 16-bit value. The summing circuit then sums the 64 16-bit values to generate a 22-bit result.
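The SSD calculation and the stated bit widths can be checked with a short sketch. The function name `ssd` and the list-of-rows frame layout are assumptions made for illustration; the width figures follow from the worst case of 8-bit pixels differing by 255 at every position.

```python
def ssd(A, B, x, y, r, s, k=8):
    """Sum of squared differences between the k x k block of A at (x, y)
    and the block of B displaced by (r, s). Frames are lists of rows."""
    total = 0
    for i in range(k):
        for j in range(k):
            diff = A[y + j][x + i] - B[(y + s) + j][(x + r) + i]
            total += diff * diff   # squaring makes every term non-negative
    return total


MAX_DIFF = 255   # largest difference between two unsigned 8-bit pixels
PIXELS = 64      # pixels in an 8 x 8 block

worst_square = MAX_DIFF ** 2          # 65025
worst_ssd = PIXELS * worst_square     # 4161600
worst_sad = PIXELS * MAX_DIFF         # 16320

# Bits needed for one squared term, the full SSD, and the full SAD.
print(worst_square.bit_length(), worst_ssd.bit_length(), worst_sad.bit_length())
# prints 16 22 14
```

The 22-bit SSD total compares with the 14-bit SAD output of FIG. 4, which gives a rough sense of how much wider the summing circuit must be when squares are accumulated.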
FIG. 5 illustrates a graph of the results from the SSD equation above as implemented for the same image as used for the SAD graph in FIG. 2. In this search, minimum difference 50 is found at the same location as in the SAD calculation.
A video encoder motion search system may use alternative search strategies or algorithms to reduce the computational demands of a search or increase the quality of the search and thereby improve the compression ratio. One example of a strategy to reduce the processing time is to deploy a reduced search algorithm that searches only a limited series of search vectors rather than the full search described above. These methods search only selected areas and produce results similar to those of the full search techniques while using fewer processing resources.
Reduced search algorithms increase the search speed by limiting the coverage but do not reduce the circuitry needed to perform each search. Each candidate position still requires the complete set of pixel difference calculations between the reference frame and the current frame, as in the SAD calculation. Additionally, these fast search algorithms typically find only a local solution over a fixed search range.
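As an illustration of a reduced search, the sketch below uses a three-step search pattern. The text does not name a specific reduced search algorithm, so this choice is an assumption made purely for illustration; the names `sad` and `three_step_search` and the synthetic frames are hypothetical.

```python
import random


def sad(cur, ref, x, y, r, s, k):
    """Full SAD for one candidate vector; frames are lists of rows."""
    return sum(
        abs(cur[y + j][x + i] - ref[y + s + j][x + r + i])
        for j in range(k)
        for i in range(k)
    )


def three_step_search(cur, ref, x, y, k, step=4):
    """Test the 8 neighbours of the current best vector at a shrinking step.

    With step=4 the pattern covers +/-7 pixels using 1 + 3 * 8 = 25
    evaluations. Assumes the +/-7 window lies inside the reference frame.
    """
    center_r, center_s = 0, 0
    best = sad(cur, ref, x, y, 0, 0, k)
    evaluated = 1
    while step >= 1:
        best_r, best_s = center_r, center_s
        for ds in (-step, 0, step):
            for dr in (-step, 0, step):
                if dr == 0 and ds == 0:
                    continue  # the centre has already been evaluated
                cost = sad(cur, ref, x, y, center_r + dr, center_s + ds, k)
                evaluated += 1
                if cost < best:
                    best, best_r, best_s = cost, center_r + dr, center_s + ds
        center_r, center_s = best_r, best_s
        step //= 2
    return best, (center_r, center_s), evaluated


# Hypothetical seeded random frames; cur is ref displaced by (-5, 3).
SIZE = 48
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(SIZE)] for _ in range(SIZE)]
cur = [[ref[(y + 3) % SIZE][(x - 5) % SIZE] for x in range(SIZE)]
       for y in range(SIZE)]

print(three_step_search(cur, ref, 20, 20, 8))
```

Only 25 candidates are evaluated, yet each one still invokes the full 64-term `sad`, so the per-candidate arithmetic is unchanged. On strongly textured content such as this random test data, the early coarse steps can steer the search away from the true displacement, which illustrates the local-solution limitation.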
An alternative approach to achieving more efficient searches is to reduce the number of pixels used in the comparison. This approach reduces the power consumption of the estimation circuit by reducing the number of calculations in the difference equation but still relies on complex adder circuitry required for the difference equation.
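Reducing the number of pixels in the comparison can be sketched as a subsampled SAD. The regular sampling grid and the names `sad_subsampled` and `stride` are assumptions chosen for illustration; the text does not specify which pixels are retained.

```python
def sad_subsampled(A, B, x, y, r, s, k, stride=2):
    """SAD over every `stride`-th pixel of the k x k block in each direction.

    Frames are lists of rows. Returns (partial SAD, pixel pairs compared).
    """
    total = 0
    compared = 0
    for j in range(0, k, stride):
        for i in range(0, k, stride):
            total += abs(A[y + j][x + i] - B[y + s + j][x + r + i])
            compared += 1
    return total, compared


# Hypothetical 8 x 8 frames differing by 3 at every pixel.
A = [[100] * 8 for _ in range(8)]
B = [[103] * 8 for _ in range(8)]

print(sad_subsampled(A, B, 0, 0, 0, 0, 8, stride=1))  # prints (192, 64)
print(sad_subsampled(A, B, 0, 0, 0, 0, 8, stride=2))  # prints (48, 16)
```

With a stride of 2 only 16 of the 64 pixel pairs are compared, but each remaining pair still needs the same subtract-and-accumulate step.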
Yet another method combines full block search methods with predictive block matching in what is described as an adaptive cost block matching (ACBM) technique. The ACBM method applies a full search only to limited regions, achieving distortion similar to that of full search methods but with up to a 95% reduction in computational load. This method characterizes the image by identifying low textured areas suitable for full search block matching while also identifying correlated motion vectors. In one implementation, the motion vectors of a ten-frame modeled sequence are compared with those of the original sequence to determine the regions of the image suited to a full-scale search. This method may be suitable for latency-insensitive streaming applications, such as broadcast video, that tolerate a long compression delay, but it is not suitable for interactive applications that require low-latency encoding. Moreover, the multi-frame buffering requirement is cost prohibitive for high-resolution or large format video applications.
An example of a method that reduces the search circuitry is the truncated pixel search. The truncated pixel search reduces processing resources by comparing only a limited resolution of each pixel, and it reduces the gate count to approximately half of the gates required for a full search method using systolic array methods.
This approach is unsuitable for encoding some high-quality video streams, where picture quality is highly dependent on the prediction error: the low-resolution pixel comparison increases this error and thereby decreases the video quality. To overcome this, an adaptive mask that truncates the number of significant bits in real time based on quality requirements has been proposed. This approach reduces power consumption but increases circuit complexity, because full pixel comparisons must still be supported when demanded.
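The truncated pixel comparison and its effect on the prediction error can be sketched as follows. The choice of keeping the most significant bits, the function name `sad_truncated`, and the `bits` parameter (which loosely stands in for an adjustable mask) are assumptions for illustration only.

```python
def sad_truncated(A, B, x, y, r, s, k, bits=4):
    """SAD computed on only the `bits` most significant bits of 8-bit pixels.

    Frames are lists of rows. bits=8 gives the ordinary full-precision SAD.
    """
    shift = 8 - bits
    total = 0
    for j in range(k):
        for i in range(k):
            a = A[y + j][x + i] >> shift       # drop the low-order bits
            b = B[y + s + j][x + r + i] >> shift
            total += abs(a - b)
    return total


# Hypothetical 8 x 8 frames that differ only in their low four bits.
A = [[0x80] * 8 for _ in range(8)]
B = [[0x8F] * 8 for _ in range(8)]

print(sad_truncated(A, B, 0, 0, 0, 0, 8, bits=8))  # prints 960
print(sad_truncated(A, B, 0, 0, 0, 0, 8, bits=4))  # prints 0
```

At 4 bits the two blocks appear identical even though the full-precision SAD is 960, so a truncated search cannot distinguish this candidate from a true match. That unseen residual is the extra prediction error referred to above.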
In summary, computationally expensive full-search methods produce accurate displacement vectors, while reduced-search and truncated-pixel methods have been shown to be somewhat more processor-efficient. The dominant factor that determines the compression ratio for an encoding algorithm remains the number of searches executed by the algorithm. The more searches that are performed, either over a large area or at a sub-pixel resolution, the better the resultant compression. Regardless of the method selected, real-time computational performance of one tera-operation per second (10^12 operations per second) or more is required to accomplish a reasonably thorough search at sub-pixel resolution over a large image. Even at very high clock speeds, a large proportion of the die area must be dedicated to the motion search function. Thus, the greatest challenge remains reducing the large die area or circuitry required to perform the vast quantity of difference calculations needed to achieve an acceptable compression ratio.
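The order of magnitude of this requirement can be illustrated with a rough estimate. Every number below is an assumption chosen for illustration and does not come from the text: a 1920×1080 frame, 30 frames per second, 8×8 blocks, a ±16 pixel search, quarter-pixel steps, and one operation counted per pixel difference. The interpolation needed to form sub-pixel reference samples is not counted.

```python
def pixel_differences_per_second(width, height, fps, block,
                                 search_range, steps_per_pixel):
    """Rough count of pixel difference operations for an exhaustive search.

    All parameters are illustrative assumptions, not values from the text.
    """
    blocks_per_frame = (width // block) * (height // block)
    positions_per_axis = 2 * search_range * steps_per_pixel + 1
    candidates_per_block = positions_per_axis ** 2
    differences_per_candidate = block * block
    return (blocks_per_frame * candidates_per_block
            * differences_per_candidate * fps)


integer_rate = pixel_differences_per_second(1920, 1080, 30, 8, 16, 1)
quarter_rate = pixel_differences_per_second(1920, 1080, 30, 8, 16, 4)

print(f"integer-pixel search: {integer_rate:.2e} differences per second")
print(f"quarter-pixel search: {quarter_rate:.2e} differences per second")
```

Under these assumptions the integer-pixel search needs roughly 6.8 × 10^10 pixel differences per second and the quarter-pixel search roughly 1.0 × 10^12, which is consistent in scale with the figure of one tera-operation per second.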