1. Field of the Invention
The present invention relates to digital video, and more specifically, to digital video compression techniques using motion estimation.
2. Description of the Prior Art
Digital video is a popular means of information communication. Computer-based video conferencing and digital television are just two examples of applications of this technology. As a single digital image can contain hundreds of thousands of pixels and many sequential image frames are required to produce a quality video effect, compression schemes are required for efficiency in processing time and storage space.
In general, image compression is effected by determining correlations among regions of pixels. When high compression is required, for example due to limits in data transmission speeds, either video quality must be sacrificed or additional compression hardware and software must be provided. Where low amounts of compression are tolerable, video quality can be maintained at the cost of an increase in memory demand. Among this complex arrangement of trade-offs, any increase in compression that does not significantly lower video quality or increase the necessary hardware is undoubtedly beneficial.
Referring to FIG. 1, consider a typical digital video 10 displaying an image 12 with a current frame 14 and a previous frame 16. The current frame 14 is made up of pixels 20 of a video memory or display device such as a computer monitor or digital television set. A set of pixels 20 can be grouped into a current block 18 for the purposes of video compression, which is typically facilitated by motion estimation. Examples of state-of-the-art motion estimation compression schemes are MPEG-1 and MPEG-2, although others are widely known and used.
During motion estimation, temporal redundancies of video frames are used to minimize data repetition thereby providing compression. Pixel information of a block undergoing compression, such as the block 18, is compared with pixel information of the previous frame 16. Specifically, the previous frame 16 is searched for a predictor region 22 that contains a substantial amount of pixel data of the block 18. Once a suitable predictor region is found, it is identified by a motion vector, which indicates the position of the predictor region 22 relative to the current block 18. That is, the data of the block 18 comprises a motion vector indicating the location of the predictor region 22 in the previous frame 16, as well as any data representing variations in pixel information from the predictor region 22. Thus, the block 18 of the current frame 14 can be defined in terms of its displacement from the previous frame 16.
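As a rough illustration of the representation just described, the following Python sketch stores a block as a motion vector plus a per-pixel residual and rebuilds the block from the previous frame. The function names `encode_block` and `decode_block`, the list-of-rows frame layout, and the assumption that the motion vector has already been found are illustrative only and are not taken from any particular standard; practical codecs typically also transform and quantize the residual.

```python
def encode_block(curr, ref, x0, y0, size, mv):
    """Describe the block at (x0, y0) as a motion vector plus a residual.

    curr and ref are frames stored as lists of rows (frame[y][x]).
    mv = (u, v) is the horizontal and vertical offset of the predictor
    region in the reference frame relative to the block position.
    """
    u, v = mv
    residual = [
        [curr[y0 + j][x0 + i] - ref[y0 + j + v][x0 + i + u] for i in range(size)]
        for j in range(size)
    ]
    return mv, residual


def decode_block(ref, x0, y0, size, mv, residual):
    """Rebuild the block from the predictor region and the residual."""
    u, v = mv
    return [
        [ref[y0 + j + v][x0 + i + u] + residual[j][i] for i in range(size)]
        for j in range(size)
    ]
```

When the predictor region matches the block closely, the residual is mostly zeros, which is what makes this representation compressible.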
There are numerous ways to search the previous frame, or reference frame, for a suitable predictor block. These range from slow pixel-based full search methods to sampling-based fast motion estimation methods. FIG. 2 illustrates the well-known three-step fast motion estimation method. A region 30 represents the area of the reference frame to be searched, in which nodes of the grid represent predetermined search regions. A search region may be of any size, from a single pixel to a large array of pixels, and is typically the same size as the block to be compressed, which is also typically spatially coincident with the center region R1. According to the first step of the three-step method, pixel information of the block to be compressed, the current block, is compared with the pixel information of the search regions R1–R9. Values of a cost function defining the correlation of pixel information are determined for each region R1–R9, and the region having the lowest cost function (region R4 in FIG. 2) is selected for the second step. In step two, search regions S2–S9 surrounding region R4 are compared with the pixel information of the current block, and again a region having the lowest cost function (region S7 in this case) is selected for the third and final step. Finally, the third step compares the current block with search regions T2–T9, of which the region having the lowest cost function (region T2) is selected as the region that best matches the current block. The result of the third step is the location of the predictor region for the current block being compressed, and such a result can be identified by a motion vector (for region T2 this would be a vector (1,3)) in the reference frame.
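The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: frames are lists of rows (frame[y][x]), blocks are square, the cost function is a sum of absolute differences, and the step sizes are 4, 2 and 1 pixels. The names `sad`, `three_step_search` and `first_step`, and the synthetic gradient frames used in the usage lines, are hypothetical.

```python
def sad(curr, ref, x0, y0, u, v, size):
    """Sum of absolute differences between the current block at (x0, y0)
    and the reference region displaced by (u, v)."""
    return sum(
        abs(curr[y0 + j][x0 + i] - ref[y0 + j + v][x0 + i + u])
        for j in range(size)
        for i in range(size)
    )


def three_step_search(curr, ref, x0, y0, size, first_step=4):
    """Return ((u, v), cost) found by a three-step search (steps 4, 2, 1)."""
    height, width = len(ref), len(ref[0])
    best_u, best_v = 0, 0
    best_cost = sad(curr, ref, x0, y0, 0, 0, size)  # centre region
    step = first_step
    while step >= 1:
        centre_u, centre_v = best_u, best_v  # winner of the previous step
        for dv in (-step, 0, step):
            for du in (-step, 0, step):
                if du == 0 and dv == 0:
                    continue  # the centre has already been evaluated
                u, v = centre_u + du, centre_v + dv
                inside = (0 <= x0 + u and x0 + u + size <= width
                          and 0 <= y0 + v and y0 + v + size <= height)
                if not inside:
                    continue  # skip regions that fall outside the frame
                cost = sad(curr, ref, x0, y0, u, v, size)
                if cost < best_cost:
                    best_cost, best_u, best_v = cost, u, v
        step //= 2
    return (best_u, best_v), best_cost


# Synthetic frames: the current frame equals the reference frame
# displaced by 3 pixels horizontally and 4 pixels vertically.
WIDTH = HEIGHT = 24
ref_frame = [[x + 100 * y for x in range(WIDTH)] for y in range(HEIGHT)]
curr_frame = [[(x + 3) + 100 * (y + 4) for x in range(WIDTH)] for y in range(HEIGHT)]
vector, cost = three_step_search(curr_frame, ref_frame, x0=8, y0=8, size=4)
print(vector, cost)  # (3, 4) 0
```

Each step evaluates at most eight new regions around the current best, so the sketch examines at most 25 regions instead of the 225 offsets a full search over the same ±7 pixel range would require.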
Typically, the cost function is a sum of absolute differences (SAD) or a sum of squared differences (SSD) as expressed by the following functions:
$$\mathrm{SAD}(u,v)=\sum_{i=1}^{N}\sum_{j=1}^{M}\left|P_{\mathrm{curr}}(i,j)-P_{\mathrm{ref}}(i+u,\,j+v)\right|$$

$$\mathrm{SSD}(u,v)=\sum_{i=1}^{N}\sum_{j=1}^{M}\left[P_{\mathrm{curr}}(i,j)-P_{\mathrm{ref}}(i+u,\,j+v)\right]^{2}$$
where,
Pcurr is pixel data, such as a luminance value, of the current block being compressed;
Pref is pixel data of a search region in the reference frame (such as regions R1–R9, S2–S9, T2–T9 of FIG. 2);
i, j are horizontal and vertical indices of the pixels of the current block and search regions;
N, M are the horizontal and vertical sizes of the current block and search regions; and
u, v are horizontal and vertical offsets of the search regions of the reference frame with respect to the location of the current block in the current frame (for region R5, u=−4 and v=0).
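The two cost functions can be written directly from these definitions. The Python sketch below is illustrative only: the function names `sad` and `ssd`, the list-of-rows frame layout (frame[y][x]), and the parameters `x0`, `y0` giving the top-left corner of the current block are assumptions, and the indices run from 0 rather than 1.

```python
def sad(p_curr, p_ref, x0, y0, u, v, n, m):
    """Sum of absolute differences for a search region offset by (u, v).

    n and m are the horizontal and vertical block sizes; i and j are the
    horizontal and vertical pixel indices, counted from 0 in this sketch.
    """
    return sum(
        abs(p_curr[y0 + j][x0 + i] - p_ref[y0 + j + v][x0 + i + u])
        for i in range(n)
        for j in range(m)
    )


def ssd(p_curr, p_ref, x0, y0, u, v, n, m):
    """Sum of squared differences for a search region offset by (u, v)."""
    return sum(
        (p_curr[y0 + j][x0 + i] - p_ref[y0 + j + v][x0 + i + u]) ** 2
        for i in range(n)
        for j in range(m)
    )
```

Both functions return 0 for a perfect match. Because SSD squares each difference, it penalizes a few large pixel errors more heavily than SAD does.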
Although not without its benefits, the three-step method can suffer from a local minimum deficiency. For instance, in the example of FIG. 2, although in the first step the region R4 was found to best correlate with the current block, a region X1 may actually provide the lowest overall cost function. In detail, given the search region 30, suppose the actual motion vector is defined by X1. During the first step, the three-step search may determine the candidate with the smallest cost function to be R4. Following the direction of R4, the three-step search may ultimately lead to an incorrect motion vector such as that defined by T2. This is because, once a region is selected in the first step, the result of the three-step search is limited to proximate regions. The three-step method provides no contingency for this effect; that is, after the region R4 is selected in the first step, the most suitable region X1 cannot be arrived at during the second and third steps. This local minimum deficiency can be found in other state-of-the-art methods as well.
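The deficiency can be reproduced with a small Python sketch. Everything in it is hypothetical: `toy_cost` is a made-up cost surface with a wide, shallow basin near offset (-3, 1) and a narrow, deeper minimum at (6, -5), which plays the role of region X1; `coarse_to_fine` is a simplified three-step search over that surface; and `full_search` is an exhaustive baseline over offsets of up to 7 pixels.

```python
def toy_cost(u, v):
    """Synthetic cost surface with two basins (illustrative values only)."""
    wide = 10 + abs(u + 3) + abs(v - 1)       # shallow basin, minimum 10
    narrow = 20 * (abs(u - 6) + abs(v + 5))   # steep basin, minimum 0
    return min(wide, narrow)


def coarse_to_fine(cost, first_step=4):
    """Three-step style search: steps of 4, 2 and 1 around the best offset."""
    best = (0, 0)
    best_cost = cost(*best)
    step = first_step
    while step >= 1:
        centre_u, centre_v = best  # winner of the previous step
        for dv in (-step, 0, step):
            for du in (-step, 0, step):
                candidate = (centre_u + du, centre_v + dv)
                c = cost(*candidate)
                if c < best_cost:
                    best, best_cost = candidate, c
        step //= 2
    return best, best_cost


def full_search(cost, reach=7):
    """Exhaustive search over every offset within +/- reach pixels."""
    candidates = [(u, v)
                  for v in range(-reach, reach + 1)
                  for u in range(-reach, reach + 1)]
    best = min(candidates, key=lambda uv: cost(*uv))
    return best, cost(*best)


print(coarse_to_fine(toy_cost))  # ((-3, 1), 10)
print(full_search(toy_cost))     # ((6, -5), 0)
```

In this sketch the first step selects offset (-4, 0) inside the wide basin, and the two remaining steps can only refine within 3 pixels of that choice, so the narrow minimum at (6, -5) is never evaluated.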