Digital video compression may be defined as the process of transforming analog video into a digital representation that occupies a fraction of the storage space. This transformation (or encoding) begins with the digital sampling of the analog video. Specific processes are then applied to the raw digital samples to convert them into a digital bitstream of fixed- and variable-length codewords. A digital video decoder later processes these codewords and transforms them back into analog video for presentation on a traditional television monitor.
Digital video encoding encompasses a wide range of processes that transform the analog video into a final compressed bitstream. These include stages such as (i) image capture and pre-processing, (ii) motion estimation (ME), (iii) the discrete cosine transform (DCT), and (iv) the generation of fixed- and variable-length codewords. Motion estimation is generally the most processor-intensive stage of the encoding process. It involves comparing the blocks of a target picture with a reference picture: each block in the target image is searched for in the reference image to find the closest match. There are various methods for performing such a search, and various criteria for determining the closest match. The goal of motion estimation is to produce one or more motion vectors pointing to a combination of one or more reference blocks that yields a minimal prediction error. When the reference blocks are a close match, the target block can be represented by the motion vectors together with a small prediction error.
Referring to FIGS. 1-2, diagrams illustrating a block-based motion estimation search are shown. A target image is divided into target blocks, and each target block is searched for within a larger area of the reference image. The basic building block of a motion estimator is the sum of absolute differences (SAD) operation. A pixel-by-pixel SAD is calculated for each position of the target block within the reference search area, and the search coordinate that produces the lowest SAD score is chosen as the final match in that reference area. The lowest SAD score for a particular target block may also be referred to as a “motion vector score” or a “score from the motion vector”. The reference area may be as small as one target block or as large as the entire reference image. The search offset starts at (0, 0), the co-located target position, and may be moved from there to any other position within the reference image coordinates.
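The exhaustive SAD search described above can be sketched in a few lines of Python. This is an illustrative sketch, not the method of the figures: images are assumed to be lists of rows of integer samples, the names `sad`, `full_search` and `pattern` are hypothetical, candidate positions that would fall outside the reference image are simply skipped, and ties are resolved by keeping the first minimum in raster-scan order.

```python
def sad(target, ref, tx, ty, rx, ry, bw, bh):
    """Sum of absolute differences between the bw x bh target block with
    top-left corner (tx, ty) and the reference block at (rx, ry)."""
    total = 0
    for j in range(bh):
        for i in range(bw):
            total += abs(target[ty + j][tx + i] - ref[ry + j][rx + i])
    return total


def full_search(target, ref, tx, ty, bw, bh, search_range):
    """Try every offset within +/- search_range of the co-located (0, 0)
    position. Returns ((dx, dy), lowest_sad)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = tx + dx, ty + dy
            # Skip candidates that would read outside the reference image.
            if rx < 0 or ry < 0 or rx + bw > width or ry + bh > height:
                continue
            score = sad(target, ref, tx, ty, rx, ry, bw, bh)
            # Strict "<" keeps the first minimum found in raster-scan order.
            if best is None or score < best[1]:
                best = ((dx, dy), score)
    return best


def pattern(x, y):
    """Hypothetical synthetic test image: a quadratic ramp with no repeats."""
    return x * x + 3 * y * y + x * y


if __name__ == "__main__":
    ref = [[pattern(x, y) for x in range(32)] for y in range(32)]
    # The target is the reference displaced so the true vector is (3, -2).
    target = [[pattern(x + 3, y - 2) for x in range(32)] for y in range(32)]
    print(full_search(target, ref, 12, 12, 8, 8, 4))  # ((3, -2), 0)
```

The 8x8 block at (12, 12) is found at offset (3, -2) with a SAD of zero. The cost is one SAD per candidate offset, i.e. (2R + 1)^2 SADs per block for a search range of R, which is what makes exhaustive search expensive.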
A hierarchical motion estimation search program generally involves (i) sub-sampling an image in the horizontal and/or vertical directions and (ii) using the results of the search on the sub-sampled image as adjusted starting reference points (X, Y) for a new ME search at a correspondingly higher resolution. Combining a hierarchy of searches, with further refinement at each stage, yields an accurate local match between the target block and the combined reference search areas.
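A two-level version of this hierarchy might look like the following sketch. The assumptions are ours, not the source's: sub-sampling is by a factor of two in both directions using a 2x2 integer mean, the coarse level searches ±`coarse_range` around (0, 0), and the coarse vector is doubled and refined by ±`refine_range` at full resolution. All function names are hypothetical.

```python
def downsample2(img):
    """Halve width and height by taking the integer mean of each 2x2 neighborhood."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) // 4
             for x in range(w)] for y in range(h)]


def local_search(target, ref, tx, ty, bw, bh, cx, cy, r):
    """SAD search over offsets within +/- r of the start point (cx, cy).
    Returns ((dx, dy), lowest_sad), or None if no candidate fits."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(cy - r, cy + r + 1):
        for dx in range(cx - r, cx + r + 1):
            rx, ry = tx + dx, ty + dy
            if rx < 0 or ry < 0 or rx + bw > width or ry + bh > height:
                continue
            score = sum(abs(target[ty + j][tx + i] - ref[ry + j][rx + i])
                        for j in range(bh) for i in range(bw))
            if best is None or score < best[1]:
                best = ((dx, dy), score)
    return best


def hierarchical_search(target, ref, tx, ty, bw, bh, coarse_range, refine_range=1):
    """Coarse search at half resolution, then a small refinement at full
    resolution around the scaled-up coarse vector."""
    (cdx, cdy), _ = local_search(downsample2(target), downsample2(ref),
                                 tx // 2, ty // 2, bw // 2, bh // 2,
                                 0, 0, coarse_range)
    # One sample at half resolution corresponds to two samples at full resolution.
    return local_search(target, ref, tx, ty, bw, bh,
                        2 * cdx, 2 * cdy, refine_range)


def pattern(x, y):
    """Hypothetical synthetic test image (quadratic ramp)."""
    return x * x + 3 * y * y + x * y


if __name__ == "__main__":
    ref = [[pattern(x, y) for x in range(32)] for y in range(32)]
    target = [[pattern(x + 4, y - 2) for x in range(32)] for y in range(32)]
    print(hierarchical_search(target, ref, 12, 12, 8, 8, coarse_range=2))
```

Here the coarse level evaluates 25 offsets on 4x4 blocks and the refinement evaluates 9 offsets on 8x8 blocks, yet the final vector (4, -2) lies outside a ±2 full-resolution window. This reach-for-cost trade is the point of the hierarchy.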
There are typically several challenges in designing a hierarchical motion estimation search program. The total engine cycles must be estimated and compared with the CPU allocation available for encoding. The SAD operation for a block can take anywhere from 1 to n clock cycles, depending on the built-in hardware acceleration of the motion estimation engine and the desired search area in the reference frame. In addition to the raw computational cycles, the memory bandwidth needed to load the target and reference areas must also be carefully evaluated.
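A back-of-the-envelope budget for an exhaustive (non-hierarchical) search makes these costs concrete. The sketch below is a rough model under stated assumptions: 8-bit samples, non-overlapping square target blocks, a fixed `cycles_per_sad` (the "1 to n" figure above), and each block loading its own reference window with no data reuse between neighboring blocks. The frame size and frame rate in the example are hypothetical inputs.

```python
def full_search_budget(width, height, block, search_range, cycles_per_sad, fps):
    """Rough per-frame and per-second cost of exhaustive block matching.

    Assumes 8-bit samples and that every block loads its own
    (block + 2 * search_range)^2 reference window with no reuse."""
    blocks = (width // block) * (height // block)
    positions = (2 * search_range + 1) ** 2  # candidate offsets per block
    sads_per_frame = blocks * positions
    window = block + 2 * search_range
    bytes_per_frame = blocks * (block * block + window * window)
    return {
        "sads_per_frame": sads_per_frame,
        "cycles_per_second": sads_per_frame * cycles_per_sad * fps,
        "bytes_per_second": bytes_per_frame * fps,
    }


if __name__ == "__main__":
    # 720x480 frames, 16x16 blocks, +/-16 search, 1 cycle per SAD, 30 frames/s.
    print(full_search_budget(720, 480, 16, 16, cycles_per_sad=1, fps=30))
```

With these inputs there are 1,350 blocks and 1,089 offsets per block, giving 1,470,150 SADs per frame, 44,104,500 cycles per second even at the best case of one cycle per SAD, and about 104 MB/s of target and reference loads. Every term scales linearly with `cycles_per_sad` or `fps`, and quadratically with the search range.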
Sub-sampling of images can only be carried to a reasonable degree. For example, sub-sampling the target and reference frames to very small sizes may distort the image so severely that the resulting search vectors no longer correspond to the actual motion.
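A tiny experiment shows why sub-sampling cannot be repeated indefinitely. Assuming 2x2 averaging as the sub-sampling filter (one common choice; other filters behave differently), a one-pixel-wide white line on a black background loses half of its contrast at every level, so fine detail that a block match could lock onto fades quickly. The function names and the 16x16 test image are hypothetical.

```python
def downsample2(img):
    """Halve width and height by averaging each 2x2 neighborhood (float mean)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]


def line_peaks(levels):
    """Peak brightness of a one-pixel vertical line after each sub-sampling level."""
    # 16x16 black image with a white vertical line in column 4.
    img = [[255.0 if x == 4 else 0.0 for x in range(16)] for y in range(16)]
    peaks = []
    for _ in range(levels):
        img = downsample2(img)
        peaks.append(max(max(row) for row in img))
    return peaks


if __name__ == "__main__":
    print(line_peaks(4))  # [127.5, 63.75, 31.875, 15.9375]
```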
The combined search ranges, when scaled to the original image size, must cover an adequately large percentage of the reference image. This must be achieved while respecting the typically significant computational demands of the motion estimation process in a scenario with limited CPU performance.
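The scaling of search ranges can be checked with simple arithmetic. In this sketch, which is an assumption-laden model rather than a prescribed design, each hierarchy level is described by a `(search_range, scale)` pair, where `scale` is that level's sub-sampling factor. The maximum full-resolution reach is the sum of `range * scale` over all levels, and coverage is the area of the resulting square window, clipped to the image, divided by the image area. The three-level configuration in the example is hypothetical.

```python
def effective_range(levels):
    """Maximum full-resolution displacement reachable by a search hierarchy.

    levels: list of (search_range, scale) pairs, scale being the sub-sampling
    factor of that level relative to the full-resolution image."""
    return sum(r * s for r, s in levels)


def coverage(width, height, block, levels):
    """Fraction of the reference image spanned by one block's combined window."""
    side = block + 2 * effective_range(levels)
    # Clip the window to the image dimensions.
    area = min(side, width) * min(side, height)
    return area / (width * height)


if __name__ == "__main__":
    # +/-8 at quarter resolution, +/-2 at half resolution, +/-1 at full resolution.
    levels = [(8, 4), (2, 2), (1, 1)]
    print(effective_range(levels))            # 37
    print(coverage(720, 480, 16, levels))     # 0.0234375
```

In this configuration a 16x16 block can reach ±37 pixels, a 90x90 window covering about 2.3% of a 720x480 frame, while only 289 + 25 + 9 = 323 offsets are evaluated, most of them on sub-sampled blocks.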
With advanced coding standards, multiple block sizes can be supported for motion estimation and compensation. Adaptively choosing among multiple block sizes generally helps reduce the prediction error, which improves the coding efficiency of the stream. However, additional searches may then be needed between the target and reference frames.
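One common way to limit the cost of those extra searches, shown here as an illustration and not as the approach of this document, is to compute SADs for the smallest partitions at each candidate position and add them up to score the larger partitions. For example, H.264/AVC macroblocks can be split into 16x16, 16x8, 8x16 and 8x8 partitions, and all four shapes can be scored from four 8x8 quadrant SADs. The function names and the 4x4 test data are hypothetical.

```python
def quadrant_sads(target, ref, tx, ty, rx, ry, size):
    """SADs of the four (size/2 x size/2) quadrants at one candidate position,
    in the order top-left, top-right, bottom-left, bottom-right."""
    half = size // 2
    quads = []
    for qy in (0, half):
        for qx in (0, half):
            s = 0
            for j in range(half):
                for i in range(half):
                    s += abs(target[ty + qy + j][tx + qx + i]
                             - ref[ry + qy + j][rx + qx + i])
            quads.append(s)
    return quads


def partition_sads(quads):
    """Combine quadrant SADs into scores for the larger partition shapes."""
    tl, tr, bl, br = quads
    return {
        "whole": tl + tr + bl + br,          # e.g. 16x16
        "top_bottom": [tl + tr, bl + br],    # e.g. two 16x8 halves
        "left_right": [tl + bl, tr + br],    # e.g. two 8x16 halves
        "quadrants": quads,                  # e.g. four 8x8 blocks
    }


if __name__ == "__main__":
    target = [[0] * 4 for _ in range(4)]
    ref = [[4 * y + x + 1 for x in range(4)] for y in range(4)]  # values 1..16
    print(partition_sads(quadrant_sads(target, ref, 0, 0, 0, 0, 4)))
```

Because the larger partitions only add precomputed numbers, scoring all four shapes at a candidate position costs little more than scoring the largest block alone; a separate best offset can then be kept per partition.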
It would be desirable to implement a hierarchical motion estimation search program that complies with the MPEG-4 standard, or with variations of the MPEG-1 and/or MPEG-2 standards, including those with image dimensions of varying horizontal and vertical sizes.