1. Field of Invention
The present invention relates to motion estimation, and particularly to a motion estimation circuit (ME circuit) and a motion estimation processing element (ME processing element) that combine the advantage of a systolic array architecture, namely highly efficient data reuse, with the advantage of an adder-tree architecture, namely the capability of processing multi-point data simultaneously in one clock cycle, so that highly efficient motion estimation operations are achieved.
2. Description of the Related Art
Conventional motion estimation circuit (ME circuit) architectures fall mainly into two categories: the adder-tree architecture and the systolic array architecture. The adder-tree architecture is mostly used to implement a three-step search algorithm, a four-step search algorithm, a diamond search algorithm or other non-full-search algorithms. Its hardware configuration uses a plurality of processing elements (PEs) to process in parallel the data required by an individual candidate motion vector (MV). However, the adder-tree architecture cannot process a plurality of candidate MVs simultaneously, and thus its efficiency of data reuse is very low.
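By way of illustration only, the parallel processing performed by an adder-tree for a single candidate MV may be modeled in software as follows. The sketch assumes that the matching criterion is a sum of absolute differences (SAD), which is not stated above, and the names adder_tree_sum and sad_for_candidate are hypothetical. It is a simplified one-dimensional model, not a description of any particular circuit.

```python
def adder_tree_sum(values):
    """Sum a list by pairwise reduction, one loop pass per adder-tree level."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad an odd-sized level with a zero input
        # Each adder at this level combines two neighbouring inputs.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


def sad_for_candidate(current_row, reference_row, offset):
    """SAD (assumed criterion) for ONE candidate offset.

    All absolute differences are formed side by side, as a row of PEs
    would do, and then reduced by the adder tree.
    """
    diffs = [abs(c - reference_row[offset + i])
             for i, c in enumerate(current_row)]
    return adder_tree_sum(diffs)


if __name__ == "__main__":
    current = [1, 2, 3, 4]
    reference = [0, 1, 2, 3, 4, 5]
    # Each candidate requires its own pass over the reference data.
    for offset in range(3):
        print(offset, sad_for_candidate(current, reference, offset))
```

In this model every candidate offset reads its reference pixels again, which mirrors the low efficiency of data reuse noted above.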
The systolic array architecture usually implements a full search algorithm or a hierarchical search algorithm. Its main feature is that it can process a plurality of candidate MVs simultaneously and uses its pipeline characteristic to improve the efficiency of data reuse and to reduce the bandwidth required of a data bus. Although a processing element in a conventional systolic array is able to compare two pixels in one clock cycle, it cannot perform a mapping processing on 16-point data or 32-point data simultaneously, as the adder-tree architecture can.
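By way of illustration only, the data reuse of a systolic arrangement may be modeled in software as follows. The sketch assumes a one-dimensional search, a sum of absolute differences (SAD) as the matching criterion, and one accumulator per candidate MV; the name systolic_sads and the fetch counter are hypothetical and serve only to make the reuse visible. It is a behavioral model, not a cycle-accurate description of any circuit.

```python
def systolic_sads(current_row, reference_row, num_candidates):
    """Accumulate the SAD (assumed criterion) of several candidates at once.

    In each modeled cycle one reference pixel is fetched and shared by
    all PEs; PE k pairs it with the current pixel delayed by k cycles,
    so each PE compares one pixel pair per cycle.
    """
    n = len(current_row)
    accumulators = [0] * num_candidates  # one accumulator per candidate
    fetches = 0
    for t in range(n + num_candidates - 1):
        ref_pixel = reference_row[t]  # fetched once, reused by every PE
        fetches += 1
        for k in range(num_candidates):
            i = t - k  # index of the current pixel seen by PE k
            if 0 <= i < n:
                accumulators[k] += abs(current_row[i] - ref_pixel)
    return accumulators, fetches


if __name__ == "__main__":
    sads, fetches = systolic_sads([1, 2, 3, 4], [0, 1, 2, 3, 4, 5], 3)
    print(sads, fetches)
```

In this model three candidates over four pixels need six reference fetches rather than twelve, which reflects the reduced bus bandwidth; on the other hand, each PE still handles only one pixel pair per cycle.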