The inventive concept relates to a motion estimation device and a video encoding device including the same.
Currently, the demand for high-quality video services is increasing due to the development of information and communications technology, including the Internet, and the development of various multimedia devices such as high-definition televisions (HDTVs), personal digital assistants (PDAs), and mobile phones. Thus, in order to process high-quality video data, technology for efficiently compressing large quantities of image data is required.
Motion estimation technology is widely used in video compression standards such as the MPEG and H.26x standards. Such motion estimation technology involves calculating a motion vector representing a displacement between a current image and a previous image. The displacement is caused, for example, by a movement of an object in a video image, camera movement, image magnification, or image reduction. Motion estimation is also widely used in video processes such as video compression for size reduction, pixel demosaicing, and frame interpolation.
In order to calculate a motion vector, an initial determination needs to be made as to whether to perform motion estimation in pixel units or block units. In general, motion estimation for video compression is performed in block units. A block matching algorithm is an algorithm in which a motion vector between a current frame and a previous frame is estimated in block units. In the block matching algorithm, a macroblock of the current frame is compared to macroblocks of the previous frame within a determined search area of the previous frame, and the location of the most similar macroblock is detected. That is, it is determined to which location in the current frame the macroblock of the previous frame has moved. The direction and magnitude of this movement correspond to the motion vector.
A large number of motion estimation algorithms exist for calculating the motion vector. One example is the full search block matching (FSBM) algorithm, which is a kind of block matching algorithm. In the FSBM algorithm, all pixels in the macroblock of the current frame are compared to all candidate macroblocks in a search area of the previous frame, which is set to have a predetermined size, and the macroblock having the minimum difference value is detected from among the plurality of candidate macroblocks included in the search area. As a result, the FSBM algorithm generates a quite accurate block-based motion vector.
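The full search described above can be illustrated with a minimal Python sketch. It is not taken from any particular encoder: the function names `sad` and `full_search`, the representation of a frame as a list of rows, the square block size `n`, and the `search_range` parameter are all assumptions made for illustration, and the sum of absolute differences is assumed as the difference value.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the n x n block of `ref` at (rx, ry). Frames are lists of rows."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
    return total


def full_search(cur, ref, bx, by, n, search_range):
    """Compare the current block against every candidate block within
    +/- search_range pixels and return ((dx, dy), minimum difference value)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall partly outside the previous frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

With a search range of p pixels, the sketch evaluates up to (2p+1)×(2p+1) candidate macroblocks per current macroblock, which is the source of the large amount of operations associated with the FSBM algorithm.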
Meanwhile, another example of the motion estimation algorithms is a coarse-to-fine search algorithm. The coarse-to-fine search algorithm has a relatively low accuracy but requires fewer operations in comparison to the FSBM algorithm. Examples of the coarse-to-fine search algorithm are a 3-step search algorithm, a 4-step search algorithm, and a 2-dimensional (2-D) logarithmic search algorithm. The coarse-to-fine search algorithm is fast but has a problem of low image quality. Thus, in spite of its large amount of operations, the FSBM algorithm is widely used due to its simple architecture and high image quality.
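As an illustration of the coarse-to-fine idea, the following is a minimal Python sketch of a 3-step search. The function name `three_step_search`, the initial step size of 4, and the list-of-rows frame layout are assumptions made for illustration, and the sum of absolute differences is assumed as the matching cost.

```python
def three_step_search(cur, ref, bx, by, n, step=4):
    """Coarse-to-fine search: test the 3 x 3 grid of candidates spaced `step`
    pixels apart around the best vector so far, then halve the step."""
    height, width = len(ref), len(ref[0])

    def cost(dx, dy):
        rx, ry = bx + dx, by + dy
        # Candidates that leave the previous frame are not evaluated.
        if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
            return None
        return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
                   for j in range(n) for i in range(n))

    best_mv, best_cost = (0, 0), cost(0, 0)
    while step >= 1:
        cx, cy = best_mv  # centre of this refinement step
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                c = cost(cx + dx, cy + dy)
                if c is not None and c < best_cost:
                    best_mv, best_cost = (cx + dx, cy + dy), c
        step //= 2
    return best_mv, best_cost
```

With an initial step of 4, the sketch evaluates at most 25 distinct candidates (the centre plus 8 new positions in each of three steps) instead of the 225 positions a full search covers for the same ±7 pixel range. Because only a sparse subset of positions is tested, the returned vector is not guaranteed to be the global minimum.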
A large number of motion estimation hardware technologies for implementing the FSBM algorithm exist, and these motion estimation hardware technologies may be classified into a plurality of categories including a systolic array architecture and a tree architecture.
The systolic array architecture, as motion estimation hardware, is appropriate for a very large scale integration (VLSI) implementation of a block matching algorithm. It has an advantage in that reference data (for example, pixel data of the previous frame) is provided to a plurality of on-chip processing elements by performing a shift operation, and thus the reuse of data is maximized. Thus, a data input bandwidth may decrease. However, because of the shift operation, the data path along which the reference data is transmitted is long, and thus a delay may be increased or a problem of internal data skew may occur.
In order to avoid the above-described long delay or data skew, a tree architecture can be adopted. The tree architecture has an advantage of optimizing execution latency and is more appropriate for implementing the 3-step search algorithm. However, since the data skew does not occur in the tree architecture, data related to the candidate blocks (found macroblocks) in the search area of the previous frame have to be accessed at the same time so as to be provided to the tree architecture. Thus, the tree architecture requires a larger data input bandwidth in comparison to the systolic array architecture.
In order to provide a tradeoff between the characteristics of the systolic array architecture and the tree architecture, a hybrid tree and linear array architecture is suggested. The hybrid tree and linear array architecture is a combination of the systolic array architecture and the tree architecture. Latency and an input data bandwidth characteristic may be controlled by selecting an appropriate size of a systolic array or a sub-tree included in the hybrid tree and linear array architecture.
FIG. 1 is a block diagram of a general systolic array architecture for implementing an FSBM motion estimation algorithm. The systolic array architecture includes a one-dimensional (1-D) systolic array architecture and a 2-D systolic array architecture. The systolic array architecture illustrated in FIG. 1 is the 2-D systolic array architecture.
As illustrated in FIG. 1, if a macroblock, which is a basic processing unit of a block matching algorithm, includes Nh (the number of horizontal pixels)×Nv (the number of vertical pixels) pixels, the 2-D systolic array architecture may include Nh×Nv processing elements PE which are arranged in horizontal and vertical directions of a 2-D architecture. Each of the Nh×Nv processing elements may include a latch (not shown). Data of each pixel of the macroblock of a current frame may be pre-loaded to a corresponding processing element so as to be stored in a latch.
Also, pixel data (reference data) of a candidate macroblock in a search area of a previous frame are provided to the processing elements. Nv pieces of the pixel data of the candidate macroblock are accessed in parallel and may be provided to the processing elements based on a shift operation in a horizontal direction. Each processing element calculates an absolute difference between pixel data of the current frame and pixel data of the previous frame.
In order to detect a location of the most similar macroblock to the macroblock of the current frame from among candidate macroblocks in the determined search area of the previous frame, a cross-correlation function (CCF) method, a mean square error (MSE) method, a mean absolute error (MAE) method, and a sum of absolute differences (SAD) method may be used. Among the above-mentioned methods, the SAD method is the most widely used in actual implementation due to its low complexity and its excellent performance.
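For reference, the SAD, MAE, and MSE criteria named above are commonly written as follows. The symbols are assumptions introduced here for illustration: C(i, j) denotes a pixel of the macroblock of the current frame, R(i, j) denotes a pixel of the search area of the previous frame, and (u, v) denotes the displacement of the candidate macroblock.

```latex
\begin{align*}
\mathrm{SAD}(u,v) &= \sum_{j=0}^{N_v-1}\sum_{i=0}^{N_h-1}
    \bigl| C(i,j) - R(i+u,\, j+v) \bigr| \\
\mathrm{MAE}(u,v) &= \frac{1}{N_h N_v}\,\mathrm{SAD}(u,v) \\
\mathrm{MSE}(u,v) &= \frac{1}{N_h N_v}\sum_{j=0}^{N_v-1}\sum_{i=0}^{N_h-1}
    \bigl( C(i,j) - R(i+u,\, j+v) \bigr)^2
\end{align*}
```

The motion vector is the displacement (u, v) that minimizes the chosen criterion. The SAD requires only subtraction, absolute value, and addition, and it differs from the MAE only by the constant factor 1/(Nh×Nv), so both criteria select the same candidate macroblock.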
As illustrated in FIG. 1, an SAD is calculated by each predetermined unit A included in the 2-D systolic array architecture. According to the 2-D systolic array architecture, reference data is prevented from being repeatedly provided to a systolic array and previously input reference data can be reused. Thus, a data input bandwidth may decrease.
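The effect of data reuse on the data input bandwidth can be estimated with a rough count of reference pixel fetches. The Python sketch below rests on a simplified model that is an assumption, not a description of FIG. 1: a search range of ±p pixels, no reuse at all in the first function, and reuse only between horizontally adjacent candidate macroblocks in the second function. The function names are hypothetical.

```python
def reference_fetches_without_reuse(nh, nv, p):
    """Every candidate macroblock is fetched in full: (2p+1)^2 candidates,
    each needing nh * nv reference pixels."""
    candidates = (2 * p + 1) ** 2
    return candidates * nh * nv


def reference_fetches_with_horizontal_reuse(nh, nv, p):
    """Reference pixels are shifted horizontally through the array, so each
    row of candidates streams nh + 2p columns of nv pixels exactly once."""
    rows_of_candidates = 2 * p + 1
    columns_streamed = nh + 2 * p
    return rows_of_candidates * columns_streamed * nv
```

Under this model, a 16×16 macroblock with p = 8 requires 73,984 reference pixel fetches without reuse and 8,704 with horizontal reuse, a reduction by a factor of 8.5.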
FIG. 2 is a block diagram of a general tree architecture for implementing an FSBM motion estimation algorithm. Each unit D illustrated in FIG. 2 compares current frame macroblock pixel data CURRENT DATA X and previous frame candidate macroblock pixel data REFERENCE DATA Y so as to calculate an absolute difference therebetween. Each unit A illustrated in FIG. 2 sums two absolute differences and outputs a summed absolute difference. Unlike in a 2-D systolic array architecture, Nh×Nv pieces of reference data, corresponding to all of the pixels of a macroblock, are accessed in parallel so as to be provided to the tree architecture. As illustrated in FIG. 2, the reference data provided to the tree architecture cannot be reused and are repeatedly accessed several times. Thus, a data input bandwidth is increased.
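The behavior of the units D and A can be modeled in software as a pairwise reduction. The following Python sketch is a functional model only, not a description of the hardware timing. The function name `tree_sad`, the flat list inputs, and the zero padding of odd-length levels are assumptions made for illustration.

```python
def tree_sad(current, reference):
    """Model of the tree: one layer of D units (absolute differences)
    followed by layers of A units (two-input adders).
    Returns (sum of absolute differences, number of adder layers)."""
    if len(current) != len(reference):
        raise ValueError("current and reference must have the same length")
    # D units: all absolute differences are formed in parallel.
    level = [abs(x - y) for x, y in zip(current, reference)]
    depth = 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad so that every A unit has two inputs
        # A units: each sums two values from the previous layer.
        level = [level[k] + level[k + 1] for k in range(0, len(level), 2)]
        depth += 1
    return (level[0] if level else 0), depth
```

For a 16×16 macroblock, the 256 absolute differences are reduced in 8 adder layers, which reflects the low latency of the tree. All 256 reference pixels must be present at the inputs at the same time, which corresponds to the larger data input bandwidth noted above.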
Currently, as new display technologies for liquid crystal display devices (LCDs), plasma display panels (PDPs), and digital projection systems provide high-quality and large-scale display services, there is an increased demand for image processing technology for processing high-resolution images. In particular, the amount of data to be processed per minute increases with higher resolution and larger display size, and thus the operation frequency increases.
However, the above-described categories of architectures based on the FSBM algorithm are not necessarily suitable for high-performance processing of a high-resolution image. In particular, the hybrid tree and linear array architecture, which is used in order to provide a tradeoff between the characteristics of the systolic array architecture and the tree architecture, requires a large amount of operations and thus is not appropriate for processing the high-resolution image.