1. Field of the Invention
The present invention relates to systems for generating motion vectors and has been developed with particular attention paid to its possible application in the framework of techniques for encoding digital video signals.
2. Description of the Related Art
Techniques for coding digital video signals aim at minimizing the memory occupied and/or the bandwidth required for storage/transmission of video sequences. The techniques reduce the temporal and spatial redundancy of the images (frames) that make up the sequence. Temporal redundancy is due to the correlation existing between successive frames of the sequence, whilst spatial redundancy is due to the correlation existing between samples (pixels or pels) of each image of the sequence.
Temporal redundancy is minimized by resorting to techniques of motion estimation based upon the hypothesis that each frame may be expressed locally as a translation of a preceding and/or subsequent frame in the sequence.
Spatial redundancy is, instead, minimized by low-pass filtering and entropy encoding.
The diagram of FIG. 1 illustrates, in the form of a block diagram, the generic structure of a video encoder, designated, as a whole, by 10 and designed to produce, starting from the frames F at input, an encoded sequence ES at output.
The first block on the top left is the motion estimator, designated by 12, which has the task of minimizing the temporal redundancy between the current frame and the previous or subsequent frames stored in a frame-buffer memory designated by 14. The motion-estimation block 12 is followed by a block 16, which performs a transform designed to transfer the values of the pixels into the frequency domain. The values thus obtained are then quantized in a quantizer block designated by 18 to obtain a low-pass filtering effect, and the result is finally encoded on the basis of a variable-length code (VLC) in an encoding block, designated by 20.
The quantization step is calculated by the rate-controller block 22 according to the degree of occupation that it is desired to achieve in an output buffer memory designated by 24 and designed to supply the encoded sequence at output.
The quantized values are then subjected to an inverse quantization, carried out in a block designated by 26, followed by an inverse transform performed in a block designated by 28, the aim being to enable storage in the buffer memory 14, not of the original frames, but of the ones on which the video decoder is to operate during the decoding step.
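The purpose of the inverse path through blocks 26 and 28 can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the encoder of FIG. 1: the transform (blocks 16 and 28), the motion estimation and the VLC stage are omitted, each frame is a flat list of sample values, the predictor is simply the previously reconstructed frame, and the names `quantize`, `dequantize`, `encode_sequence` and `decode_sequence` are hypothetical.

```python
def quantize(value, step):
    """Uniform scalar quantizer (role of block 18): nearest integer level."""
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + step // 2) // step)


def dequantize(level, step):
    """Inverse quantization (role of block 26)."""
    return level * step


def encode_sequence(frames, step):
    """Encode frames as quantized residuals against a reconstructed reference.

    Assumption: zero-motion prediction and no transform, for brevity.
    Returns the quantized levels per frame and the final reference frame.
    """
    reference = [0] * len(frames[0])  # contents of the frame buffer (14)
    levels_per_frame = []
    for frame in frames:
        residual = [c - p for c, p in zip(frame, reference)]
        levels = [quantize(r, step) for r in residual]
        levels_per_frame.append(levels)
        # Store the frame as the decoder will rebuild it, not the original.
        reference = [p + dequantize(l, step) for p, l in zip(reference, levels)]
    return levels_per_frame, reference


def decode_sequence(levels_per_frame, step):
    """Rebuild the frames from the quantized residuals alone."""
    reference = [0] * len(levels_per_frame[0])
    decoded = []
    for levels in levels_per_frame:
        reference = [p + dequantize(l, step) for p, l in zip(reference, levels)]
        decoded.append(reference)
    return decoded
```

Because the encoder predicts from the same reconstructed samples that the decoder holds, the two stay in step and the quantization error does not accumulate from frame to frame in this sketch.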
As has already been said, motion estimation is the tool that enables elimination of the temporal redundancy between successive frames in a video sequence during the process of encoding of the digital video signal. This is done by dividing each frame into regions of luminance pixels (referred to as macroblocks), then expressing each macroblock as a difference with respect to a similar region in the preceding and/or subsequent frames by means of a displacement vector (or motion vector) associated with a prediction error given by the difference between the current macroblock of the frame and the region to which the motion vector points (this region being known as the “predictor”). In order to minimize the bit rate of the encoded video sequence ES, the prediction error (also referred to as “estimation error” or “matching error”) must be as small as possible. The prediction error can be evaluated using, for example, the mean square error (MSE) or the sum of the absolute differences (SAD).
For example, if the macroblock is a square region consisting of 16×16 pixels, the SAD is defined as follows.
Suppose that SAD(x, y) is the sum of the absolute differences between a macroblock in the position (x, y) in the n-th reference frame, with pixels of intensity Vn(x+i, y+j), and a corresponding macroblock in the position (x+dx, y+dy) in the m-th frame, which has pixels of intensity Vm(x+dx+i, y+dy+j); then

$$\mathrm{SAD}(x,y)=\sum_{i=0}^{15}\sum_{j=0}^{15}\left|V_n(x+i,\,y+j)-V_m(x+dx+i,\,y+dy+j)\right|$$
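The SAD definition can be written out directly. The following Python sketch is illustrative only: the function name `sad`, the representation of a frame as a list of rows indexed `frame[row][column]`, and the `size` parameter are assumptions introduced here, and no bounds checking is performed.

```python
def sad(v_n, v_m, x, y, dx, dy, size=16):
    """Sum of absolute differences between two macroblocks.

    v_n: n-th (reference) frame, macroblock at position (x, y)
    v_m: m-th frame, macroblock at position (x + dx, y + dy)
    Frames are lists of rows, so the pixel at (x, y) is frame[y][x].
    The caller must ensure that both blocks lie inside their frames.
    """
    total = 0
    for j in range(size):        # vertical offset inside the macroblock
        for i in range(size):    # horizontal offset inside the macroblock
            total += abs(v_n[y + j][x + i] - v_m[y + dy + j][x + dx + i])
    return total
```

For instance, if every pixel of one frame has intensity 10 and every pixel of the other has intensity 13, each of the 256 pixels of a 16×16 macroblock contributes 3, so the function returns 768 whatever the displacement.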
A motion estimator of a generic type operates in the following way.
In the first place, it receives the data of the current macroblock CMB and generates the motion vectors to be examined for that macroblock.
It fetches the data for the previous frame and/or the subsequent frames to which the motion vectors point, then aligns them and performs, if necessary, an interpolation of a sub-pixel type, thus constructing the predictor for each motion vector.
The estimator then calculates, for each motion vector, the estimation error between the data of the current macroblock CMB and the corresponding predictor. After checking all the motion vectors, it chooses the one or the ones with the lowest estimation error, issuing it or them at output together with the associated predictor.
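The sequence of operations just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a particular estimator: the candidate vectors are taken from an exhaustive square search window, the SAD is used as the estimation error, sub-pixel interpolation is omitted, and the names `candidate_vectors`, `block_error` and `estimate_motion` are hypothetical.

```python
def candidate_vectors(search_range):
    """Generate every integer motion vector in a square search window."""
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            yield dx, dy


def block_error(cur, ref, x, y, dx, dy, size):
    """SAD between the current macroblock and the predictor at (x+dx, y+dy)."""
    return sum(
        abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
        for j in range(size)
        for i in range(size)
    )


def estimate_motion(cur, ref, x, y, size=16, search_range=2):
    """Return (best_vector, best_error) for the macroblock at (x, y) in cur.

    Frames are lists of rows (frame[y][x]). Vectors whose predictor would
    fall outside the reference frame are skipped. The block itself must lie
    inside the frame, so the zero vector is always a valid candidate.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_err = None, None
    for dx, dy in candidate_vectors(search_range):
        px, py = x + dx, y + dy
        if px < 0 or py < 0 or px + size > width or py + size > height:
            continue  # predictor not fully inside the reference frame
        err = block_error(cur, ref, x, y, dx, dy, size)
        if best_err is None or err < best_err:
            best_mv, best_err = (dx, dy), err
    return best_mv, best_err
```

The nested loops make the cost plain: a window of ±r pixels requires (2r+1)² error evaluations per macroblock, each over all the pixels of the block.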
The motion-estimation function is a highly repetitive and computationally intensive task. This explains why, in a digital video encoder, this function is usually performed by a dedicated co-processor, referred to as motion estimator.
A co-processor of this sort in general has a structure of the type represented in FIG. 2, comprising two main blocks.
The first block, which constitutes the motion-vector generator designated by 30, generates the motion vectors MV that are to undergo testing on the basis of the chosen motion-estimation algorithm.
The second block, designated by 32, is basically an engine for calculating the estimation error between the information regarding the current macroblock CMB and the predictor P to which the motion vector being tested points.
Once again in the diagram of FIG. 2, there is visible, at output from the engine for calculating estimation error 32, the line on which the motion vector or vectors considered as winners following the test are made available; these winning motion vectors, designated as WMV, are to function as new predictors P. The feedback information on the estimation errors ME and the motion vectors MV is sent back by the engine 32 to the generator 30 by means of the line designated by ME, MV.
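The interplay between the two blocks of FIG. 2 can be sketched in Python as follows. The sketch rests on assumptions that are not taken from the description: the generator tests the current best vector and its eight neighbours and re-centres on the winner until no improvement is found, which is only one of many possible search strategies; the engine uses the SAD as estimation error; and the names `VectorGenerator`, `ErrorEngine` and `run_estimator` are hypothetical.

```python
class ErrorEngine:
    """Role of block 32: estimation error between CMB and a predictor."""

    def __init__(self, cur, ref, x, y, size):
        self.cur, self.ref = cur, ref
        self.x, self.y, self.size = x, y, size

    def evaluate(self, mv):
        dx, dy = mv
        px, py = self.x + dx, self.y + dy
        if (px < 0 or py < 0 or px + self.size > len(self.ref[0])
                or py + self.size > len(self.ref)):
            return float("inf")  # predictor outside the reference frame
        return sum(
            abs(self.cur[self.y + j][self.x + i] - self.ref[py + j][px + i])
            for j in range(self.size)
            for i in range(self.size)
        )


class VectorGenerator:
    """Role of block 30: proposes motion vectors MV, guided by feedback."""

    def __init__(self, start=(0, 0)):
        self.best_mv, self.best_err = start, None

    def candidates(self):
        # Assumed strategy: the current best vector plus its 8 neighbours.
        cx, cy = self.best_mv
        return [(cx + dx, cy + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

    def feedback(self, results):
        """Receive (ME, MV) pairs; return True if a better vector was found."""
        err, mv = min(results)
        if self.best_err is None or err < self.best_err:
            self.best_err, self.best_mv = err, mv
            return True
        return False


def run_estimator(cur, ref, x, y, size=16, max_rounds=8):
    """Iterate generator and engine; return the winning vector and its error."""
    engine = ErrorEngine(cur, ref, x, y, size)
    generator = VectorGenerator()
    for _ in range(max_rounds):
        results = [(engine.evaluate(mv), mv) for mv in generator.candidates()]
        if not generator.feedback(results):
            break  # no candidate improved on the current winner
    return generator.best_mv, generator.best_err
```

A search of this kind tests far fewer vectors than an exhaustive scan but can stop at a local minimum of the estimation error, which is one reason why the choice of algorithm in the generator matters.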
In current hardware implementations, the second block 32 is usually built resorting to a solution of the hard-wired type.
Instead, as regards the motion-vector generator 30, it is possible to choose between a hardware implementation and a software implementation.
In the first case, the motion-vector generator 30 also has a hard-wired configuration which ensures high efficiency, a reduced area of occupation of silicon, and the possibility of functioning with a low power absorption. The main drawback is represented by the fact that this solution does not provide any margin of flexibility.
In the case of an implementation purely at a software level, the motion-estimation algorithm is executed on a dedicated CPU core which can be formed, for example, by a digital processor of the DSP type. This choice ensures the maximum level of flexibility, given that the CPU is completely programmable. The drawback of this solution is that it may prove rather slow and affected by a high power absorption if compared to a hardware solution.