The invention relates generally to a system for encoding image sequences and, more particularly, to an apparatus and a concomitant method that employs “multi-scale block tiling” to reduce the computational complexity in determining motion vectors for block-based motion estimation and to enhance the accuracy of motion estimates.
An image sequence, such as a video image sequence, typically includes a sequence of image frames or pictures. The reproduction of video containing moving objects typically requires a frame rate of thirty image frames per second, with each frame possibly containing in excess of a megabyte of information. Consequently, transmitting or storing such image sequences requires a large amount of either transmission bandwidth or storage capacity. To reduce the necessary transmission bandwidth or storage capacity, the frame sequence is compressed such that redundant information within the sequence is not stored or transmitted. Television, video conferencing and CD-ROM archiving are examples of applications that can benefit from efficient video sequence encoding.
Generally, information concerning the motion of objects in a scene from one frame to the next plays an important role in encoding an image sequence. Because of the high redundancy that exists between consecutive frames within most image sequences, substantial data compression can be achieved using a technique known as motion estimation/compensation. In brief, the encoder encodes only the differences between areas of the current frame and correspondingly shifted areas of frames that have already been coded. Motion estimation is the process of determining the direction and magnitude of motion (motion vectors) for an area (e.g., a block or macroblock) in the current frame relative to one or more reference frames. Motion compensation, in turn, is the process of using the motion vectors to generate a prediction (predicted image) of the current frame. The difference between the current frame and the predicted frame is a residual signal (error signal), which contains substantially less information than the current frame itself. Thus, a significant saving in coding bits is realized by encoding and transmitting only the residual signal and the corresponding motion vectors.
To illustrate, in a sequence containing motion, a current frame can be reconstructed using an immediately preceding frame and the residual signal representing the difference between the current and the immediately preceding frame. The transmitter or encoder transmits the preceding frame, the residual signal and the corresponding motion vectors to a receiver. At the receiver, the current frame is reconstructed by combining the preceding frame with the residual signal and the motion information. Consequently, only one (1) frame and the difference information with its associated motion vectors are transmitted and received rather than two (2) entire frames.
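The reconstruction described above can be illustrated with a minimal Python sketch. The function names, the tiny 4x4 frames and the (dy, dx) vector convention are assumptions made for illustration only; they are not taken from the specification.

```python
# Illustrative sketch only: frames are lists of rows of integer pixel values,
# and a motion vector is an assumed (dy, dx) offset into the reference frame.

def predict_block(ref, top, left, mv, size):
    """Motion compensation: copy the block of `ref` displaced by `mv`."""
    dy, dx = mv
    return [[ref[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]

def residual_block(cur, top, left, pred):
    """Encoder side: difference between the current block and its prediction."""
    size = len(pred)
    return [[cur[top + r][left + c] - pred[r][c] for c in range(size)]
            for r in range(size)]

def reconstruct_block(pred, res):
    """Receiver side: prediction plus residual gives back the current block."""
    return [[p + e for p, e in zip(prow, erow)]
            for prow, erow in zip(pred, res)]

# Toy data: the current frame is the preceding frame shifted right by one pixel.
ref = [[4 * r + c for c in range(4)] for r in range(4)]
cur = [[ref[r][c - 1] if c >= 1 else 0 for c in range(4)] for r in range(4)]

mv = (0, -1)                           # content came from one pixel to the left
pred = predict_block(ref, 1, 1, mv, 2)
res = residual_block(cur, 1, 1, pred)  # all zeros for a perfect match
rebuilt = reconstruct_block(pred, res)
```

In this toy case the residual is entirely zero, so only the motion vector carries information for the block. With real video the residual is generally small rather than zero.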
However, encoder designers must address the dichotomy of attempting to increase the precision of the motion estimation process to minimize the residual signal (i.e., reducing coding bits) or accepting a lower level of precision in the motion estimation process to minimize the computational overhead. Namely, determining the motion vectors from the frame sequence requires intensive searching between frames to determine the motion information. A more intensive search will generate a more precise set of motion vectors at the expense of more computational cycles.
For example, many systems determine motion information using a so-called block-based approach. In a simple block-based approach, the current frame is divided into a number of blocks of pixels (referred to hereinafter as the “current blocks”). For each of these current blocks, a search is performed within a selected search area in the preceding frame for a block of pixels that “best” matches the current block. This search is typically accomplished by repetitively comparing a selected current block to similarly sized blocks of pixels in the selected search area of the preceding frame. Once a block match is found, the location of the matching block in the search area in the previous frame relative to the location of the current block within the current frame defines a motion vector. This approach, i.e., comparing each current block to an entire selected search area, is known as a full search approach or the exhaustive search approach. The determination of motion vectors by the exhaustive search approach is computationally intensive, especially where the search area is particularly large. As such, these systems tend to be relatively slow in processing the frames and expensive to fabricate.
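A minimal Python sketch of such an exhaustive search follows. The matching criterion is not specified in the text above; the sum of absolute differences (SAD) is used here as an assumed, commonly used measure. The function names and the `search_range` parameter are likewise hypothetical.

```python
# Illustrative sketch only: exhaustive (full) search block matching.
# SAD is an assumed matching criterion; the text only says "best" match.

def sad(cur, ref, cur_top, cur_left, ref_top, ref_left, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(cur[cur_top + r][cur_left + c] - ref[ref_top + r][ref_left + c])
               for r in range(size) for c in range(size))

def full_search(cur, ref, top, left, size, search_range):
    """Try every offset within +/- search_range; return (best_mv, best_cost)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ref_top, ref_left = top + dy, left + dx
            # Skip candidate blocks that fall outside the reference frame.
            if (ref_top < 0 or ref_left < 0 or
                    ref_top + size > height or ref_left + size > width):
                continue
            cost = sad(cur, ref, top, left, ref_top, ref_left, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost

# Toy data: the current frame shows the reference content moved by (1, 2).
ref = [[8 * r + c for c in range(8)] for r in range(8)]
cur = [[ref[r + 1][c + 2] if r + 1 < 8 and c + 2 < 8 else 0 for c in range(8)]
       for r in range(8)]

best_mv, best_cost = full_search(cur, ref, top=2, left=2, size=2, search_range=2)
```

Each current block requires up to (2 * search_range + 1)^2 candidate comparisons of size^2 pixels each, which is why the cost grows quickly as the search area widens.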
Therefore, there is a need in the art for an apparatus and a concomitant method for reducing the computational complexity in determining motion vectors for block-based motion estimation.
The present invention is an apparatus and method that employs “multi-scale block tiling” (N-scale tiling) to reduce the computational complexity in determining motion vectors for block-based motion estimation and to enhance the accuracy of motion estimation methods. More specifically, the present invention decomposes each of the image frames within an image sequence into an M-ary pyramid. Next, N-scale tiling is employed with the M-ary pyramid to effect hierarchical motion estimation. N-scale tiling is the process of performing motion estimation for a current block of the frame using “N” different “tiling block” sizes. For example, if N is set to three, then three (3) motion vectors are generated for each block within each frame, i.e., the block is “tiled” with three different block sizes or scales. Thus, hierarchical motion estimation with N-scale tiling allows an encoder to discriminate between the motion of larger structures versus the motion of smaller features within the frame under consideration.
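The idea of generating one motion vector per tiling block size can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the claimed method: it operates on a single resolution level and omits the M-ary pyramid entirely, it assumes the tiling blocks share the same top-left corner, and it uses a sum-of-absolute-differences search. The names `best_vector` and `n_scale_vectors` are hypothetical.

```python
# Illustrative sketch only: one motion vector per tiling block size (N = 3).
# Assumptions: single resolution level (no M-ary pyramid), tiling blocks
# anchored at the same top-left corner, SAD as the matching measure.

def best_vector(cur, ref, top, left, size, search_range):
    """Exhaustively search +/- search_range for the lowest-SAD offset."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rt, rl = top + dy, left + dx
            if rt < 0 or rl < 0 or rt + size > height or rl + size > width:
                continue  # candidate block lies outside the reference frame
            cost = sum(abs(cur[top + r][left + c] - ref[rt + r][rl + c])
                       for r in range(size) for c in range(size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv

def n_scale_vectors(cur, ref, top, left, sizes, search_range):
    """Return {tiling block size: motion vector}, one entry per scale."""
    return {size: best_vector(cur, ref, top, left, size, search_range)
            for size in sizes}

# Toy data: the whole frame moves by (1, 2), so every scale should agree.
ref = [[8 * r + c for c in range(8)] for r in range(8)]
cur = [[ref[r + 1][c + 2] if r + 1 < 8 and c + 2 < 8 else 0 for c in range(8)]
       for r in range(8)]

vectors = n_scale_vectors(cur, ref, top=2, left=2, sizes=(2, 3, 4), search_range=2)
```

In this toy frame all three scales return the same vector because the motion is global. When the vectors for the small and large tiling blocks disagree, that disagreement is the kind of signal that could distinguish a small feature moving differently from the larger structure around it.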