An image sequence, such as a video image sequence, typically includes a sequence of image frames or pictures. The reproduction of video containing moving objects typically requires a frame rate of thirty image frames per second, with each frame possibly containing in excess of a megabyte of information, i.e., on the order of thirty megabytes per second for an uncompressed sequence. Consequently, transmitting or storing such image sequences requires a large amount of either transmission bandwidth or storage capacity. To reduce the necessary transmission bandwidth or storage capacity, the frame sequence is compressed such that redundant information within the sequence is not stored or transmitted. Television, video conferencing and CD-ROM archiving are examples of applications that can benefit from efficient video sequence encoding.
Generally, information concerning the motion of objects in a scene from one frame to the next plays an important role in encoding an image sequence. Because of the high redundancy that exists between consecutive frames within most image sequences, substantial data compression can be achieved using a technique known as motion estimation/compensation. In brief, the encoder encodes only the differences between areas of the current frame and correspondingly shifted areas of previously coded frames. Namely, motion estimation is a process of determining the direction and magnitude of motion (motion vectors) for an area (e.g., a block or macroblock) in the current frame relative to one or more reference frames, whereas motion compensation is a process of using the motion vectors to generate a prediction (predicted image) of the current frame. The difference between the current frame and the predicted frame results in a residual signal (error signal), which contains substantially less information than the current frame itself. Thus, a significant saving in coding bits is realized by encoding and transmitting only the residual signal and the corresponding motion vectors.
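The relationship between motion vectors, the predicted frame and the residual signal can be illustrated with a minimal sketch. The function names, the dictionary of per-block vectors and the edge-clamping behavior below are assumptions made for illustration only; frames are represented as plain lists of rows of integer pixel values.

```python
def motion_compensate(reference, vectors, block):
    """Build a predicted frame by copying displaced blocks from the reference.

    `vectors` maps the top-left corner (row, col) of each block in the
    current frame to a motion vector (dy, dx) into the reference frame.
    This layout is a hypothetical convention chosen for the example.
    """
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for (by, bx), (dy, dx) in vectors.items():
        for y in range(by, min(by + block, height)):
            for x in range(bx, min(bx + block, width)):
                # Assumption: vectors that point outside the frame reuse
                # the nearest edge pixel of the reference.
                ry = min(max(y + dy, 0), height - 1)
                rx = min(max(x + dx, 0), width - 1)
                predicted[y][x] = reference[ry][rx]
    return predicted


def residual(current, predicted):
    """Pixel-wise difference between the current and predicted frames."""
    return [[c - p for c, p in zip(crow, prow)]
            for crow, prow in zip(current, predicted)]
```

When the motion vectors describe the true displacement, the predicted frame closely matches the current frame and the residual is mostly zeros, which is why the residual typically costs fewer bits to code than the frame itself.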
However, encoder designers must address the trade-off between increasing the precision of the motion estimation process to minimize the residual signal (i.e., to reduce coding bits) and accepting a lower level of precision in the motion estimation process to minimize the computational overhead. Namely, determining the motion vectors from the frame sequence requires intensive searching between frames. A more intensive search will generate a more precise set of motion vectors at the expense of more computational cycles.
To illustrate, some systems determine motion information using a so-called block-based approach. In a simple block-based approach, the current frame is divided into a number of blocks of pixels (referred to hereinafter as the "current blocks"). For each of these current blocks, a search is performed within a selected search area in the preceding frame for a block of pixels that "best" matches the current block. This search is typically accomplished by repetitively comparing a selected current block to similarly sized blocks of pixels in the selected search area of the preceding frame. However, the determination of motion vectors by this exhaustive search approach is computationally intensive, especially where the search area is particularly large.
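The exhaustive search described above can be sketched as follows. The text does not specify how the "best" match is measured; the sum of absolute differences (SAD) used here is one common criterion and is an assumption, as are the function names and the square search window of plus or minus `search_range` pixels.

```python
def sad(current, reference, by, bx, ry, rx, block):
    """Sum of absolute differences between two equally sized blocks."""
    total = 0
    for y in range(block):
        for x in range(block):
            total += abs(current[by + y][bx + x] - reference[ry + y][rx + x])
    return total


def full_search(current, reference, by, bx, block, search_range):
    """Compare the current block at (by, bx) against every candidate block
    in the search area of the reference (preceding) frame.

    Returns ((dy, dx), cost) for the lowest-cost candidate. Ties keep the
    first candidate found in scan order.
    """
    height, width = len(reference), len(reference[0])
    best_vector, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = by + dy, bx + dx
            # Skip candidates that fall partly outside the reference frame.
            if ry < 0 or rx < 0 or ry + block > height or rx + block > width:
                continue
            cost = sad(current, reference, by, bx, ry, rx, block)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost
```

For a block of B x B pixels and a search range of R, each current block requires up to (2R + 1)^2 candidate comparisons of B^2 pixels each, which shows why the cost grows quickly as the search area is enlarged.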
Alternatively, other motion estimation methods incorporate the concept of hierarchical motion estimation (HME), where an image is decomposed into a multiresolution framework, i.e., a pyramid. A hierarchical motion vector search is then performed, where the search proceeds from the lowest resolution to the highest resolution of the pyramid. Although HME has been demonstrated to be a fast and effective motion estimation method, the generation of the pyramid still consumes a significant number of computational cycles.
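A coarse-to-fine search over a pyramid can be sketched as follows. The text does not say how the pyramid is generated or how matches are scored; this sketch assumes a mean pyramid built by averaging 2 x 2 neighborhoods, a SAD matching criterion, and a small refinement window of plus or minus `radius` pixels at each level. All function and parameter names are hypothetical.

```python
def downsample(frame):
    """Halve the resolution by averaging each 2x2 neighborhood (integer mean)."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1] +
              frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) // 4
             for x in range(w)] for y in range(h)]


def build_pyramid(frame, levels):
    """Return [full resolution, ..., coarsest] with `levels` entries."""
    pyramid = [frame]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def best_match(cur, ref, by, bx, block, center, radius):
    """Lowest-SAD vector within `radius` of `center`; in-bounds candidates only."""
    h, w = len(ref), len(ref[0])
    best, best_cost = center, None
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            ry, rx = by + dy, bx + dx
            if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                continue
            cost = sum(abs(cur[by + y][bx + x] - ref[ry + y][rx + x])
                       for y in range(block) for x in range(block))
            if best_cost is None or cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best


def hierarchical_search(current, reference, by, bx, block, levels, radius):
    """Search from the lowest resolution to the highest, refining as we go."""
    cur_pyr = build_pyramid(current, levels)
    ref_pyr = build_pyramid(reference, levels)
    vector = (0, 0)
    for level in range(levels - 1, -1, -1):
        scale = 2 ** level
        vector = best_match(cur_pyr[level], ref_pyr[level],
                            by // scale, bx // scale,
                            max(block // scale, 1), vector, radius)
        if level > 0:
            # Project the vector onto the next finer level of the pyramid.
            vector = (vector[0] * 2, vector[1] * 2)
    return vector
```

Each level examines only (2 * radius + 1)^2 candidates, so the search itself is inexpensive; the cost that remains is visible in `build_pyramid`, which must visit every pixel of both frames to produce the lower-resolution levels.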
Furthermore, the above motion estimation methods are not easily scalable. Namely, the architecture of these motion estimation methods does not provide a user or an encoder with the flexibility to scale or switch to a different architecture to account for available computational resources and/or user preferences.
Therefore, there is a need in the art for an apparatus and a concomitant method for performing hierarchical block-based motion estimation with a high degree of scalability.