The rapid development of digital video technology presents a growing need to reduce the high cost of video compression codecs (coders/decoders) and to resolve the interoperability of equipment from different manufacturers. To achieve these goals, the Moving Picture Experts Group (MPEG) created the ISO/IEC International Standards 11172 (1991) (generally referred to as MPEG-1 format) and 13818 (1995) (generally referred to as MPEG-2 format), which are incorporated herein in their entirety by reference. One goal of these standards is to establish a standard coding/decoding strategy with sufficient flexibility to accommodate a plurality of different applications and services such as desktop video publishing, video telephone, video conferencing, digital storage media and television broadcast.
Although the MPEG standards specify a general coding methodology and syntax for generating an MPEG-compliant bitstream, many variations are permitted in the values assigned to many of the parameters, thereby supporting a broad range of applications and interoperability. In effect, MPEG does not define a specific algorithm needed to produce a valid bitstream. Furthermore, MPEG encoder designers are accorded great flexibility in developing and implementing their own MPEG-specific algorithms in areas such as image pre-processing, motion estimation, coding mode decisions, scalability, and rate control. This flexibility fosters the development and implementation of different MPEG-specific algorithms, thereby resulting in product differentiation in the marketplace. However, a common goal of MPEG encoder designers is to minimize subjective distortion for a prescribed bit rate and operating delay constraint.
In the area of motion estimation, MPEG does not define a specific algorithm for calculating motion vectors for each picture. An image sequence, such as a video image sequence, typically comprises a group of pictures or frames. Each picture may contain up to a megabyte of information. Thus, the transmission and/or storage of such a video image sequence requires an enormous amount of storage capacity or transmission bandwidth. To reduce the amount of information that is stored or transmitted, a compression technique known as motion estimation is used to compress the image sequence by removing redundant information. A motion vector is a two-dimensional vector that provides an offset from the coordinate position of a block in the current picture to the coordinates of the matching block in a reference frame. Because of the high redundancy that exists between consecutive frames of a video image sequence, a current frame can be reconstructed from a previous reference frame and the difference between the current and previous frames by using the motion information (motion vectors). The use of motion vectors greatly enhances image compression by reducing the amount of information that is transmitted on a channel, because only the changes within the current frame are coded and transmitted. Various methods are currently available to an encoder designer for implementing motion estimation.
Generally, motion vectors are calculated for fixed size blocks. That is, the block-matching motion estimation method partitions a picture into a plurality of blocks having a fixed size such as eight (8) pixels by eight (8) pixels and estimates the displacements (motion vectors) for the moving blocks. A motion vector is generated for each block after a search is conducted to "best" match the movement of each block from a previous reference picture. However, large block-sizes generally produce a poor motion estimation, thereby producing a large motion-compensated frame difference (error signal). Conversely, small block-sizes generally produce an excellent motion estimation at the cost of increased computational complexity and the overhead of transmitting the increased number of motion vectors to a receiver. In summary, the bit saving due to a more accurate motion estimation is generally offset by the overhead required for sending the extra motion vectors.
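The fixed block-size search described above can be illustrated with a minimal sketch. It assumes an exhaustive search over a small window and a sum-of-absolute-differences (SAD) matching cost; neither the cost measure, the search range, nor the function names `sad` and `best_motion_vector` come from the MPEG standards or from this description, and the two small synthetic pictures exist only to exercise the code.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in the
    current picture and the block displaced by (dx, dy) in the reference."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n=8, search=4):
    """Try every displacement in [-search, search] in both directions and
    return the one with the lowest SAD, together with that SAD."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference picture.
            if bx + dx < 0 or bx + dx + n > width:
                continue
            if by + dy < 0 or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


# Synthetic 32 x 32 reference picture (illustrative values only).
SIZE = 32
ref = [[x + 100 * y for x in range(SIZE)] for y in range(SIZE)]
# Current picture: the reference content moved by 3 pixels in x and 2 in y.
cur = [[ref[y + 2][x + 3] if y + 2 < SIZE and x + 3 < SIZE else 0
        for x in range(SIZE)] for y in range(SIZE)]

mv, cost = best_motion_vector(cur, ref, bx=8, by=8)
print(mv, cost)  # (3, 2) 0
```

Each block costs (2 * search + 1)^2 SAD evaluations in this sketch, and one motion vector is produced per block, which makes the trade-off concrete: halving the block side quadruples the number of vectors that must be sent.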
Thus, the balance between high motion-vector overhead and good motion estimation is the focus of a variable block-size motion-estimation method. The goal is to determine the optimal block size for a picture or a portion of a picture, e.g., when to apply smaller block sizes.
FIG. 1 depicts one method of motion estimation where arbitrary variable block-sizes and locations are used to partition a picture. Although the use of arbitrary variable block-sizes may produce a very accurate motion estimation, the computational overhead of transmitting the structure of this partition is very expensive, e.g., the bit cost in describing the location and size for every block is very high.
Furthermore, FIG. 1 illustrates a second problem which is the determination of the optimal block size for a given picture. One method is to conduct an exhaustive search where every available structure is analyzed for a given "depth" or number of allowable block sizes. In a "quadtree" structure (discussed below), it can be shown that the number of searches needed is the same as the number of all possible subtrees, which is given by the inductive relation:

$$C_f(d) = 1 + \left(C_f(d-1)\right)^4 \qquad (1)$$

where $d$ is the maximum depth of the tree and $C_f(d)$ is the number of all trees that have a depth less than or equal to $d$. Approximating equation (1) by $C_f(d) \approx \left(C_f(d-1)\right)^4$, it can be shown that $C_f(d) \approx 2^{4^{d-1}}$. This suggests that the computation for the exhaustive search for a quadtree having a depth of five (5) is approximately $C_f(d) \approx 8.9 \times 10^{307}$. This computational overhead is impractical for many implementations.
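The growth implied by the inductive relation of equation (1) can be checked numerically with a short sketch. The base case C_f(0) = 1 (a single undivided block) is an assumption, as is the function name `count_quadtrees`; the text does not state a base case, and the absolute figures depend on that choice and on how depth is counted, so they need not match the estimate quoted above.

```python
def count_quadtrees(depth):
    """Evaluate C_f(d) = 1 + C_f(d-1)**4 iteratively.

    Assumes the base case C_f(0) = 1, i.e. a single undivided block.
    """
    count = 1  # C_f(0), assumed
    for _ in range(depth):
        # A node is either a leaf (1 way) or split into four independent
        # subtrees, each of which can be any tree of the smaller depth.
        count = 1 + count ** 4
    return count


for d in range(6):
    digits = len(str(count_quadtrees(d)))
    print(f"depth {d}: {digits} decimal digit(s)")
```

Python integers have arbitrary precision, so the exact counts are available without overflow. Under this assumption the first few values are 1, 2, 17 and 83522, and the number of digits roughly quadruples with each additional level, which is why an exhaustive search becomes impractical after only a few levels.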
Therefore, a need exists in the art for an apparatus and method for reducing the computational overhead in determining motion vectors for quadtree-based variable block size motion estimation.