1. Field of the Invention
The present invention relates to image processing, and, in particular, to motion estimation for frame and field picture coding.
2. Description of the Related Art
An image sequence, such as a video image sequence, typically includes a sequence of image frames. The reproduction of video containing moving objects typically requires a frame rate of thirty image frames per second, with each frame possibly containing in excess of a megabyte of information. Consequently, transmitting or storing such image sequences requires a large amount of either transmission bandwidth or storage capacity. To reduce the necessary transmission bandwidth or storage capacity, a frame sequence may be compressed such that redundant information within the sequence is not stored or transmitted. As such, image sequence compression through various encoding techniques has been the subject of a great deal of research in recent years. Digital television, video conferencing, and CD-ROM archiving are applications that can benefit from video sequence encoding.
Generally, to encode an image sequence, information concerning the motion of objects in a scene from one frame to the next plays an important role in the encoding process. Because of the high redundancy that exists between consecutive frames within most image sequences, substantial data compression can be achieved using a technique known as motion estimation. In a sequence containing motion from one frame to the next, a current frame can be reconstructed using a preceding frame and motion information representing the motion-compensated difference between the current frame and the preceding frame. For example, in a video transmission system, at the transmitter, a current frame is compared to a preceding frame to determine motion information. Thereafter, the transmitter transmits the preceding frame and the motion information to a receiver. At the receiver, the current frame is reconstructed by combining the preceding frame with the motion information. Consequently, only one frame and the motion information are transmitted and received rather than two entire frames. In applications such as video conferencing, video telephone, and digital television, motion information has become the key to data compression. However, extraction of the motion information from the frame sequence is itself computationally intensive, placing a heavy burden on the hardware designed to perform the motion estimation task.
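The receiver-side reconstruction described above can be sketched as follows. This is a minimal illustration, not the standard's decoding procedure: the function name, the NumPy representation of frames, the per-block motion-vector dictionary, and the clamping policy at frame borders are all assumptions made for the example.

```python
import numpy as np

def reconstruct_frame(reference, motion_vectors, block_size=16):
    """Rebuild the current frame by copying, for each block, the
    reference-frame block displaced by that block's motion vector.
    motion_vectors maps each block's top-left corner (by, bx) in the
    current frame to its motion vector (dy, dx)."""
    h, w = reference.shape
    current = np.zeros_like(reference)
    for by in range(0, h, block_size):
        for bx in range(0, w, block_size):
            dy, dx = motion_vectors[(by, bx)]
            # Clamp so the displaced block stays inside the reference frame
            # (an illustrative policy; real codecs handle borders differently).
            sy = min(max(by + dy, 0), h - block_size)
            sx = min(max(bx + dx, 0), w - block_size)
            current[by:by+block_size, bx:bx+block_size] = \
                reference[sy:sy+block_size, sx:sx+block_size]
    return current
```

In practice the transmitter also sends a residual (the motion-compensated difference) that the receiver adds to this prediction; the sketch shows only the motion-compensation step.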
Many systems determine motion information using a so-called block-based approach. In a simple block-based approach, the current frame is divided into a number of blocks of pixels (referred to as current blocks). For each of these current blocks, a search is performed within a selected search area in a reference frame (e.g., corresponding to a preceding frame in the image sequence) for a block of pixels that "best" matches the current block. This search is typically accomplished by repetitively comparing a selected current block to each similarly sized block of pixels in the selected search area of the reference frame. Once a block match is found, the location of the matching block in the search area in the reference frame relative to the location of the current block within the current frame defines a motion vector. This approach, i.e., comparing a current block to each similarly sized block in a selected search area in the reference frame, is known as the full-search or exhaustive-search approach. The determination of motion vectors by the exhaustive-search approach is computationally intensive. As such, these systems tend to be relatively slow in processing the frames and expensive to fabricate.
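The full-search approach described above might be sketched as follows, using the sum of absolute differences (SAD) as the block-matching criterion. The SAD metric, function name, and NumPy usage are illustrative assumptions; the source does not prescribe a particular matching criterion.

```python
import numpy as np

def full_search(current_block, reference, top_left, search_range=8):
    """Exhaustive (full-search) block matching: compare the current block
    against every candidate block in the reference search area and return
    the motion vector (dy, dx) of the best (minimum-SAD) match."""
    bh, bw = current_block.shape
    h, w = reference.shape
    cy, cx = top_left  # position of the current block in the current frame
    best_sad, best_vec = float("inf"), (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = cy + dy, cx + dx
            if y < 0 or x < 0 or y + bh > h or x + bw > w:
                continue  # candidate block falls outside the reference frame
            candidate = reference[y:y+bh, x:x+bw]
            # Sum of absolute differences between the two blocks.
            sad = np.abs(current_block.astype(int) - candidate.astype(int)).sum()
            if sad < best_sad:
                best_sad, best_vec = sad, (dy, dx)
    return best_vec, best_sad
```

The two nested displacement loops are what makes the exhaustive search so costly: for a ±8-pixel range, every block requires up to 17 × 17 = 289 full block comparisons.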
In conventional video sequences, each frame of image data consists of two interleaved fields: a top field corresponding to the odd rows in the frame and a bottom field corresponding to the even rows in the frame, where the top and bottom fields are generated at slightly different times. Depending on the encoding scheme, image data can be encoded based on the frames (i.e., frame picture coding) or based on the individual fields (i.e., field picture coding). Under the Moving Picture Experts Group (MPEG)-2 video compression standard, motion estimation processing for frame picture coding is different from motion estimation processing for field picture coding.
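The frame-to-field split described above amounts to taking alternating rows, which can be expressed compactly (the function name and NumPy representation are illustrative):

```python
import numpy as np

def split_fields(frame):
    """Split an interlaced frame into its two fields. Following the
    convention above, the top field holds the odd rows (rows 1, 3, 5, ...
    counting from 1, i.e. row indices 0, 2, 4, ...) and the bottom field
    holds the even rows."""
    return frame[0::2, :], frame[1::2, :]
```

For example, a 480-row frame splits into two 240-row fields, each captured at a slightly different instant.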
FIG. 1 illustrates the different sets of image data used for motion estimation processing for MPEG-2 frame picture coding. In frame picture coding, the current frame (i.e., the frame currently being encoded) is divided into blocks (e.g., 16 pixels by 16 pixels, although other block sizes are also possible). Each (16×16) block in the current frame corresponds to a (16×8) sub-block in the top field used to form the current frame and a (16×8) sub-block in the bottom field used to form the current frame. In MPEG-2 frame picture coding, motion estimation processing is performed five times for each frame, each time comparing different sets of data from the current frame to different sets of data from the reference frame as follows:
(1) Comparing (16×16) blocks in the current frame to search areas in the reference frame to generate a set of frame vectors, where, for each current block, the search area is, for example, a (32×32) region of pixels in the reference frame centered at the corresponding location of the current block in the current frame;
(2) Comparing (16×8) sub-blocks in the top field for the current frame (referred to in FIG. 1 as the current top field) to search areas in the top field for the reference frame (referred to in FIG. 1 as the reference top field) to generate a set of top-top field vectors;
(3) Comparing (16×8) sub-blocks in the current top field to search areas in the bottom field for the reference frame (referred to in FIG. 1 as the reference bottom field) to generate a set of bottom-top field vectors;
(4) Comparing (16×8) sub-blocks in the bottom field for the current frame (referred to in FIG. 1 as the current bottom field) to search areas in the reference top field to generate a set of top-bottom field vectors; and
(5) Comparing (16×8) sub-blocks in the current bottom field to search areas in the reference bottom field to generate a set of bottom-bottom field vectors.
The (32×32) search region for motion estimation in the reference frame is based on a search range of 8 pixels in each of the positive and negative X (or width) and Y (or height) directions from the corresponding position of the (16×16) block in the current frame. Since the sub-blocks are only 8 pixels high, the search areas in the reference top and bottom fields for the same 8-pixel search range are only (32×24). Of course, search ranges and search areas of other dimensions can also be used.
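The five frame-picture-coding comparisons enumerated above can be sketched for a single (16×16) block as follows. This is an illustrative outline, not the MPEG-2 algorithm: the SAD-based `sad_search` helper, the function names, and the dictionary of vector labels are all assumptions of the example.

```python
import numpy as np

def sad_search(block, ref, center, search_range=8):
    """Minimum-SAD full search of `block` over `ref` around `center`."""
    bh, bw = block.shape
    h, w = ref.shape
    cy, cx = center
    best = (float("inf"), (0, 0))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y and 0 <= x and y + bh <= h and x + bw <= w:
                sad = int(np.abs(block.astype(int)
                                 - ref[y:y+bh, x:x+bw].astype(int)).sum())
                best = min(best, (sad, (dy, dx)))
    return best[1]

def frame_block_vectors(cur_frame, ref_frame, by, bx, search_range=8):
    """The five vectors MPEG-2 frame picture coding derives for the 16x16
    block at (by, bx): one frame vector plus four field vectors. With
    search_range=8 the effective search area is 32x32 for the frame
    comparison and 32x24 for the 16x8 field comparisons."""
    # Split frames into fields (top = odd rows counting from 1).
    cur_top, cur_bot = cur_frame[0::2], cur_frame[1::2]
    ref_top, ref_bot = ref_frame[0::2], ref_frame[1::2]
    block = cur_frame[by:by+16, bx:bx+16]
    # The 16x16 frame block corresponds to a 16x8 sub-block in each field.
    fy = by // 2  # vertical position of those sub-blocks within a field
    top_sub = cur_top[fy:fy+8, bx:bx+16]
    bot_sub = cur_bot[fy:fy+8, bx:bx+16]
    return {
        "frame":         sad_search(block,   ref_frame, (by, bx), search_range),
        "top-top":       sad_search(top_sub, ref_top,   (fy, bx), search_range),
        "bottom-top":    sad_search(top_sub, ref_bot,   (fy, bx), search_range),
        "top-bottom":    sad_search(bot_sub, ref_top,   (fy, bx), search_range),
        "bottom-bottom": sad_search(bot_sub, ref_bot,   (fy, bx), search_range),
    }
```

The labels follow the source's reference-current naming: "bottom-top", for instance, matches the current top-field sub-block against the reference bottom field.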
FIG. 2 illustrates the different sets of image data used for motion estimation processing for MPEG-2 field picture coding, where, for the example in FIG. 2, the current picture is a top field. In field picture coding, image data in the current field (i.e., the field currently being encoded, whether a top field or a bottom field) is compared with image data in two different reference fields: a reference top field and a reference bottom field. Each block in the current field (e.g., 16×16 block 202, although other block sizes are also possible) is further divided into an upper sub-block (e.g., 16×8 sub-block 204) and a lower sub-block (e.g., 16×8 sub-block 206). According to MPEG-2 field picture coding, motion estimation processing is performed six times for each field, each time comparing blocks of data from the current field to different sets of data from the reference fields as follows:
(1) Comparing (16×16) blocks in the current field to search areas (e.g., 32×32) in the reference top field to generate a set of top-current vectors;
(2) Comparing (16×16) blocks in the current field to search areas (e.g., 32×32) in the reference bottom field to generate a set of bottom-current vectors;
(3) Comparing (16×8) upper sub-blocks in the current field to search areas (e.g., 32×24) in the reference top field to generate a set of top-upper vectors;
(4) Comparing (16×8) upper sub-blocks in the current field to search areas (e.g., 32×24) in the reference bottom field to generate a set of bottom-upper vectors;
(5) Comparing (16×8) lower sub-blocks in the current field to search areas (e.g., 32×24) in the reference top field to generate a set of top-lower vectors; and
(6) Comparing (16×8) lower sub-blocks in the current field to search areas (e.g., 32×24) in the reference bottom field to generate a set of bottom-lower vectors.
Here, too, the sizes of the blocks and sub-blocks in the current field as well as the search ranges and search areas in the reference fields can be different from those shown in FIG. 2.
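The six field-picture-coding comparisons enumerated above can likewise be sketched for a single block of the current field. As with the frame-coding example, the SAD-based search, the function names, and the label dictionary are illustrative assumptions, not the standard's procedure.

```python
import numpy as np

def sad_search(block, field, center, search_range=8):
    """Minimum-SAD full search of `block` over `field` around `center`."""
    bh, bw = block.shape
    h, w = field.shape
    cy, cx = center
    best = (float("inf"), (0, 0))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y and 0 <= x and y + bh <= h and x + bw <= w:
                sad = int(np.abs(block.astype(int)
                                 - field[y:y+bh, x:x+bw].astype(int)).sum())
                best = min(best, (sad, (dy, dx)))
    return best[1]

def field_block_vectors(cur_field, ref_top, ref_bottom, by, bx):
    """The six vectors MPEG-2 field picture coding derives for the 16x16
    block of the current field at (by, bx): the whole block and its upper
    and lower 16x8 sub-blocks, each searched in both reference fields."""
    block = cur_field[by:by+16, bx:bx+16]
    upper, lower = block[:8, :], block[8:, :]  # 16x8 sub-blocks
    return {
        "top-current":    sad_search(block, ref_top,    (by, bx)),
        "bottom-current": sad_search(block, ref_bottom, (by, bx)),
        "top-upper":      sad_search(upper, ref_top,    (by, bx)),
        "bottom-upper":   sad_search(upper, ref_bottom, (by, bx)),
        "top-lower":      sad_search(lower, ref_top,    (by + 8, bx)),
        "bottom-lower":   sad_search(lower, ref_bottom, (by + 8, bx)),
    }
```

Note that every piece of current-field data is searched against both reference fields, which is why field picture coding requires six passes rather than five.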
Motion estimation processing can be computationally demanding for an image encoding system. Having to repeat motion estimation processing multiple times for each picture in a video sequence, whether it is five times for frame picture coding or six times for field picture coding, can greatly reduce the speed at which the video data can be encoded in a system having finite processing capacity.