The invention relates generally to video compression and, more specifically, to reducing the complexity of motion estimation.
Image information, for example, motion picture information, video information, animated graphics information, etc., often involves large amounts of data. Image information often includes data to represent coloring, shading, texturing, and transparency of every pixel in every frame of the image information. Each frame may include hundreds of thousands or even millions of pixels. The frames are often presented in rapid succession, for example, at a rate of 30 frames per second, to convey a sense of motion. Thus, without some form of data compression, such image information could easily involve hundreds of millions of bits per second.
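As a rough illustration of the data rate described above, the arithmetic can be sketched as follows. The frame size (640 by 480 pixels) and color depth (24 bits per pixel) are assumed values chosen only for illustration; only the rate of 30 frames per second comes from the text.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Bits per second needed to carry the frames with no compression."""
    return width * height * bits_per_pixel * frames_per_second


# Assumed example values: 640x480 pixels, 24 bits per pixel, 30 frames per second.
rate = raw_bit_rate(640, 480, 24, 30)
print(rate)  # 221184000
```

Under these assumptions the uncompressed stream requires roughly 221 million bits per second, consistent with the order of magnitude stated above.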
To avoid the need to store and transmit such large amounts of data, data compression techniques have been developed. Some data compression techniques have been designed to compress image information. Video compression converts a sequence of digital video frames, each comprising an array of pixel values, into a smaller (in terms of number of bits used for storage and transmission) and more efficient representation.
One example is the type of encoding specified by the Moving Picture Experts Group (MPEG). MPEG encoding is widely used, and its uses include internet video transmission, DVD (digital video disks), and HDTV (high-definition television). MPEG encoding produces a stream of different types of frames. These frames include intra frames and non-intra frames. The intra frames include sufficient information to reconstruct a frame of unencoded image information without the need to reference other frames of encoded data. The non-intra frames, however, provide information that refers to other information encoded in intra frames or other non-intra frames. Unencoded frames represented by the non-intra frames may be reconstructed by applying the information contained in the non-intra frames to the information contained in the intra frames to which the non-intra frames refer.
Since the amount of information stored in non-intra frames is much smaller than the amount of information in the unencoded frames that the non-intra frames represent, the use of non-intra frames can help greatly reduce the amount of image information that needs to be stored or transmitted. One aspect of the non-intra frames that allows them to contain less information than the unencoded frames they represent is that the non-intra frames essentially recycle image information found in the intra frames. For example, an unencoded frame represented by an intra frame may depict several objects. The objects may be located in several areas of the frame. Since the advantage of moving images, such as motion pictures, video, and animated graphics, over still images is the ability of the moving images to convey a sense of motion, the objects located in several areas of the frame often move to different areas when they appear in subsequent frames.
Since the image information needed to represent the appearance of the objects is present in the intra frames, that information may be recycled in non-intra frames. The non-intra frames contain the information needed to update the location of the objects without having to contain all of the information needed to express the appearance of the objects. Therefore, to encode non-intra frames, the change in the position of the objects represented in the non-intra frames is determined relative to the position of the objects represented in the intra frames.
One technique that has been used to determine the change in the position of objects represented in non-intra frames involves dividing an image into image blocks (i.e., square blocks of pixels within the image). Then, for each block, a determination is made as to where that block of pixels was located in the previous frame. This determination is made by matching the pixel pattern one block at a time.
In greater detail, the process includes several steps. First, the current (non-intra, or predicted) frame is divided into blocks. Then, for each block, a collection of potentially matching blocks is defined in the reference (intra or predicted) frame. Then, for each potentially matching block, the absolute differences between its pixels and the corresponding pixels of the current block are added up to determine a score. The matching block that has the best (i.e., lowest) score is then selected.
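The scoring step described above can be sketched as follows. This is a minimal illustration, not a definitive implementation; the function names `sum_abs_diff` and `best_match` and the representation of a block as a list of pixel rows are assumptions made for the example.

```python
def sum_abs_diff(block_a, block_b):
    """Add up the absolute pixel differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_match(current_block, candidates):
    """Return (index, score) of the candidate with the lowest score."""
    scores = [sum_abs_diff(current_block, c) for c in candidates]
    best_index = min(range(len(scores)), key=scores.__getitem__)
    return best_index, scores[best_index]


current = [[10, 20], [30, 40]]
candidates = [
    [[0, 0], [0, 0]],      # score 100
    [[10, 21], [29, 40]],  # score 2
    [[50, 50], [50, 50]],  # score 100
]
print(best_match(current, candidates))  # (1, 2)
```

A lower score indicates a closer match, so the second candidate is selected in this example.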
A motion vector is used to represent the change from the location of the matching block in the reference frame with the best score to the location of the block in the current frame. Motion estimation is a process for identifying the optimal values for motion vectors. Motion estimation is typically performed by considering many motion vectors in a search window and evaluating the quality of the matches represented by the motion vectors. Computing the quality of the matches to determine the best match typically requires a very large number of numerical operations. At the culmination of the motion estimation process, the motion vector is used to encode one or more non-intra frames for a sequence of images. The information encoded in the non-intra frames may then be used for motion compensation to reconstruct the sequence of images when the non-intra frames are decoded.
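An exhaustive search over a search window, one common way of considering many motion vectors, can be sketched as follows. The function names, the `radius` parameter, and the frame layout (a list of pixel rows) are assumptions made for illustration. The motion vector is expressed, as in the description above, as the displacement from the matching block in the reference frame to the block in the current frame.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between an n-by-n block of cur at (cx, cy)
    and an n-by-n block of ref at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, cx, cy, n, radius):
    """Try every candidate position within `radius` pixels of (cx, cy) that
    lies inside the reference frame; return (best_score, (mv_x, mv_y))."""
    height, width = len(ref), len(ref[0])
    best = None
    for ry in range(max(0, cy - radius), min(height - n, cy + radius) + 1):
        for rx in range(max(0, cx - radius), min(width - n, cx + radius) + 1):
            score = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or score < best[0]:
                # Displacement from the reference block to the current block.
                best = (score, (cx - rx, cy - ry))
    return best


def frame_with_square(size, x, y, n, value):
    """Hypothetical test frame: all zeros except an n-by-n square at (x, y)."""
    frame = [[0] * size for _ in range(size)]
    for j in range(n):
        for i in range(n):
            frame[y + j][x + i] = value
    return frame


ref = frame_with_square(8, 2, 3, 2, 200)  # object at (2, 3) in the reference frame
cur = frame_with_square(8, 4, 4, 2, 200)  # same object at (4, 4) in the current frame
print(full_search(cur, ref, 4, 4, 2, 2))  # (0, (2, 1))
```

With a radius of 2 and a 2-by-2 block, this small example evaluates 25 candidate positions of 4 pixel differences each, which suggests why the number of numerical operations grows quickly for realistic block and window sizes.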
Since motion estimation is the most computationally taxing stage of typical video compression techniques, such as MPEG video compression, reducing the complexity of the motion estimation process reduces the processing capability needed to perform video compression, allows the video compression to be performed more quickly, and allows a greater amount of video information to be compressed in a given amount of time.
One approach to reducing the complexity of numerical operations for motion estimation is disclosed in U.S. Pat. No. 5,712,799, issued to Farmwald et al. and owned by the assignee of the present application. Farmwald et al. disclose a method and structure for performing motion estimation using reduced precision pixel intensity values. The pixel intensity values of the pixels in the current block are averaged to determine a first average pixel intensity value. The pixel intensity values of the current block which have a pixel intensity value less than the first average pixel intensity value are averaged to determine a second average pixel intensity value. The pixel intensity values of the current block which have a pixel intensity value greater than the first average pixel intensity value are averaged to determine a third average pixel intensity value. The first, second, and third average pixel intensity values are used to determine thresholded pixel intensity values for the current block pixels and the search window pixels, thereby creating a thresholded current block and a thresholded search window.
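The computation of the three average pixel intensity values described above can be sketched as follows. This is only an illustration of the averaging steps as summarized here; the function name `three_averages` and the fallback used when no pixel lies below or above the first average are assumptions, not details taken from the cited patent.

```python
def three_averages(block):
    """Return (first, second, third) average pixel intensity values of a block."""
    pixels = [p for row in block for p in row]
    first = sum(pixels) / len(pixels)

    below = [p for p in pixels if p < first]
    above = [p for p in pixels if p > first]

    # Assumption: if every pixel equals the first average, reuse that average.
    second = sum(below) / len(below) if below else first
    third = sum(above) / len(above) if above else first
    return first, second, third


block = [[10, 20], [30, 100]]
print(three_averages(block))  # (40.0, 20.0, 100.0)
```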
Farmwald et al. provide a particular technique for reducing the precision of pixel intensity values, for example, to allow pixel intensity values to be expressed as two-bit representations rather than eight-bit values. By operating on the two-bit representations rather than on the eight-bit values, the complexity of certain block matching computations used for motion estimation is significantly reduced.
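To illustrate how eight-bit intensities might be reduced to two-bit representations, the following sketch maps each pixel to one of four levels using three threshold values. The exact thresholding rule is not stated in this description, so the mapping below is an assumption made purely for illustration and should not be read as the method of the cited patent; the function name `quantize_two_bit` is likewise hypothetical.

```python
def quantize_two_bit(pixels, low_avg, mid_avg, high_avg):
    """Map each pixel to a level 0..3 (two bits) using three thresholds.

    Assumed rule: the thresholds split the intensity range into four bands.
    """
    levels = []
    for p in pixels:
        if p < low_avg:
            levels.append(0)
        elif p < mid_avg:
            levels.append(1)
        elif p < high_avg:
            levels.append(2)
        else:
            levels.append(3)
    return levels


pixels = [10, 60, 120, 250]
levels = quantize_two_bit(pixels, 50, 100, 200)
print(levels)                                # [0, 1, 2, 3]
print(len(pixels) * 8, "bits ->", len(levels) * 2, "bits")  # 32 bits -> 8 bits
```

Each thresholded pixel occupies two bits instead of eight, so block matching computations operate on a quarter of the original data per pixel.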
However, the teachings of Farmwald et al. do not guarantee that the best possible block match score estimate will be obtained. Thus, less than optimal motion vectors may be selected and the quality of the compressed video produced according to the motion estimation technique of Farmwald et al. may be compromised. Accordingly, a method and apparatus that reduce the complexity of the motion estimation process without compromising quality are needed.