Conventional video compression algorithms exploit the temporal correlation among frames of pictures to remove redundancy therebetween. For example, a previous (reference) frame is used to predict a current frame to be encoded. The prediction operation must account for the motion of objects in a frame in order to enable motion compensation, whereby a matchblock is taken from the current frame and a spatial offset in the reference frame is determined to represent a good prediction of where the matchblock of the current frame can be found. This offset is known as the motion vector. To determine the spatial offset, some form of difference is taken between a search area of the reference frame and the matchblock of the current frame to generate the prediction data.
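By way of illustration only, the search for a spatial offset described above can be sketched as an exhaustive block match. The sketch assumes the "form of difference" is a sum of absolute differences (SAD); the function names `sad` and `full_search` and the toy data are hypothetical and are not taken from any particular encoder.

```python
def sad(block, area, dy, dx):
    """Sum of absolute differences between block and the window of area at (dy, dx)."""
    return sum(
        abs(block[r][c] - area[dy + r][dx + c])
        for r in range(len(block))
        for c in range(len(block[0]))
    )


def full_search(block, area):
    """Try every window that fits inside area; return ((dy, dx), cost) of the best one."""
    bh, bw = len(block), len(block[0])
    ah, aw = len(area), len(area[0])
    best = None
    for dy in range(ah - bh + 1):
        for dx in range(aw - bw + 1):
            cost = sad(block, area, dy, dx)
            # Strict comparison keeps the first window found among equal costs.
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best


# Toy data (assumed): an 8x8 search area and a 2x2 matchblock copied from
# row 2, column 3 of that area, so that offset has a cost of zero.
area = [[(7 * r + 13 * c) % 31 for c in range(8)] for r in range(8)]
block = [row[3:5] for row in area[2:4]]
offset, cost = full_search(block, area)
print(offset, cost)
```

The returned offset is measured from the top-left corner of the search area. Converting it to a motion vector requires subtracting the matchblock's own position relative to that corner, a convention that is assumed here rather than stated in the text.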
In general, motion compensation involves the use of the motion vector to extract the predicting block from the reference frame, to subtract the predicting block from the matchblock of the current frame, and to use the resulting difference for further compression. Motion estimation involves ascertaining the best motion vector to be used in predicting the matchblock from the current frame. In the typical model used for motion estimation, an assumption is made that the image in the current frame is a translation of the image in the reference frame. It can also be assumed that there is virtually no change in the image content among frames, or that such a change will be compensated by known techniques not specifically discussed herein. Under this model, because the image of the current frame is a translation of the image of the reference frame, a block of pixels in the current frame must reside somewhere within a searchable area in the reference frame.
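As a minimal sketch of the motion compensation step just described, the following assumes frames are stored as lists of pixel rows and that a motion vector (dy, dx) gives the displacement from the matchblock's position in the current frame to the predicting block in the reference frame. The names `predict_block` and `residual` and the toy frames are hypothetical.

```python
def predict_block(reference, top, left, mv, height, width):
    """Extract the predicting block from the reference frame, displaced by mv = (dy, dx)."""
    dy, dx = mv
    rows = reference[top + dy : top + dy + height]
    return [row[left + dx : left + dx + width] for row in rows]


def residual(current_block, predicted_block):
    """Pixel-wise difference (current minus prediction) passed on for further compression."""
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current_block, predicted_block)
    ]


# Toy frames (assumed): the current frame is the reference frame translated
# one row down and two columns right, matching the translation model.
reference = [[10 * r + c for c in range(6)] for r in range(6)]
current = [
    [reference[r - 1][c - 2] if r >= 1 and c >= 2 else 0 for c in range(6)]
    for r in range(6)
]

top, left, size = 2, 3, 2
matchblock = [row[left : left + size] for row in current[top : top + size]]
prediction = predict_block(reference, top, left, (-1, -2), size, size)
diff = residual(matchblock, prediction)
print(diff)
```

Under a pure translation the residual is all zeros, which is why a good motion vector leaves little information to encode. With real video the residual is generally non-zero and is what the later compression stages operate on.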
One problem associated with motion estimation is that it is a computationally expensive activity in the encoding process, i.e., it typically involves calculation-intensive steps. For example, in FIG. 9, for each match, exhaustively searching a (64×64 pixels) search area 222 for a (16×16 pixels) matchblock 220 involves undertaking: 256 comparisons (differences) between the matchblock and the search area; 256 calculations of the absolute values of those differences (referenced subsequently as “absolute value calculations” for convenience and brevity); and 255 additions of the 256 values. For 64·64 matches, the total number of computations increases drastically to 64·64·256+64·64·256+64·64·255. Because motion estimation conventionally requires an exhaustive search of a large search area for the matchblock, it can be appreciated that a need exists for a technique to reduce the large number of computations traditionally required to determine the motion vector.
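The arithmetic above can be checked with a short calculation. This is a sketch under stated assumptions: each match costs one difference and one absolute value per pixel plus one fewer addition than there are pixels, and the number of matches is taken as 64·64, one per position of the search area. The function names are hypothetical.

```python
def ops_per_match(block_h, block_w):
    """Operation counts for one comparison of a block against one window."""
    pixels = block_h * block_w
    return {
        "differences": pixels,
        "absolute_values": pixels,
        "additions": pixels - 1,  # summing n values takes n - 1 additions
    }


def total_ops(matches, block_h, block_w):
    """Total operations when the per-match cost is repeated for every match."""
    return matches * sum(ops_per_match(block_h, block_w).values())


per_match = ops_per_match(16, 16)
total = total_ops(64 * 64, 16, 16)
print(per_match)
print(total)  # 4096 * (256 + 256 + 255) = 3141632
```

If only windows lying fully inside the 64×64 search area were counted, there would be (64 − 16 + 1)² = 2401 candidate positions rather than 4096. The larger figure is used here to follow the count given in the text.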
In the past, approaches aimed at reducing the large number of calculations associated with motion estimation have generally proven unsatisfactory because of the accompanying degradation in resolution quality. For example, to reduce the large number of calculations, one conventional approach increases the size of the matchblock. This increases the computations per comparison; however, there will be fewer blocks per frame, which reduces the number of times motion compensation is undertaken. Although this approach results in fewer blocks per frame, it is problematic because it can lead to poor predictions: the probability that a block will contain objects moving in different directions increases with block size. Another approach reduces the search area size in order to reduce the number of computations needed to search for a match. Yet this approach has the drawback that the match may be missed because the search area is small. Accordingly, what is needed is a way to reduce the number of motion estimation calculations while keeping a high quality of the resulting predictions. It would be desirable if the motion estimation calculations could also improve the resolution quality of the resulting predictions.
When video compression techniques and associated hardware are applied in the field of portable multimedia devices, other considerations arise. To be practical and cost-effective, the hardware architecture must not only be able to handle the large amount of data traffic that results from the intensive calculations associated with video processing, but also be compact and uncomplicated. The large amount of data traffic is a result of loading video images from storage to a processor and returning the processed images to storage, only to load more video images. Conventional approaches use a 128-bit bus to handle the large amount of data traffic; however, this data bandwidth is too large for the minimal placement and routing requirements of VLSI, ASIC and System-on-Chip (SoC) applications being used in increasingly smaller and more streamlined multimedia-based devices and appliances. As hand-held portable devices equipped to handle multimedia video formats continue to become smaller in size, it would be ideal if the video compression hardware were streamlined, that is, designed so that the routing, placement, and layout of logic and circuit components are compact and uncomplicated, and yet able to accommodate complex calculations.