The present invention relates to special purpose image compression coprocessors and in particular to motion estimation coprocessors.
To represent a picture image digitally, the image area is described as an array of pixels. A digital number describes the color, luminance and chrominance of each pixel. Pixel color information actually consists of three digital values: one digital value for red, one digital value for green and one digital value for blue. Thus, the sheer volume of data needed to describe one single pixel means that digital representations of complete picture images result in exceptionally large data files.
In full motion video, not only are large blocks of data required to describe each individual picture image, but a new image or frame must be presented to the viewer at approximately thirty new images per second to create the illusion of motion. Moving these large streams of video data across digital networks or phone lines is simply infeasible given the available bandwidth.
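The scale of the problem can be illustrated with a short calculation. The following is a minimal sketch only: the frame dimensions and the 24 bits per pixel (8 bits each for red, green and blue) are assumed values chosen for illustration, and the 64 Kbps figure is an assumed point of comparison for a single telephone channel. Only the rate of thirty frames per second comes from the description above.

```python
def raw_video_bits_per_second(width, height, bits_per_pixel, frames_per_second):
    """Return the uncompressed data rate of a video stream in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# Assumed, illustrative parameters (hypothetical, not taken from the text).
WIDTH, HEIGHT = 352, 240      # pixels per frame
BITS_PER_PIXEL = 24           # 8 bits each for red, green and blue
CHANNEL_BPS = 64_000          # one 64 Kbps telephone channel, for comparison

# Frame rate stated in the text: approximately thirty frames per second.
FRAMES_PER_SECOND = 30

rate = raw_video_bits_per_second(WIDTH, HEIGHT, BITS_PER_PIXEL, FRAMES_PER_SECOND)
print("raw rate (bits/s):", rate)
print("ratio to one 64 Kbps channel:", rate / CHANNEL_BPS)
```

Under these assumptions the uncompressed stream exceeds the capacity of a single 64 Kbps channel by roughly three orders of magnitude.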
Data compression is a technique for reducing the number of bits required to send a given message. Data compression either uses a shorthand notation to signal a repetitive string of bits or omits data bits from the transmitted message. The latter form of compression is called "lossy" compression and capitalizes upon the ability of the human mind to provide the omitted data. For still images, the JPEG standard is used for data compression and defines the method by which the still image is to be compressed. In motion video, much of the picture data remains constant from frame to frame. Therefore, the video data may be compressed by first describing a reference frame and then describing subsequent frames in terms of their change from the reference frame.
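The idea of describing a frame by its change from a reference frame may be sketched as follows. This is an illustrative example only: the function names and the tiny 4 by 4 frames are hypothetical, and a practical encoder would go on to compress the difference data rather than store it directly.

```python
def encode_difference(reference, current):
    """Describe the current frame as pel-by-pel differences from the reference."""
    return [[c - r for r, c in zip(ref_row, cur_row)]
            for ref_row, cur_row in zip(reference, current)]


def decode_difference(reference, difference):
    """Rebuild the current frame from the reference frame and the differences."""
    return [[r + d for r, d in zip(ref_row, diff_row)]
            for ref_row, diff_row in zip(reference, difference)]


# Hypothetical frames: only one pel changes between them.
reference = [[10, 10, 10, 10],
             [10, 50, 50, 10],
             [10, 50, 50, 10],
             [10, 10, 10, 10]]
current = [row[:] for row in reference]
current[1][1] = 52

difference = encode_difference(reference, current)
changed = sum(1 for row in difference for d in row if d != 0)
print("non-zero differences:", changed, "of", len(difference) * len(difference[0]))
```

Because most of the picture is unchanged, almost all of the difference values are zero, which is what makes the difference data cheap to represent.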
A reference frame can be used in three ways: forward prediction, backward prediction and interpolation. Forward and backward prediction use a single reference frame and describe subsequent or previous frames respectively in terms of the difference from the reference frame. Interpolation uses both forward and backward reference frames. The forward reference frame is located in the data stream at an earlier point in time than the current frame. The backward reference frame is located in the data stream at a later point in time than the current frame. The current frame is calculated based on averaged differences between the first reference frame and the second reference frame.
Several specific protocols for implementing motion compression exist. Several of these protocols are hardware specific and developed by chip manufacturers in the absence of accepted compression standards. Recently, however, two accepted standards for motion video compression have emerged. The CCITT (International Consultative Committee on Telephone and Telegraph) H.261 video conferencing standard uses an algorithm called P×64. The P refers to a multiplier in the range 2 to 30 and the 64 refers to a single 64 Kbps ISDN channel for transmitting the data. However, squeezing even this compressed data over the ISDN telephone line requires drastic compression. Fortunately, the typical video conference does not have much motion from frame to frame, and P×64 utilizes only forward prediction over a single frame time.
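The channel capacity implied by a given multiplier follows directly from the description above. The sketch below is illustrative only; the function name is hypothetical, and 64 Kbps is taken to mean 64,000 bits per second.

```python
ISDN_CHANNEL_BPS = 64_000  # one 64 Kbps ISDN channel, taken as 64,000 bits/s


def px64_bit_rate(p):
    """Return the aggregate channel rate in bits per second for multiplier p."""
    if p < 1:
        raise ValueError("the multiplier must be a positive integer")
    return p * ISDN_CHANNEL_BPS


# A few multipliers from the range given in the text.
for p in (2, 6, 30):
    print("P =", p, "->", px64_bit_rate(p), "bits/s")
```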
To enable higher quality, full motion video, a second standard called MPEG (Moving Picture Experts Group) has evolved. The MPEG specifications do not define the exact procedure for compressing the video. Rather, the standard defines the format and data rate of the compressed output. The set of compression tools employed by MPEG includes a JPEG-like method for compressing intraframes, various combinations of forward, backward, and interpolated motion compression, and subband coding for audio.
More particularly, operations according to the MPEG standard may be summarized with reference to the following hypothetical in which the video system wishes to describe four sequential image frames. The video processing system first receives the first frame. This first received frame cannot be described in terms of a reference frame and only intraframe (i.e. non-predictive) coding is performed.
The second frame is then received. One possible implementation of the MPEG compression standard describes this frame in terms of the first frame, which is the intraframe ("I" frame), and a first forward predicted ("P") frame. However, this first P frame is not yet defined, and compression of the received second frame is delayed until receipt of the first P frame by the processing system. The third frame also will be described in terms of the first I and P frames.
The fourth frame of this hypothetical example is used to form the first P frame. The P frame is formed by predicting the fourth received frame using the first I frame as a reference. Upon computation of the first P frame, the motion estimation processor can process the second and third received frames as bidirectionally predicted "B" frames by comparing blocks of these frames to blocks of the first I and P frames. To do this processing, the motion estimation processor first obtains a forward prediction of a block in the received frame being processed using the first I frame as a reference. The motion estimation processor then obtains a backward prediction of that same block using the first P frame as a reference. The two predictions are then averaged to form the final prediction for the block.
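The averaging step for a "B" frame block may be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the 2 by 2 blocks are hypothetical, the forward and backward predictions are taken as already obtained from the I and P frames, and rounding the average to the nearest integer is an assumption rather than something specified in the text.

```python
def average_blocks(forward, backward):
    """Average two equally sized blocks pel by pel, rounding halves upward."""
    return [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(forward, backward)]


def residual(block, prediction):
    """Return the pel-by-pel difference between a block and its prediction."""
    return [[a - p for a, p in zip(a_row, p_row)]
            for a_row, p_row in zip(block, prediction)]


# Hypothetical 2x2 blocks.
forward_prediction = [[100, 100], [100, 100]]    # from the first I frame
backward_prediction = [[110, 112], [108, 110]]   # from the first P frame
actual_block = [[104, 106], [105, 105]]          # block of the frame being coded

final_prediction = average_blocks(forward_prediction, backward_prediction)
print("final prediction:", final_prediction)
print("residual:", residual(actual_block, final_prediction))
```

Only the small residual between the actual block and the averaged prediction would then remain to be coded.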
In current motion estimation devices, an exhaustive full resolution pel by pel search is performed for each block of the I or P frame. This method requires a large bandwidth bus for transfer of the video data. Furthermore, the processing time required to churn through the data slows overall system speed.
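An exhaustive search of this kind may be sketched as follows. The example is illustrative only: the function names, the frame contents and the search range are hypothetical, and the sum of absolute differences is assumed as the matching criterion since the text does not name one.

```python
def sad(frame, top, left, block):
    """Sum of absolute differences between a block and a frame region."""
    total = 0
    for dy, row in enumerate(block):
        for dx, pel in enumerate(row):
            total += abs(frame[top + dy][left + dx] - pel)
    return total


def full_search(reference, block, top, left, search_range):
    """Test every displacement within +/- search_range pels of (top, left).

    Returns (cost, dy, dx) for the best match, or None if no candidate fits.
    """
    block_h, block_w = len(block), len(block[0])
    frame_h, frame_w = len(reference), len(reference[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + block_h > frame_h or x + block_w > frame_w:
                continue
            cost = sad(reference, y, x, block)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


# Hypothetical 8x8 reference frame in which every pel has a distinct value.
reference = [[y * 8 + x for x in range(8)] for y in range(8)]
# A 2x2 block located at row 2, column 2 of the current frame whose contents
# match the reference frame at row 3, column 4.
block = [row[4:6] for row in reference[3:5]]

print(full_search(reference, block, top=2, left=2, search_range=2))
```

Each block requires one full comparison for every candidate displacement, here up to (2 × 2 + 1)² = 25 of them, so the work grows with the square of the search range and with the number of blocks per frame.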