An image sequence, such as video, consists of frames of images. The coding of video sequences has been the focus of a great deal of research in recent years. High Definition Television (HDTV), video conferencing and CD-ROM archival are some of the possible applications. In the coding of such sequences, information on the motion of objects in the scene from one frame to the next plays an important role. Because of the high redundancy that exists between the consecutive frames of a video sequence, high compression ratios can be obtained. For example, if there is no moving object, a frame would be identical to the previous frame. The receiver can simply repeat the previous frame, and there is no need to send the current frame. The natural way to exploit the redundancy between consecutive frames is to predict a frame using a prior frame.
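As a minimal sketch of predicting a frame from a prior frame, the following Python fragment treats the previous frame as the prediction and forms the pixel-by-pixel difference that would remain to be sent. The function names `frame_difference` and `reconstruct` are hypothetical, and frames are assumed to be equally sized lists of rows of integer intensities; this is an illustration only, not a method taken from any of the systems discussed here.

```python
def frame_difference(current, previous):
    """Pixel-wise difference between the current frame and the previous frame."""
    return [
        [c - p for c, p in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


def reconstruct(previous, difference):
    """Rebuild the current frame by adding the difference back onto the previous frame."""
    return [
        [p + d for p, d in zip(prev_row, diff_row)]
        for prev_row, diff_row in zip(previous, difference)
    ]


def is_stationary(difference):
    """True when every difference is zero, so the receiver can repeat the previous frame."""
    return all(d == 0 for row in difference for d in row)
```

When nothing in the scene moves, `is_stationary` returns true and no pixel data for the current frame needs to be sent.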
Ideally, a current frame can be reconstructed from the immediately previous frame and the difference between the current and previous frames, i.e. by using the motion information. In applications such as video conferencing, video telephone, and digital television, motion is the key to data compression. The extraction of motion information is computationally intensive and puts a burden on the hardware designed to perform such a task. In HDTV, for example, the use of four high speed processors operating in parallel was proposed recently.
In video sequences using interlaced scan, each frame is composed of two fields: one containing only the even scan lines, and one containing only the odd scan lines. It is well known that in such interlaced sequences, motion can be estimated by dividing the pixels in each field into blocks and estimating the motion of the blocks from a full or limited search in a prior field or fields.
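The composition of an interlaced frame from two fields can be sketched as follows. The helper names `split_fields` and `merge_fields` are hypothetical, a frame is assumed to be a list of scan lines, and scan lines are assumed to be numbered from zero, so that line 0 counts as an even line.

```python
def split_fields(frame):
    """Separate a frame into its even-line field and its odd-line field."""
    even_field = frame[0::2]  # scan lines 0, 2, 4, ...
    odd_field = frame[1::2]   # scan lines 1, 3, 5, ...
    return even_field, odd_field


def merge_fields(even_field, odd_field):
    """Interleave two fields back into a full frame."""
    frame = []
    for i in range(len(even_field) + len(odd_field)):
        source = even_field if i % 2 == 0 else odd_field
        frame.append(source[i // 2])
    return frame
```

Block-based motion estimation on an interlaced sequence would then operate on blocks taken from each field rather than from the full frame.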
The majority of systems determine the motion information using a block based approach. In this approach, as shown in FIG. 1 hereof, the present frame 1 is divided into blocks of pixels 3. For each block of pixels, a search is made in the previous frame 7 for the block of pixels that most closely matches it. The location of that closest block relative to the present block then defines a motion vector. To determine the closest matching block in the previous frame, a search area 13 is selected. The selected block 5 of the present frame 1 is placed on the previous frame 7 at one position 9. The pixels of the two blocks are compared. The present block is then moved to a different position 11 and another comparison is made. This process is continued until all positions in the search area 13 have been searched. The position that yields the best match is taken as the position of the current block 5 in the previous frame 7, and thus defines a motion vector. Such an approach is called the full search or the exhaustive search approach. The determination of motion vectors by the full search approach is computationally intensive.
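The full search described above can be sketched in Python as follows. The text does not name the measure used to compare two blocks, so the sum of absolute differences (SAD) used here is an assumption. The function names, the square block size `n`, and the search `radius` are likewise hypothetical parameters chosen for illustration.

```python
def sad(cur, prev, bx, by, px, py, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the n x n block of `prev` at (px, py)."""
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(cur[by + dy][bx + dx] - prev[py + dy][px + dx])
    return total


def full_search(cur, prev, bx, by, n, radius):
    """Try every displacement within +/- radius pixels and return the
    best motion vector (mx, my) together with its matching cost."""
    height, width = len(prev), len(prev[0])
    best_vec, best_cost = (0, 0), None
    for my in range(-radius, radius + 1):
        for mx in range(-radius, radius + 1):
            px, py = bx + mx, by + my
            # Skip candidate positions that fall outside the previous frame.
            if px < 0 or py < 0 or px + n > width or py + n > height:
                continue
            cost = sad(cur, prev, bx, by, px, py, n)
            if best_cost is None or cost < best_cost:
                best_vec, best_cost = (mx, my), cost
    return best_vec, best_cost
```

For a search radius r and an n by n block, this sketch evaluates up to (2r + 1)^2 candidate positions with n^2 pixel comparisons each, which illustrates why the full search is computationally intensive.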
Various approaches have been suggested to reduce the search, but the performance of these reduced search approaches is inferior to that of the full search. A number of known prior systems are discussed below.
In Hatori et al. U.S. Pat. No. 4,217,609, a television signal coding system is discussed which chooses between intra-field difference encoding between picture elements, and inter-field difference quantizing between picture elements in one frame and an immediately successive frame. The restored values of the picture elements obtained from the intra-field predictive coding and the inter-field predictive coding are compared to a true picture element value, and the coding that then provides the smallest error is selected. Inter-field coding is selected when successive frames are indicative of a stationary picture, whereas intra-field coding is selected between successive frames in which the picture is moving.
Netravali et al. U.S. Pat. No. 4,278,996 teaches a "pel recursive" method and system for encoding pictorial information. The intensity values of the picture elements in a successive frame are first predicted relative to the intensity of the corresponding picture elements in the immediately prior frame adjusted by a predetermined gain factor. If an error is detected that exceeds a predetermined value, this is indicative of the pixels having moved or changed position, whereby the intensity value for the picture element at the given location is then recorded. Otherwise, it is assumed that the picture element remains unchanged between one frame and an immediately successive frame. In another embodiment of the invention, a dual prediction mode is incorporated for detecting changes in intensity beyond a predicted intensity between successive frames, and changes in displacement relative to predicted changes in displacement. Prediction errors are transmitted or encoded only if the error exceeds a predetermined level.
Tanimoto U.S. Pat. No. 4,675,733 refers to a bandwidth compression system. In FIG. 3A thereof, a subsampling technique is shown which is used for reducing the number of pixels sampled in a high definition television signal by one-half, for reducing the sampling frequency from 64.8 MHz to 32.4 MHz. A bandwidth compression system is then utilized to further reduce the number of pixels by a factor of one-half, and to reduce the frequency bandwidth required to 8.1 MHz. In FIG. 3B thereof, a subsampling pattern is shown for selecting predetermined ones of the pixels in the pattern for transmission. The number of basic pixels transmitted is equivalent to about one quarter of the original number of pixels of the associated frame. The basic pixels are processed through an interpolation circuit 20 (see FIG. 2A), for synthesizing the deleted pixels through an interpolation scheme based upon the weighted average values of neighboring basic pixels. The basic and synthesized pixels are transmitted to a receiver, for reconstructing the image. If the receiver detects an excessive error between the basic pixels and the synthesized pixels, only the basic pixels are used for reconstructing the image.
Shimura U.S. Pat. No. 4,776,029 teaches a method for compressing image signals. As shown in FIG. 2 thereof, and as described in column 4, lines 1 through 28, a Mask is formed for providing a moving average filter, whereby at any given time the average value of the pixels in the Mask frame (in this example a 3×3 picture element size or nine pixels are included within the Mask area) is calculated for providing an average image signal representative of the Mask. After the smoothing process, a sampling pattern is predetermined as shown in the example of FIG. 3 thereof, for sampling selected ones of main signals as shown.
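A generic 3×3 moving average filter of the kind referred to above can be sketched as follows. This is not taken from the Shimura patent: the function name `moving_average_3x3` is hypothetical, and the handling of border pixels (averaging only the mask positions that fall inside the image) is an assumption made so that the sketch is complete.

```python
def moving_average_3x3(image):
    """Replace each pixel with the mean of the pixels under a 3x3 mask
    centred on it. Mask positions outside the image are ignored."""
    height, width = len(image), len(image[0])
    smoothed = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        total += image[yy][xx]
                        count += 1
            smoothed[y][x] = total / count
    return smoothed
```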
Pirsch U.S. Pat. No. 4,827,340 refers to a video-signal-coding system that includes predictor circuitry for switching between two-dimensional intraframe predictors and pure interframe predictors. The decision as to which of the predictor networks or circuits is chosen is dependent upon a determination of where the least error in coding can be obtained.
Van der Meer et al. U.S. Pat. No. 4,924,306 shows a method and apparatus for providing motion estimation for pixels of a video picture. The comparisons are made relative to threshold values, which, if exceeded, are indicative of motion. The extent of motion of a given pixel is determined by comparing that pixel with a group of pixels from a subsequent or prior frame.
Kondo U.S. Pat. No. 4,947,249 teaches a system for coding digital video data. The pixels of each frame are encoded using a subsampling in one embodiment as shown in FIG. 4 thereof (see column 4, lines 42 through 68, and column 5, lines 1 through 19). As described, a "thinning-out" process is used for eliminating a number of picture elements from being transmitted, but within a sampling block such as shown in FIG. 4 at least one picture element is always transmitted without modification. The other picture elements are compared with an average value of two other picture elements, and if the comparison results in an error below a predetermined threshold, then the pixel or picture element being examined is eliminated. Conversely, if the detected error is greater than the given threshold, the picture element is transmitted.
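The thresholded "thinning-out" decision can be illustrated with a simplified one-dimensional sketch. The summary above does not say which two picture elements are averaged, so the choice of the immediate left and right neighbors is an assumption, as are the function name `thin_out_row` and the rule that every even-indexed element is always kept. The actual sampling block of the Kondo patent may differ.

```python
def thin_out_row(row, threshold):
    """Return a list of flags, True where the picture element is transmitted.

    Even-indexed elements are always transmitted (assumption). Each
    odd-indexed interior element is dropped when it differs from the
    average of its two neighbours by less than `threshold`.
    """
    transmit = [True] * len(row)
    for i in range(1, len(row) - 1, 2):
        predicted = (row[i - 1] + row[i + 1]) / 2
        if abs(row[i] - predicted) < threshold:
            transmit[i] = False  # the average is close enough; eliminate this element
    return transmit
```

An element in a smooth region is eliminated, while an element that departs sharply from its neighbors, such as one on an edge, is still transmitted.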
Haskell et al. U.S. Pat. No. 4,958,226 teaches the encoding and decoding of motion in digital video in a system employing motion compensated interpolation. Two out of every three frames are interpolated for providing information coding in replacement of actual frame coding information for obtaining a reduction in the corresponding number of bytes that must be transmitted.
Carr et al. U.S. Pat. No. 5,008,748 teaches a system for coding video signals. Only picture elements of a frame which have changed relative to a previous frame are transmitted. A frame is broken down into a plurality of blocks, and picture elements of a selected block are compared with those of a corresponding block of a previous coded image to produce a matrix of values indicative of any change in value between the blocks for identifying regions of each block that have changed, whereby these changed regions are coded for transmission. Also, for each block or image area that is compared, that one of a plurality of possible sequences of picture elements within the area is selected which has the highest correlation between successive elements in the sequence.
Gillard U.S. Pat. No. 5,012,336 refers to a method and apparatus for converting a video signal conforming with a first video standard to a video signal conforming to a second video standard. Gillard teaches the "comparison of a plurality of blocks in respective first and second intervals of a video signal with a plurality of blocks in respective intervals of the video signal adjacent thereto for deriving a corresponding plurality of motion vectors for each of the plurality of blocks representing motion of the image portion represented by each of the plurality of blocks." Each interval of the video signal is equivalent to each frame of video for the corresponding picture. "The content of a search block in one field or frame is compared with the respective contents of a plurality of search blocks comprised in a search area in the following field or frame, to determine the minimum difference between the contents so compared, and hence the direction and distance of motion (if any) of the content of the original search block."
Koga U.S. Pat. No. 4,371,895 teaches a system for transmitting and receiving coded video information. In this system a frame of video is formed into a plurality of blocks of picture elements. The blocks of picture elements are processed for generating prediction errors indicative of the difference in value between the picture elements of a given block and the predicted values of such elements. The detected errors are utilized for optimizing the prediction signals. The optimized prediction error signals are coded and compressed for transmission to a receiver, which decodes the information for reassembling the associated frame of video information.
Hirano et al. U.S. Pat. No. 4,460,923 discloses another predictive coding system for video signals. The system disclosed is similar to that of the immediately above-described Koga U.S. Pat. No. 4,371,895.
Koga U.S. Pat. No. 4,562,468 discloses another adaptive predictive coding apparatus for coding video signals.
In T. Koga, et al., "MOTION-COMPENSATED INTERFRAME CODING FOR VIDEO CONFERENCING", Proc. Nat. Telecommun. Conf., pp. G5.3.1-5.3.5, New Orleans, La., Nov. 29-Dec. 3, 1981, the use of an algorithm for providing block-by-block motion compensated video data is taught. Motion vectors relative to video information on a frame-by-frame basis are determined by a trial and error iterative process. Comparisons are made between frames through use of blocks of picture elements. This paper is related to the above Koga U.S. Pat. No. 4,371,895, and teaches a limited search technique.
The present inventors recognize that there is a need in the art to reduce the complexity of methods and apparatus for estimating motion vectors in imaging systems.