Obtaining clear, high resolution images and video from digital data continues to be a difficult problem in the image processing field. Motion vector data is fundamental to many applications. It provides knowledge of the speed and direction of movement of at least the critical parts of the image, e.g., portions of the image determined to be changing over a time period such as from a predetermined image frame to a subsequent image frame. Applications making use of motion vector data include format conversion, de-interlacing, compression, image registration and any others where any sort of temporal interpolation is necessary.
Specific format conversion examples include frame rate conversion, such as the conversion of NTSC video rate to HDTV video rate, and the conversion of interlaced video to progressive video. Another format conversion example is 3-to-2 pull-down artifact removal in conventional DVD format video. Video data compression is another application that benefits from accurate motion vector data. Compression is generally necessary to permit the useful transmission of data, and motion vector determination forms a critical part of many video compression algorithms, such as those of the video compression standards MPEG-2, MPEG-4, H.26L, etc. Another exemplary application that benefits from accurate motion vector data analysis is the production of display special effects, such as the global estimation of camera parameters useful to produce display effects for pan, tilt or zoom.
Digital handling of television signals (e.g., encoding, transmission, storage and decoding), as a practical matter, requires use of motion vector data. Motion vector data is needed because a television signal is not typically filtered in the manner required by the Nyquist criterion prior to sampling in the temporal domain. Thus, a moving image contains information that is temporally aliased. Conventional linear interpolation techniques accordingly are not successful in the temporal domain.
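As an illustration of why plain linear interpolation is not successful in the temporal domain, the following minimal sketch compares a sample-by-sample average of two frames with an interpolation that first shifts the frames along the motion vector. It is only a sketch under stated assumptions: the frames are one-dimensional scanlines, the helper name `midpoint_frames` is hypothetical, and the motion vector is assumed to be already known and even.

```python
import numpy as np

def midpoint_frames(frame_a, frame_b, motion):
    """Return (linear, compensated) estimates of the frame halfway in time.

    `motion` is the displacement in pixels from frame_a to frame_b. It is
    assumed to be known and even; this helper is illustrative only.
    """
    # Plain temporal averaging: blends the two frames sample by sample.
    linear = 0.5 * (frame_a + frame_b)
    # Motion compensated: move frame_a forward and frame_b backward by half
    # the displacement so that the object coincides before averaging.
    half = motion // 2
    compensated = 0.5 * (np.roll(frame_a, half) + np.roll(frame_b, -half))
    return linear, compensated

frame_a = np.zeros(10)
frame_a[2] = 1.0                 # bright object at position 2
frame_b = np.roll(frame_a, 4)    # same object, moved 4 pixels to position 6

linear, compensated = midpoint_frames(frame_a, frame_b, motion=4)
```

In this toy case the linear result contains two half-intensity copies of the object, at positions 2 and 6, whereas the compensated result contains a single full-intensity object at position 4.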
The ITU-T (International Telecommunication Union Telecommunication Standardization Sector) recommends H.261 and H.262 as methods for encoding, storing, and transmitting image signals. The ISO (International Organization for Standardization) recommends MPEG-1 (11172-2) and MPEG-2 (13818-2). These methods adopt inter frame prediction for motion compensation in encoding video signals.
Inter frame prediction is based upon the recognized redundancy characteristic of video data. Video signals produce highly redundant information from frame to frame, as many image elements of a predetermined frame will be repeated in a subsequent frame. This holds true for frames generated as a result of special effects, for example, or frames generated to increase the definition of a video signal. Motion compensated inter frame prediction is a technique that takes advantage of the inter frame redundancy to reduce the amount of data required to describe sequences of video frames or to create image frames, such as those created, for example, in producing a progressive scan video signal from an interlaced video signal. An accurate determination of frame to frame motion is important to conduct such operations.
A typical method for motion detection in the prior art is conducted in the image domain and involves an attempt to match blocks from a reference (previous) image frame with blocks from a current (subsequent to the reference) frame. Many so-called block matching methods start with calculating the absolute values of the differences between the pixels in a block of the current image frame and the pixels in each of the blocks in the reference image frame. The block in the reference image frame having the smallest difference is determined to match. The displacement between the block in the current frame and the corresponding matching block in the reference frame is then characterized by horizontal and vertical displacement components, thus producing a motion vector. This procedure is known as the full-search procedure.
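The full-search procedure described above can be sketched as follows. This is a minimal illustration rather than any particular prior art implementation: the function name `full_search`, the 4 by 4 block size and the synthetic test frames are assumptions, and the per-block difference is taken as the conventional sum of absolute differences (SAD).

```python
import numpy as np

def full_search(current, reference, top, left, block=4):
    """Find the motion vector of one block by exhaustive SAD matching.

    Compares the block of `current` whose top-left corner is (top, left)
    with every block-sized region of `reference`. Returns (dy, dx, sad),
    where (dy, dx) is the displacement of the block from its position in
    the reference frame to its position in the current frame.
    """
    cur = current[top:top + block, left:left + block].astype(np.int64)
    rows, cols = reference.shape
    best = None
    for y in range(rows - block + 1):
        for x in range(cols - block + 1):
            ref = reference[y:y + block, x:x + block].astype(np.int64)
            sad = int(np.abs(cur - ref).sum())  # sum of absolute differences
            if best is None or sad < best[2]:
                best = (top - y, left - x, sad)
    return best

# Synthetic frames: the current frame is the reference moved down 2, right 3.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
current = np.roll(reference, shift=(2, 3), axis=(0, 1))

dy, dx, sad = full_search(current, reference, top=6, left=6)
```

For these synthetic frames the search recovers the vertical and horizontal displacement components (2, 3) with a difference of zero. Real frames rarely yield an exact match, so the smallest SAD only indicates the most similar block.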
In the full-search procedure, the absolute values of the differences between all pixels contained in the block from the current frame and all pixels contained in each reference block within the reference image frame are calculated, and the sum of these absolute values is calculated for each reference block. Ideally, a method should be able to measure motion up to about 15 pixels per field for a standard television signal, to better than one pixel accuracy. The amount of calculation is therefore exorbitant, and high computational speed is necessary. To reduce the computational load, many researchers have proposed smart searching techniques, but these often reduce the accuracy of the vectors.
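To give a rough sense of the computational load, the sketch below counts the absolute-difference operations for one picture. All of the figures are assumptions chosen for illustration and do not come from the text above: a 720 by 480 pixel picture, 16 by 16 pixel blocks, and a search limited to displacements of plus or minus 15 pixels in whole-pixel steps. The function name `full_search_cost` is likewise hypothetical.

```python
def full_search_cost(width, height, block, search_range):
    """Count absolute-difference operations for one picture of full search,
    restricted to displacements within +/- search_range whole pixels."""
    blocks = (width // block) * (height // block)   # blocks to be matched
    candidates = (2 * search_range + 1) ** 2        # positions tried per block
    per_candidate = block * block                   # pixel differences per try
    return blocks * candidates * per_candidate

ops = full_search_cost(width=720, height=480, block=16, search_range=15)
# ops == 332_121_600, i.e. roughly 3.3e8 absolute differences per picture
```

Under these assumptions each pixel takes part in 961 comparisons per picture, and a search to better than one pixel accuracy would multiply the number of candidate positions further.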
A phase plane correlation technique for motion vector determination has also been developed. In the frequency domain, motion is indicated by a phase shift between a block in the current image frame and one in the reference image frame. A correlation surface obtained by an inverse Fourier transform of the phase difference indicates the quantity of pixels that moved and the magnitude of pixel movement. This has the advantage of a direct determination of the motion vectors. There remains a need for a method to calculate the motion in an image efficiently, and with a reduction in the chance for producing erroneous assignments of motion vectors to pixels.
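The phase correlation approach described above can be illustrated with a short NumPy sketch. It makes simplifying assumptions that are not part of the text: the displacement is a whole number of pixels, the motion is a circular shift of the entire block, and the function name `phase_correlation` and the test data are hypothetical.

```python
import numpy as np

def phase_correlation(current, reference):
    """Estimate the integer (dy, dx) shift from `reference` to `current`."""
    f_cur = np.fft.fft2(current)
    f_ref = np.fft.fft2(reference)
    cross = f_cur * np.conj(f_ref)
    # Keep only the phase difference; the small constant avoids dividing by 0.
    cross /= np.abs(cross) + 1e-12
    surface = np.fft.ifft2(cross).real  # correlation surface
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    # Peaks past the midpoint correspond to negative (wrapped) shifts.
    shift = [int(p) if p <= n // 2 else int(p) - n
             for p, n in zip(peak, surface.shape)]
    return tuple(shift), surface

rng = np.random.default_rng(0)
reference = rng.random((32, 32))
current = np.roll(reference, shift=(3, -5), axis=(0, 1))

(dy, dx), surface = phase_correlation(current, reference)
```

Here the location of the peak in the correlation surface gives the displacement (3, -5) directly, without searching over candidate blocks. With real image blocks, several objects moving differently would produce several peaks, and each vector would still have to be assigned to the correct pixels.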