1. Field of the Invention
The invention relates in general to correlating two images. More specifically, the invention relates to the use of phase plane correlation (PPC).
2. Related Art
In image processing applications, it is desirable to obtain clear, high resolution images and video from digital data. Motion vector data provides information related to the speed and direction of movements of critical parts of an image, e.g., portions of the image determined to be changing over a period of time such as from one frame of an image to the next. Applications making use of motion vector data include, but are not limited to, format conversion, de-interlacing, compression, image registration and others where some kind of temporal interpolation is necessary. Format conversion examples include 1) frame rate conversion, such as the conversion of NTSC video rate to HDTV video rate, 2) conversion of interlaced video to progressive video, and 3) the 3-to-2 pull-down artifact removal in conventional DVD format video. Video data compression processes also benefit from accurate motion vector data.
Compression is often used to permit the useful transmission of data through a restricted bandwidth. Popular video compression algorithms follow video compression standards such as MPEG2, MPEG4 and H.26L. Another application that benefits from accurate motion vector data analysis is the production of display special effects, such as the global estimation of camera parameters useful to produce display effects for pan, tilt or zoom.
Digital processing of television signals (e.g., encoding, transmission, storage and decoding), as a practical matter, requires the use of motion vector data. Motion vector data is needed because a television signal is not typically filtered in a manner required by the Nyquist criterion prior to sampling in the temporal domain. Thus, a moving image contains information that is temporally aliased. Conventional linear interpolation techniques accordingly are not successful in the temporal domain.
The ITU-T (International Telecommunication Union Telecommunication Standardization Sector) recommends the H.261 and H.262 standards as methods for encoding, storing, and transmitting image signals. The ISO (International Organization for Standardization) recommends MPEG-1 (11172-2) and MPEG-2 (13818-2). Methods based on these standards adopt inter frame prediction for motion compensation in encoding video signals.
Inter frame prediction is based upon the recognized redundancy characteristic of video data. Video signals contain highly redundant information from frame to frame (many image elements of a predetermined frame do not move and thus will be repeated in a subsequent frame). This holds true for frames generated as a result of special effects, for example, or frames generated to increase the definition of a video signal. Motion compensated inter frame prediction is a technique that takes advantage of inter frame redundancy to reduce the amount of data required to describe sequences of video frames or to create image frames, such as those created, for example, in producing a progressive scan video signal from an interlaced video signal. An accurate determination of frame to frame motion is important to conduct such operations.
One typical method for motion detection is carried out in the image domain. This method attempts to match blocks from a reference (previous) image frame with blocks from a current (subsequent to the reference) frame. Many so-called block matching methods start by calculating the absolute values of the differences between the pixels in a block of the current image frame and the pixels in each of the blocks in the reference image frame. The block in the reference image frame having the smallest difference is determined to match. The displacement between the block in the current frame and the corresponding matching block in the reference frame is then characterized by horizontal and vertical displacement components, thus producing a motion vector. This procedure is known as the full-search procedure.
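The full-search procedure described above can be sketched as follows. This is a minimal illustration rather than any particular codec's implementation: the function name `full_search`, the block size, and the restriction of candidates to a `search` radius around the block (instead of every block in the reference frame) are assumptions made here to keep the example short. The matching criterion is the sum of absolute differences (SAD).

```python
import numpy as np

def full_search(current, reference, top, left, block=8, search=4):
    """Exhaustively match one block of `current` against `reference`.

    Returns ((dy, dx), sad): the offset from the block's position in the
    current frame to the best-matching block in the reference frame, and
    the sum of absolute differences of that match.
    All parameter names are hypothetical, chosen for illustration.
    """
    height, width = reference.shape
    cur_block = current[top:top + block, left:left + block].astype(np.int64)
    best_offset, best_sad = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            # Skip candidate blocks that fall outside the reference frame.
            if r < 0 or c < 0 or r + block > height or c + block > width:
                continue
            candidate = reference[r:r + block, c:c + block].astype(np.int64)
            sad = int(np.abs(cur_block - candidate).sum())
            if best_sad is None or sad < best_sad:
                best_offset, best_sad = (dy, dx), sad
    return best_offset, best_sad
```

The returned offset points from the current block to its match in the reference frame, so the motion of the content from the reference frame to the current frame is its negation. The cost of the two nested loops, repeated for every block, is what makes the full search expensive.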
Another method for motion detection utilizes the phase plane rather than the image plane. An example of such phase plane motion detection is described in U.S. Pat. No. 7,197,074—Biswas et al., entitled PHASE PLANE CORRELATION MOTION VECTOR DETERMINATION METHOD, the subject matter of which is incorporated herein by reference as if fully set forth. Phase plane correlation (PPC) is an efficient technique for correlating two images. In the frequency domain, motion is indicated by a phase shift (phase difference) between a particular block of a current image frame and a corresponding block of a reference image frame. A correlation surface obtained by an inverse Fourier transform of the phase difference indicates the quantity of pixels that moved and the magnitude of pixel movement. This technique has the advantage of a direct determination of the motion vectors. However, phase plane correlation motion vector determination techniques do not meet current video processing demands. There remains a need for an efficient method to calculate the motion in an image with a reduction in the chance for producing erroneous assignments of motion vectors to pixels.
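As a brief illustration of why a translation appears as a phase shift, consider an idealized case (an assumption made here for illustration only) in which a block I_2 of the current frame is a pure translation of the reference block I_1 by (d_x, d_y), with no noise, occlusion or change in luminance. The Fourier shift theorem then gives:

$$I_2(x, y) = I_1(x - d_x,\, y - d_y) \;\Longrightarrow\; F(I_2)(u, v) = F(I_1)(u, v)\, e^{-j 2\pi (u d_x + v d_y)}$$

$$\frac{F(I_1) \cdot F(I_2)^{*}}{\left| F(I_1) \cdot F(I_2)^{*} \right|} = e^{\,j 2\pi (u d_x + v d_y)}, \qquad F^{-1}\!\left( e^{\,j 2\pi (u d_x + v d_y)} \right) = \delta(x + d_x,\, y + d_y)$$

Under these assumptions the magnitude of the spectrum cancels, only the phase difference remains, and the correlation surface is a single peak whose position encodes the displacement. Whether the peak sits at the displacement or at its negative depends on which of the two spectra is conjugated.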
Conventional phase plane correlation (PPC) results in a correlation surface in which peaks and their respective amplitudes are a direct indication of the similarity in image luminance. The conventional PPC technique includes a normalization step that attenuates dissimilarities in luminance, resulting in a normalized correlation surface that is more discerning than simple correlation. However, the traditional PPC approach does not adequately deal with different input image content.
A standard luminance based PPC correlation surface is derived as follows:
$$\mathrm{PPC}(I_1, I_2) = F^{-1}\left( \frac{F(I_1) \cdot F(I_2)^{*}}{\left| F(I_1) \cdot F(I_2)^{*} \right|} \right).$$
This equation is referred to as the “luminance based PPC”. I, F and F−1 are the image luminance, Fourier and inverse Fourier transforms, respectively.
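A minimal NumPy sketch of the luminance based PPC is given below, assuming two equally sized luminance blocks held as 2-D arrays. The function names, the `eps` guard against division by zero, and the peak-to-displacement helper are assumptions introduced here for illustration and are not taken from the cited reference.

```python
import numpy as np

def phase_plane_correlation(i1, i2, eps=1e-12):
    """Correlation surface F^-1( F(I1) . F(I2)* / |F(I1) . F(I2)*| )."""
    f1 = np.fft.fft2(i1)
    f2 = np.fft.fft2(i2)
    cross = f1 * np.conj(f2)
    # Normalization keeps only the phase difference; eps (an assumption)
    # avoids dividing by zero where the cross spectrum vanishes.
    normalized = cross / np.maximum(np.abs(cross), eps)
    return np.real(np.fft.ifft2(normalized))

def peak_displacement(surface):
    """Convert the highest peak of the surface to a signed (dy, dx).

    With F(I1) . F(I2)* as above, a shift of I2 relative to I1 by (dy, dx)
    puts the peak at (-dy, -dx) modulo the block size, so the wrapped
    peak position is negated.
    """
    height, width = surface.shape
    py, px = np.unravel_index(np.argmax(surface), surface.shape)
    py, px = int(py), int(px)
    if py > height // 2:
        py -= height
    if px > width // 2:
        px -= width
    return -py, -px
```

For a block that is a circular shift of the other, the surface has a single dominant peak and `peak_displacement` recovers the shift directly. With real image content, several peaks of differing amplitude are to be expected, one for each distinct motion present in the block.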
FIG. 1 (Prior Art) is a block diagram illustrating a portion of a known arrangement for PPC image correlation. A luminance signal LUMA 1, which may represent a first image in the time domain, is transformed by a first Fast Fourier Transform FFT1 into a first frequency domain signal F1. A luminance signal LUMA 2, which may represent a second image, is transformed by a second Fast Fourier Transform FFT2 into a second frequency domain signal F2. Phase plane correlation is carried out in a defined window in a phase plane correlator 102 according to the expression (W·F1)·(W·F2)*/|(W·F1)·(W·F2)*|.
A frequency domain signal from phase plane correlator 102 is transformed back to the time domain by an Inverse Fast Fourier Transform IFFT to provide a phase plane correlated surface.
In order for PPC to work optimally, objects that undergo displacement (move from one frame to the next) need to be well defined. This means that the boundaries of such objects must be defined by sharp edges. However, when an input image is of a very high frequency, for example, close to the Nyquist frequency, it is difficult to clearly define the boundaries. It is often desirable to pre-process an input image using a low-pass filter. However, such pre-processing can be problematic. Frequency extremes of image content cannot be properly processed with the same filter. Moreover, filtering in the time domain requires performing a computationally complex two-dimensional convolution. For various applications, this is prohibitive.