1. Field of the Invention
The present invention lies in the signal processing field. More specifically, the present invention pertains to a method of detecting motion in an interlaced video sequence. The invention is particularly applicable to the conversion of an interlaced video signal to a progressive frame video signal, whereby regional motion information is used to determine whether the video sequence contains motion or represents still image information. The present invention also pertains to an apparatus for performing the claimed method.
2. Description of the Related Art
In the development of current digital TV (DTV) systems, a video format conversion unit is essential because of the variety of video formats adopted by the many different DTV standards worldwide. For instance, the North American ATSC DTV standard adopted 1920x1080 interlaced video, 1280x720 progressive video, 720x480 interlaced and progressive video, and so on, as its standard video formats for digital TV broadcasting. Video format conversion refers to a signal processing operation in which an incoming video format is converted to a specified output video format so that the output video can be properly displayed on a fixed-resolution display device such as a monitor, an FLCD, or a plasma display.
Video format conversion systems are of significant importance because the conversion directly affects the visual quality of the video in a DTV receiver. Fundamentally, video format conversion requires advanced algorithms for multi-rate system design, poly-phase filter design, and interlaced-to-progressive scanning rate conversion, or simply deinterlacing. Deinterlacing is an operation that doubles the vertical scanning rate of the interlaced video signal.
Interlaced video is, in general, a sequence of separately arriving fields, such as A1, A2, A3, etc., where A1, A2, and A3 are interlaced images, with A1 being a top field, A2 a bottom field, A3 the next top field, and so on. The most popular systems currently in use, namely NTSC, PAL, and SECAM, are two-field systems, in which two consecutive fields (such as the top field A1 and the bottom field A2) make up a frame. Each scanned field contains, i.e., updates, every other line of the corresponding frame, so the frame has twice as many lines as each of the fields. Typically, the first field is identified with the odd-numbered lines and the second field with the even-numbered lines. The fields are scanned onto the display screen one after the other at a defined frequency.
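The field structure described above can be illustrated with a minimal sketch, assuming a frame is stored as a list of rows of integer luma samples with row 0 as the top line. The helper names `split_fields` and `weave` are hypothetical and chosen only for illustration. With 1-based line numbering, the top (first) field holds the odd-numbered lines 1, 3, 5, ..., which are Python rows 0, 2, 4, ....

```python
from typing import List, Tuple

# Hypothetical representation: a frame is a list of rows of luma samples, row 0 = top line.
Frame = List[List[int]]


def split_fields(frame: Frame) -> Tuple[Frame, Frame]:
    """Split a frame into its top field (rows 0, 2, 4, ...) and bottom field (rows 1, 3, 5, ...)."""
    if len(frame) % 2:
        raise ValueError("a two-field frame needs an even number of lines")
    top = [row[:] for row in frame[0::2]]
    bottom = [row[:] for row in frame[1::2]]
    return top, bottom


def weave(top: Frame, bottom: Frame) -> Frame:
    """Interleave a top and a bottom field line by line into one frame."""
    if len(top) != len(bottom):
        raise ValueError("both fields must have the same number of lines")
    frame: Frame = []
    for top_row, bottom_row in zip(top, bottom):
        frame.append(top_row[:])
        frame.append(bottom_row[:])
    return frame


if __name__ == "__main__":
    frame = [[10, 11], [20, 21], [30, 31], [40, 41]]
    a1, a2 = split_fields(frame)
    print(len(frame), len(a1), len(a2))  # 4 2 2: the frame has twice as many lines as each field
    print(weave(a1, a2) == frame)        # True
```

Weaving the two fields of a frame restores it exactly only when nothing moved between the two field captures, which is why motion matters for the techniques discussed below.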
By way of example, NTSC scans close to 30 frames (60 fields of interlaced video) per second, with 525 lines per frame and a horizontal-to-vertical aspect ratio of 4:3. In this two-field structure, the frame difference is the difference between two fields of the same type (top or bottom), such as A1 and A3, or A2 and A4. PAL and SECAM scan 25 frames per second, with 625 lines per frame and the same 4:3 aspect ratio. As noted, the interlacing in all of these systems is 2:1, i.e., two fields per frame. The primary reason for interlacing the lines between the fields is to reduce flicker in the display. An image that is updated only, say, 30 times a second would allow the human eye to perceive the scanning, because the image information would already start to fade before the next image is scanned onto the screen. When two fields are used, each containing half of the information, the effective scanning rate is raised to 60 Hz, and the human eye no longer perceives any flicker.
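Because the frame difference compares fields of the same type, it pairs each field with the field two positions later in the arrival order (A1 with A3, A2 with A4). The following minimal sketch assumes fields are stored as lists of rows of integer samples; the function names are hypothetical.

```python
from typing import List

Field = List[List[int]]  # rows of samples belonging to one field


def frame_difference(field_a: Field, field_b: Field) -> Field:
    """Per-sample absolute difference between two fields of the same parity."""
    if len(field_a) != len(field_b) or any(
        len(ra) != len(rb) for ra, rb in zip(field_a, field_b)
    ):
        raise ValueError("fields must have identical dimensions")
    return [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(field_a, field_b)]


def frame_differences(fields: List[Field]) -> List[Field]:
    """For fields in arrival order [A1, A2, A3, ...], return |A3 - A1|, |A4 - A2|, ..."""
    return [frame_difference(fields[n], fields[n + 2]) for n in range(len(fields) - 2)]


if __name__ == "__main__":
    a1, a2, a3, a4 = [[10, 20]], [[50, 60]], [[10, 25]], [[50, 60]]
    print(frame_differences([a1, a2, a3, a4]))  # [[[0, 5]], [[0, 0]]]
```

Comparing A1 with A2 directly would mix two different sets of scan lines, so even a perfectly still picture could show large differences; pairing fields of the same parity avoids that.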
Deinterlacing refers to filling in the unavailable lines in each of the fields A1, A2, A3, and so on. As a result of deinterlacing, a 60 Hz sequence of interlaced video fields becomes a 60 Hz progressive sequence.
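The missing lines can be filled in many ways. Purely as an illustration, the sketch below uses intra-field line averaging, a simple vertical-filtering approach of the 2-D (spatial) kind discussed later; it is not the method of the present invention. It assumes one field is given as a list of rows, and the function name is hypothetical. Each input field yields one full frame, so the output frame rate equals the input field rate.

```python
from typing import List, Optional

Row = List[int]


def line_average_deinterlace(field: List[Row], is_top: bool) -> List[Row]:
    """Rebuild a full frame from one field by intra-field line averaging.

    Lines carried by the field are copied; each missing line is the integer mean of
    the field lines directly above and below it, and a missing border line copies its
    single neighbour.
    """
    height = 2 * len(field)
    frame: List[Optional[Row]] = [None] * height
    offset = 0 if is_top else 1  # top field occupies frame rows 0, 2, 4, ...
    for k, row in enumerate(field):
        frame[2 * k + offset] = row[:]
    for y in range(height):
        if frame[y] is not None:
            continue
        # Neighbours of a missing line always belong to the field, so they are already filled.
        above = frame[y - 1] if y > 0 else None
        below = frame[y + 1] if y + 1 < height else None
        if above is None:
            frame[y] = below[:]
        elif below is None:
            frame[y] = above[:]
        else:
            frame[y] = [(a + b) // 2 for a, b in zip(above, below)]
    return frame


if __name__ == "__main__":
    print(line_average_deinterlace([[0, 10], [20, 30]], is_top=True))
    # [[0, 10], [10, 20], [20, 30], [20, 30]]
```

Line averaging ignores the other fields entirely, so it never produces motion artifacts, but it also cannot recover vertical detail that only the opposite field carries.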
Interlaced video is subject to several intrinsic drawbacks, referred to as artifacts. These include serrated lines that are observed when there is motion between fields, line flickering, raster line visibility, and field flickering, and they affect DTV (digital TV) receivers as well. Historically, deinterlacing algorithms have been developed to enhance the video quality of NTSC TV receivers by reducing these annoying intrinsic artifacts of the interlaced video signal. In addition, more elaborate deinterlacing algorithms that use motion detection or motion compensation provide excellent results when doubling the vertical scanning rate of the interlaced video signal, especially for stationary (motionless) objects in the video signal.
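The serrated-line artifact can be reproduced with a small hypothetical example: a bright vertical bar moves two samples to the right between the capture of the top field and the bottom field, and the two fields are then simply woven into one frame. The scene, bar position, and shift are invented values chosen only to make the effect visible, and the function names are illustrative.

```python
from typing import List

Row = List[int]


def render_bar(height: int, width: int, left: int, bar_width: int) -> List[Row]:
    """Frame with a bright (255) vertical bar on a dark (0) background."""
    return [
        [255 if left <= x < left + bar_width else 0 for x in range(width)]
        for _ in range(height)
    ]


def weave_moving_bar(height: int = 6, width: int = 8, left: int = 1,
                     bar_width: int = 3, shift: int = 2) -> List[Row]:
    """Weave a top field captured at time t with a bottom field captured one field
    period later, after the bar has moved `shift` samples to the right."""
    top = render_bar(height, width, left, bar_width)[0::2]
    bottom = render_bar(height, width, left + shift, bar_width)[1::2]
    frame: List[Row] = []
    for top_row, bottom_row in zip(top, bottom):
        frame += [top_row, bottom_row]
    return frame


if __name__ == "__main__":
    # Prints six lines alternating between ".###...." and "...###..":
    # the bar edges jump back and forth line by line, i.e. serrated lines.
    for row in weave_moving_bar():
        print("".join("#" if v else "." for v in row))
```

With `shift=0` every woven line is identical, which is why simply weaving fields works well for still pictures and fails only where there is motion.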
The present invention therefore also relates to motion-detection-based deinterlacing that can be used in both analog and digital TV receivers.
The state of the art includes a variety of deinterlacing algorithms, each of which has been studied extensively by many researchers during the last decade. Deinterlacing algorithms can be divided into two classes, namely 2-D (spatial) deinterlacing algorithms and 3-D (spatio-temporal) deinterlacing algorithms, depending on whether they use the motion information embedded in consecutive fields of the interlaced video sequence. Combined spatial and temporal 3-D deinterlacing algorithms based on motion detection give visually more pleasing results than 2-D deinterlacing algorithms. The key issue in a 3-D deinterlacing algorithm is how to detect motion in the interlaced video signal precisely. The publications in the following list disclose some of the applicable deinterlacing methods. They may be categorized as follows:
[1] Simple line doubling scheme, vertical filtering, and vertical edge controlled interpolation method, disclosed in IEEE Transactions on Consumer Electronics, pp. 279–89, August 1989, by D. I. Hentschel;
[2] Edge direction dependent deinterlacing method, disclosed in Proc. of the Int. Workshop on HDTV, 1994, by D. Bagni, R. Lancini, and S. Tubaro;
[3] Nonlinear interpolation methods based on:
    a weighted median filter, disclosed in Proc. of the IEEE ISCAS, pp. 433–36, Portland, USA, May 1989, by J. Juhola, A. Nieminen, J. Sal, and Y. Neuvo;
    FIR median hybrid interpolation, disclosed in Proc. of SPIE's Visual Communications and Image Processing, Lausanne, Switzerland, October 1990, pp. 125–32, by A. Lehtonen and M. Renfors;
    a complementary median filter, disclosed in Proc. of the Int. Workshop on HDTV, 1994, by H. Blume, I. Schwoerer, and K. Zygis;
[4] A motion adaptive method, disclosed in IEEE Transactions on Consumer Electronics, pp. 110–114, May 1990, by C. Markhauser.
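To make the distinction between 2-D and 3-D approaches concrete, the sketch below interpolates a single missing sample in a generic motion-adaptive way: a frame-difference motion measure blends a spatial (vertical average) estimate with a temporal estimate taken from the neighbouring opposite-parity fields. A plain, unweighted three-tap vertical-temporal median is included as a simple nonlinear alternative in the spirit of the median-based methods in [3]. Both are textbook simplifications written for illustration; they are not the specific algorithms of references [1] to [4], and the `threshold` value is a hypothetical tuning parameter.

```python
def motion_adaptive_sample(above: float, below: float,
                           prev_sample: float, next_sample: float,
                           threshold: float = 16.0) -> float:
    """Interpolate one missing sample of the current field.

    above, below: samples directly above and below it in the current field (2-D support).
    prev_sample, next_sample: samples at the same position in the preceding and following
        fields; both have the opposite parity, so they actually carry this line.
    threshold: hypothetical frame-difference level treated as full motion.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    spatial = (above + below) / 2.0
    temporal = (prev_sample + next_sample) / 2.0
    # The preceding and following fields share a parity, so this is a frame difference.
    motion = abs(next_sample - prev_sample)
    alpha = min(1.0, motion / threshold)  # 0 = still, 1 = moving
    return alpha * spatial + (1.0 - alpha) * temporal


def vt_median_sample(above: float, below: float, prev_sample: float) -> float:
    """Unweighted three-tap vertical-temporal median."""
    return sorted((above, below, prev_sample))[1]


if __name__ == "__main__":
    print(motion_adaptive_sample(0, 100, 80, 80))  # 80.0: still, temporal estimate used
    print(motion_adaptive_sample(0, 100, 0, 200))  # 50.0: moving, spatial estimate used
    print(motion_adaptive_sample(0, 100, 72, 80))  # 63.0: partial motion, blended
    print(vt_median_sample(0, 100, 80))            # 80
```

The soft blend avoids a hard switch between the two estimates; the quality of the result still depends entirely on how reliably `motion` reflects real movement.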
More recently, a new motion-detection-based deinterlacing method has been described in the following two commonly assigned patents:
[5] U.S. Pat. No. 5,943,099, Aug. 24, 1999, issued to Young-Taeg Kim, entitled Interlaced-to-Progressive Conversion Apparatus and Method Using Motion and Spatial Correlation. There, an interlaced-to-progressive conversion device includes a spatial interpolator that performs spatial interpolation and a temporal interpolator that performs temporal interpolation of an interlaced video input signal. The system reduces jitter and related artifacts by temporally or spatially interpolating the signals.
[6] U.S. Pat. No. 5,959,681, Sep. 28, 1999, issued to Yong-Hun Cho, entitled Motion Picture Detecting Method. There, two separate field memories are used to detect rapid motion and slow motion in an interlaced video sequence, and the interlaced video signal is thereby converted into a progressive-scanned signal. Differences between spatial interpolation and temporal interpolation are used to determine whether the image is in motion: if the differences exceed certain defined thresholds, motion is detected. The thresholds are dynamically adapted during the process.
The core of the methods described in the latter two patents is to estimate a motion decision factor based on the frame difference signal and the sample correlation in the vertical direction. These methods reduce the visual artifacts that may arise from false motion detection by using the sample correlation in the vertical direction around the sampling point whose value is to be interpolated. A common drawback of these methods, however, is that they do not provide true motion detection when there are high-frequency components in the vertical direction. In other words, when there are high-frequency components in the vertical direction, the methods described in references [5] and [6] will conclude that the picture is in motion.
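This drawback can be demonstrated with a deliberately simplified model, assuming the vertical-correlation criterion is reduced to comparing the intra-field vertical average with the co-sited sample of the preceding opposite-parity field. This stand-in is hypothetical and is not the actual decision logic of [5] or [6]; the test image, the function names, and the threshold of 30 are likewise illustrative. The scene is completely static but consists of alternating dark and bright lines, i.e., maximal vertical frequency.

```python
from typing import List

Field = List[List[int]]


def striped_frame(height: int, width: int) -> Field:
    """Static frame of alternating dark (0) and bright (255) lines: maximal vertical frequency."""
    return [[0 if y % 2 == 0 else 255] * width for y in range(height)]


def vertical_correlation_hits(top_field: Field, bottom_field: Field, threshold: int) -> int:
    """Simplified vertical-correlation test (hypothetical stand-in).

    Bottom-field line k lies between top-field lines k and k + 1. A position is flagged
    when the vertical average of the top field there disagrees with the bottom-field
    sample by more than `threshold`.
    """
    hits = 0
    for k in range(len(top_field) - 1):
        for x in range(len(top_field[k])):
            spatial = (top_field[k][x] + top_field[k + 1][x]) / 2
            if abs(spatial - bottom_field[k][x]) > threshold:
                hits += 1
    return hits


def frame_difference_hits(field_a: Field, field_b: Field, threshold: int) -> int:
    """Count positions where two same-parity fields differ by more than `threshold`."""
    return sum(
        1
        for row_a, row_b in zip(field_a, field_b)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > threshold
    )


if __name__ == "__main__":
    frame = striped_frame(8, 4)
    top, bottom = frame[0::2], frame[1::2]
    a1, a2, a3 = top, bottom, top  # static scene: every field of a given parity is identical
    print(vertical_correlation_hits(a3, a2, threshold=30))  # 12: "motion" at every tested position
    print(frame_difference_hits(a1, a3, threshold=30))      # 0: no real motion
```

Under these assumptions the vertical-correlation test flags every examined position while the frame difference is zero, which is exactly the false motion decision described above; a deinterlacer relying on it would fall back to spatial interpolation and discard the vertical detail carried by the opposite field.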
As a consequence, in many instances, those prior art processing methods fail to increase the vertical resolution even when no real motion is present between fields.