1. Field of the Invention
The invention lies in the signal processing field. More specifically, the invention pertains to a method of detecting motion in an interlaced video sequence. The invention is particularly applicable to the conversion of an interlaced video signal to a progressive frame video signal, whereby regional motion information is utilized to define whether or not the video sequence contains motion or represents still image information. The invention also pertains to an apparatus for performing the method.
2. Description of the Related Art
In the development of current digital TV (DTV) systems, it is essential to employ a video format conversion unit because of the variety of the video formats adopted in many different DTV standards worldwide. For instance, the ATSC DTV standard system of North America adopted 1080×1920 interlaced video, 720×1280 progressive video, 720×480 interlaced and progressive video, and so on, as its standard video formats for digital TV broadcasting. Video format conversion refers to a signal processing operation in which an incoming video format is converted to a specified output video format so that the output video can be properly displayed on a display device such as a monitor, FLCD, or a plasma display, which has a fixed resolution.
Video format conversion systems are of significant importance since the conversion can directly affect the visual quality of the video of a DTV receiver. Fundamentally, the video format conversion operation requires advanced algorithms for multi-rate system design, poly-phase filter design, and interlaced-to-progressive scanning rate conversion or simply deinterlacing. Deinterlacing represents an operation that doubles the vertical scanning rate of the interlaced video signal.
Interlaced video in general is a sequence of separately arriving fields, such as A1, A2, A3, etc., where A1, A2, and A3 are interlaced images with A1 being a top image, A2 being a bottom image, A3 being the next top image, and so on. The most popular systems currently in use, namely NTSC, PAL, and SECAM, are two-field systems, where two consecutive fields (such as the top field A1 and the bottom field A2) make up a frame. Each scanned field contains, i.e., updates, every other line of a corresponding frame, and the number of lines in the frame is twice the number of lines in each of the fields. Typically, the first field of a frame is identified with odd-numbered lines and the second field is identified with even-numbered lines. The fields are scanned onto the display screen one after the other at a defined frequency.
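The relationship between a frame and its two fields can be illustrated with a minimal sketch. The function name split_fields and the representation of a frame as a list of lines are assumptions made for illustration only; they do not appear in the specification.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its top and bottom fields.

    Assumption: frame[0] is line 1 of the frame, so the top field holds the
    odd-numbered lines and the bottom field holds the even-numbered lines.
    """
    top = frame[0::2]     # lines 1, 3, 5, ... (counting from 1)
    bottom = frame[1::2]  # lines 2, 4, 6, ...
    return top, bottom


frame = ["line1", "line2", "line3", "line4"]
top, bottom = split_fields(frame)
```

Each field carries half of the lines of the frame, which is the 2:1 interlacing referred to in this description.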
By way of example, NTSC scans close to 30 frames (60 fields of interlaced video) per second, with 525 lines per frame, and a horizontal to vertical aspect ratio of 4:3. The frame difference, therefore, is the difference between two fields having the same types (top or bottom) such as A1 and A3, or A2 and A4. PAL and SECAM scan 25 frames per second, with 625 lines per image, and the same aspect ratio of 4:3. As noted, the interlacing in all of these systems is 2:1, i.e., two fields per one frame. The primary reason for the interlacing of the lines between the fields is to reduce flicker in the display. An image that is updated, say, only 30 times a second would allow the human eye to perceive the scanning, because the image information would already start to fade before the next image is scanned onto the screen. When two fields are used, and each contains half of the information, the scanning rate in effect is raised to 60 Hz, and the human eye no longer perceives any flicker.
Deinterlacing refers to the filling of unavailable lines in each of the fields A1, A2, A3, and so on. As a result of deinterlacing, a 60 Hz field sequence (of interlaced video fields) becomes a 60 Hz progressive sequence.
Interlaced video is subject to several intrinsic drawbacks, referred to as artifacts. These include serrated lines that are observed when there is motion between fields, line flickering, raster line visibility, and field flickering. These also apply to DTV (digital TV) receivers. Historically, deinterlacing algorithms have been developed to enhance the video quality of NTSC TV receivers by reducing these intrinsic artifacts of the interlaced video signal. In addition, elaborate deinterlacing algorithms utilizing motion detection or motion compensation provide excellent methods of doubling the vertical scanning rate of the interlaced video signal, especially for stationary (motionless) objects in the video signal.
The present invention therefore also relates to the motion detection based deinterlacing operation that can be used for analog and digital TV receivers.
The state of the art includes a variety of deinterlacing algorithms, each having been exploited and studied comprehensively by many researchers during the last decade. Deinterlacing algorithms can be categorized into two classes, namely, 2-D (spatial) deinterlacing algorithms and 3-D (spatio-temporal) deinterlacing algorithms, depending on the use of motion information embedded in the consecutive interlaced video sequence. Combined spatial and temporal 3-D deinterlacing algorithms based on motion detection give more pleasing performance than 2-D deinterlacing algorithms. The key point of a 3-D deinterlacing algorithm is how to precisely detect motion in the interlaced video signals. The publications in the following list disclose some of the applicable deinterlacing methods. They may be categorized as follows:
[1] Simple line doubling scheme, vertical filtering, vertical edge controlled interpolation method disclosed in the IEEE Transactions on Consumer Electronics, pp. 279-89, August 1989 by D. I. Hentschel;
[2] Edge direction dependent deinterlacing method disclosed in the Proc. of the Int. Workshop on HDTV, 1994, by D. Bagni, R. Lancini, and S. Tubaro;
[3] Nonlinear interpolation methods based on:
a weighted median filter disclosed in the Proc. of the IEEE ISCAS, pp. 433-36, Portland, USA, May 1989, by J. Juhola, A. Nieminen, J. Salo, and Y. Neuvo,
FIR median hybrid interpolation disclosed in Proc. of SPIE's Visual Communications and Image Processing, Lausanne, Switzerland, October 1990, pp. 125-32 by A. Lehtonen and M. Renfors,
a complementary median filter disclosed in Proc. of the Int. Workshop on HDTV, 1994 by H. Blume, I. Schwoerer, and K. Zygis,
[4] A motion adaptive method disclosed in IEEE Transactions on Consumer Electronics, pp. 110-114, May 1990 by C. Markhauser.
More recently, a new motion detection based deinterlacing method has been described in the following two patents:
[5] U.S. Pat. No. 5,943,099, Aug. 24, 1999, to Young-Taek Kim, entitled Interlaced-to-Progressive Conversion Apparatus and Method Using Motion and Spatial Correlation. There, an interlaced-to-progressive conversion device includes a spatial interpolator that provides for spatial interpolation and a temporal interpolator that provides for temporal interpolation of an interlaced video input signal. The system reduces jitter and related artifacts by temporally or spatially interpolating the signals.
[6] U.S. Pat. No. 5,959,681, Sep. 28, 1999, to Yong-Hun Cho, entitled Motion Picture Detecting Method. There, two separate field memories are utilized for detecting rapid motion and slow motion in an interlaced video sequence. An interlaced video signal is thereby converted into a progressive-scanned signal. Differences between spatial interpolation and temporal interpolation are used to determine whether the image is in motion. If the differences exceed certain defined thresholds, motion is determined. The thresholds are dynamically adapted during the process.
The core of the methods described in the latter two patents is to estimate a motion decision factor based on the frame difference signal and the sample correlation in the vertical direction. These methods provide a way of reducing the visual artifacts that may arise from false motion detection by utilizing the sample correlation in the vertical direction at the sampling point where the value is to be interpolated. A common drawback of those methods, however, is that they do not provide a true motion detection method when there are high frequency components in the vertical direction. In other words, when there are high frequency components in the vertical direction, the methods described in the references [5] and [6] will come to the conclusion that motion pictures are being processed, even if the image is in fact still.
As a consequence, in many instances, those prior art processing methods do not provide for an increase in the vertical resolution even when no real motion is present between fields.
It is accordingly an object of the invention to provide a motion detection method in interlaced video, which overcomes the above-mentioned disadvantages of the heretofore-known devices and methods of this general type and which provides for a robust method of estimating a motion decision parameter which is associated with the point to point degree of motion in the interlaced video sequence. It is another object of the present invention to disclose a deinterlacing method and apparatus by utilizing the motion decision parameter of the invention.
With the foregoing and other objects in view there is provided, in accordance with the invention, a method of computing a motion decision value for further utilization in a video signal processing system. The method comprises the following steps:
inputting a video signal with an interlaced video sequence;
computing a frame difference signal from a difference between the previous field and the next (following) field of the field to be deinterlaced;
forming a point-wise motion detection signal from the frame difference signal;
computing a region-wise motion detection signal from the point-wise motion detection signal and an adjacent point-wise motion detection signal delayed by one field; and
forming from the region-wise motion detection signal a motion decision value and outputting the motion decision value for further processing in the video signal processing system.
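As a minimal sketch of the second step listed above, the frame difference can be taken sample by sample between the previous field and the next field, which have the same parity and therefore cover the same lines. Taking the absolute value is an assumption made here so that the result can be compared against a positive threshold; the function name frame_difference and the list-of-rows field layout are likewise illustrative and not taken from the specification.

```python
def frame_difference(prev_field, next_field):
    """Absolute sample-wise difference between two same-parity fields.

    Each field is a list of rows; each row is a list of sample values.
    The absolute value is an assumption of this sketch.
    """
    if len(prev_field) != len(next_field):
        raise ValueError("fields must have the same number of lines")
    return [
        [abs(b - a) for a, b in zip(prev_row, next_row)]
        for prev_row, next_row in zip(prev_field, next_field)
    ]


prev_field = [[10, 10, 10], [20, 20, 20]]
next_field = [[10, 40, 10], [20, 20, 5]]
diff = frame_difference(prev_field, next_field)
```

Samples that do not change between the two fields yield a difference of zero, while changed samples yield a positive value.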
In accordance with an added feature of the invention, the difference signal is low-pass filtered prior to the step of forming the point-wise motion detection signal.
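In accordance with an additional feature of the invention, the low-pass filter is defined by the matrix

$$W_{M \times N} = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1N} \\ w_{21} & w_{22} & \cdots & w_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ w_{M1} & w_{M2} & \cdots & w_{MN} \end{bmatrix}$$

where $w_{11}, \ldots, w_{MN}$ represent a set of predetermined coefficients.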
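One way such an M×N coefficient matrix might be applied to the frame difference signal is as a weighted sum over a window centered on each sample. The sketch below makes several assumptions that the specification does not state: the window is centered on the sample, samples outside the field are treated as zero, and the coefficient values in the example are hypothetical.

```python
def low_pass(signal, weights):
    """Filter a 2-D signal with an M x N matrix of coefficients.

    Assumptions of this sketch: the window is centered on each sample and
    positions outside the signal contribute nothing (zero padding).
    """
    rows, cols = len(signal), len(signal[0])
    m, n = len(weights), len(weights[0])
    top, left = m // 2, n // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for h in range(cols):
            acc = 0.0
            for p in range(m):
                for q in range(n):
                    r, c = i + p - top, h + q - left
                    if 0 <= r < rows and 0 <= c < cols:
                        acc += weights[p][q] * signal[r][c]
            out[i][h] = acc
    return out


# Hypothetical 1 x 3 coefficient matrix (M = 1, N = 3) whose entries sum to 1.
weights = [[0.25, 0.5, 0.25]]
filtered = low_pass([[0, 8, 0, 0]], weights)
```

In this example an isolated difference value of 8 is spread over its horizontal neighbors, which makes the subsequent threshold decision less sensitive to single-sample noise.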
In accordance with a further feature of the invention, the point-wise motion detection signal is formed according to the formula
$$f_n(i,h) = T_K(d_n(i,h))$$
where $f_n$ is the point-wise motion detection signal, $i$ and $h$ define a spatial location of the respective video signal value in a Cartesian matrix, $T_K(\cdot)$ denotes a threshold function represented as

$$T_K(y) = \begin{cases} 1, & \text{if } y \ge K \\ 0, & \text{otherwise} \end{cases}$$

in which $K$ is a positive constant, and $d_n(\cdot)$ is the low-pass filtered frame difference signal.
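The threshold function and the resulting point-wise motion detection signal can be sketched as follows. The function names and the example value of K are illustrative assumptions; the specification only requires K to be a positive constant.

```python
def threshold(y, k):
    """Threshold function: 1 if y >= K, otherwise 0."""
    return 1 if y >= k else 0


def point_wise_motion(d, k):
    """Apply the threshold to every sample of the filtered frame difference d."""
    return [[threshold(value, k) for value in row] for row in d]


# Hypothetical filtered frame difference and threshold K = 3.
f = point_wise_motion([[2.0, 4.0, 2.0, 0.0]], 3)
```

Only the sample whose filtered difference reaches K is flagged as moving.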
In accordance with another feature of the invention, the region-wise motion detection signal is computed from the point-wise motion detection signal by logically combining the point-wise motion detection signal fn as
$$\varphi_n(i,h) = f_n(i,h) \,\|\, f_{n-1}(i-1,h) \,\|\, f_{n-1}(i+1,h)$$
where $f_{n-1}(\cdot)$ denotes the motion detection signal delayed by one field, the indices $i$ and $h$ define a spatial location of the respective video signal value in a Cartesian matrix, and the notation $\|$ denotes a logical OR operation.
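The logical combination can be sketched as an OR over three binary values: the current point-wise decision and the decisions of the one-field-delayed signal on the lines directly above and below. The sketch assumes that both signals are stored on a common grid indexed by frame line i and column h, and that positions outside the grid count as "no motion"; neither assumption is stated in the specification.

```python
def region_wise_motion(f_cur, f_prev):
    """OR the current decision with the delayed decisions above and below.

    f_cur and f_prev are binary 2-D lists on the same (i, h) grid.
    Lines outside the grid are treated as 0 (an assumption of this sketch).
    """
    rows, cols = len(f_cur), len(f_cur[0])

    def prev_at(i, h):
        return f_prev[i][h] if 0 <= i < rows else 0

    return [
        [
            1 if (f_cur[i][h] or prev_at(i - 1, h) or prev_at(i + 1, h)) else 0
            for h in range(cols)
        ]
        for i in range(rows)
    ]


f_cur = [[0, 0], [0, 1], [0, 0]]
f_prev = [[1, 0], [0, 0], [0, 0]]
phi = region_wise_motion(f_cur, f_prev)
```

In the example, the middle line is flagged in both columns: once because of the current field and once because the delayed signal reports motion on the line above.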
In accordance with again an added feature of the invention, the region-wise motion detection signal is low-pass filtered prior to outputting it. In a preferred embodiment, the region-wise motion detection signal is low-pass filtered to form the motion decision value $m_n(i,h)$ by:

$$m_n(i,h) = \sum_{p=-a}^{b} \sum_{q=-c}^{d} \varphi_n(i + 2 \times p,\, h + 2 \times q) \cdot \alpha_{p,q}$$

where $a, b, c, d \ge 0$, and $\alpha_{p,q}$ represents a set of normalized predetermined coefficients of a low pass filter. Preferably, the kernel of the low pass filter is defined by

$$[\alpha_{p,q}\text{'s}] = \begin{bmatrix} 0 & 1/8 & 0 \\ 1/8 & 4/8 & 1/8 \\ 0 & 1/8 & 0 \end{bmatrix}.$$
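A minimal sketch of this filtering step, using the preferred 3×3 kernel (center coefficient 4/8, the four direct neighbors 1/8, corners 0) with a = b = c = d = 1, is given below. The offsets 2×p and 2×q follow the double summation as written above. Treating positions outside the signal as zero and the function name motion_decision are assumptions of the sketch.

```python
# Preferred kernel from the description: rows are p = -1, 0, 1; columns are q = -1, 0, 1.
KERNEL = [
    [0.0, 1 / 8, 0.0],
    [1 / 8, 4 / 8, 1 / 8],
    [0.0, 1 / 8, 0.0],
]


def motion_decision(phi, alpha=KERNEL, a=1, c=1):
    """Low-pass filter the region-wise signal phi into a value in [0, 1].

    Samples are taken at offsets (2 * p, 2 * q) as in the double summation.
    Positions outside phi contribute zero (an assumption of this sketch).
    """
    rows, cols = len(phi), len(phi[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for h in range(cols):
            acc = 0.0
            for p_idx, alpha_row in enumerate(alpha):
                for q_idx, weight in enumerate(alpha_row):
                    r = i + 2 * (p_idx - a)
                    col = h + 2 * (q_idx - c)
                    if 0 <= r < rows and 0 <= col < cols:
                        acc += weight * phi[r][col]
            out[i][h] = acc
    return out


# A single flagged sample in the middle of a 5 x 5 region.
phi = [[0] * 5 for _ in range(5)]
phi[2][2] = 1
m = motion_decision(phi)
```

Because the coefficients are normalized, a region that is flagged everywhere yields a decision value of 1 in its interior, and an isolated flag yields fractional values that fall off with distance.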
With the above and other objects in view there is also provided, in accordance with the invention, a method of processing interlaced video signals, which comprises:
spatially interpolating a value of the video signal at a given location from a video signal of a given video field;
temporally interpolating the value of the video signal at the given location from a video signal at the same location in temporally adjacent video fields;
forming a motion decision value for the same location in accordance with the above-summarized method; and
mixing an output signal for the video signal at the given location from the spatially interpolated signal and the temporally interpolated signal and weighting the output signal in accordance with the motion decision value.
In a preferred embodiment of the invention, the motion decision value is varied between 0 and 1 as a function of an estimate of the degree of motion at the given location and, upon estimating a high degree of motion, the output signal is heavily weighted towards the spatially interpolated signal and, upon estimating a low degree of motion, the output signal is heavily weighted towards the temporally interpolated signal.
In accordance with a specific embodiment of the invention, the temporally interpolated signal is output as the output signal upon estimating a low degree of motion, and the spatially interpolated signal is output as the output signal upon estimating a high degree of motion.
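The weighting described above can be sketched as a linear blend controlled by the motion decision value. The linear form of the blend, the two-sample averages used for the spatial and temporal interpolators, and all function names are assumptions chosen for illustration; the description only requires that high motion favors the spatially interpolated signal and low motion favors the temporally interpolated signal.

```python
def spatial_interpolate(above, below):
    """Assumed spatial interpolator: average of the lines above and below."""
    return (above + below) / 2.0


def temporal_interpolate(prev_sample, next_sample):
    """Assumed temporal interpolator: average of the previous and next fields."""
    return (prev_sample + next_sample) / 2.0


def mix(spatial, temporal, m):
    """Blend the two interpolated values using a motion decision value m in [0, 1].

    m = 1 (high motion) selects the spatial value;
    m = 0 (no motion) selects the temporal value.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("motion decision value must lie between 0 and 1")
    return m * spatial + (1.0 - m) * temporal


s = spatial_interpolate(100, 120)   # 110.0
t = temporal_interpolate(50, 50)    # 50.0
still = mix(s, t, 0.0)
moving = mix(s, t, 1.0)
partial = mix(s, t, 0.25)
```

With a motion decision value restricted to 0 or 1, the blend reduces to the hard switch of the specific embodiment; intermediate values give a gradual transition between the two interpolators.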
There is also provided, in accordance with the invention, an apparatus for computing a motion decision value in accordance with the above-outlined process. The novel apparatus comprises:
an input for receiving a video signal with an interlaced video sequence;
difference forming means connected to the input for computing a frame difference signal from a difference between the previous field and the next field;
means for forming a point-wise motion detection signal from the frame difference signal, and for computing a region-wise motion detection signal from the point-wise motion detection signal and an adjacent point-wise motion detection signal delayed by one field; and
means for forming from the region-wise motion detection signal a motion decision value and for outputting the motion decision value for further processing in the video signal processing system.
In accordance with yet an added feature of the invention, the apparatus has a logic member programmed to compute the motion decision value from the point-wise motion detection signal by logically combining the point-wise motion detection signal $f_n$ as
$$\varphi_n(i,h) = f_n(i,h) \,\|\, f_{n-1}(i-1,h) \,\|\, f_{n-1}(i+1,h)$$
where $f_{n-1}(\cdot)$ denotes the motion detection signal delayed by one field, the indices $i$ and $h$ define a spatial location of the respective video signal value in a Cartesian matrix, and the notation $\|$ denotes a logical OR operation.
Finally, there is provided, in accordance with the invention, an apparatus for processing interlaced video signals, for example in an interlaced to progressive conversion, which comprises:
an input for receiving a video signal with an interlaced video sequence of fields;
a spatial interpolator connected to the input and configured to spatially interpolate a value of the video signal at a given location from a video signal of at least one adjacent location in a given video field;
a temporal interpolator connected to the input in parallel with the spatial interpolator for temporally interpolating the value of the video signal at the given location from a video signal at the same location in temporally adjacent video fields;
a computing apparatus according to the above-outlined invention connected to the input and in parallel with the spatial interpolator and the temporal interpolator for forming a motion decision value for the same location; and
a mixer connected to receive an output signal from each of the spatial interpolator, the temporal interpolator, and the computing apparatus, the mixer being configured to mix an output signal for the video signal at the given location from the spatially interpolated signal and the temporally interpolated signal in dependence on the motion decision value output by the computing apparatus.
Other features which are considered as characteristic for the invention are set forth in the appended claims.
Although the invention is illustrated and described herein as embodied in a method of detecting motion in an interlaced video sequence and an apparatus therefor, it is nevertheless not intended to be limited to the details shown, since various modifications and structural changes may be made therein without departing from the spirit of the invention and within the scope and range of equivalents of the claims.
The construction of the invention, however, together with additional objects and advantages thereof will be best understood from the following description of the specific embodiment when read in connection with the accompanying drawings.