Motion adaptive video signal processing systems are useful in a variety of applications. Examples of such uses include interlace-to-progressive scan converters, standards converters, luminance and chrominance signal separators and the like. In such systems a motion signal is generated and used to control a parameter of the processed video signal such as, for example, the selection, mixing or blending of two or more processed video signals so as to produce a combined signal in which visual artifacts due to scene motion are reduced.
In a typical application, motion is detected by measuring the difference in the (luminance) signal level of corresponding pixels (picture elements) on successive video frames. The absolute value of this difference provides an estimate of the presence and amount of motion at that position of the image. Unfortunately, the value of the frame difference signal at a given picture location also depends on the local contrast: as contrast decreases, the frame difference decreases as well. To ensure that low contrast moving pixels are detected as reliably as higher contrast moving pixels, it is conventional practice to compensate for contrast variations by limiting the frame difference signal to a relatively narrow range of values.
Typically, the absolute value of the frame difference signal is "clipped" or limited to a relatively small amplitude (e.g., 7 quantization steps), which is then considered to represent full motion. Since the motion signal thus produced spans only the values 0 through 7, it can be represented by a relatively small word size (e.g., 3 bits), so memory storage requirements for the motion signal are modest compared with storage of a "full resolution" (e.g., 256 levels, 8 bits) motion signal.
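A minimal sketch of the clipped frame-difference motion detector described above, assuming 8-bit luminance frames held as Python lists of rows. The names `basic_motion`, `Frame` and `FULL_MOTION` are hypothetical; the clip level of 7 follows the example in the text.

```python
from typing import List

Frame = List[List[int]]  # rows of 8-bit luminance samples (0..255)

FULL_MOTION = 7  # clip level from the example above; treated as "full motion"

def basic_motion(curr: Frame, prev: Frame, clip: int = FULL_MOTION) -> Frame:
    """Absolute frame difference, clipped to `clip`.

    Clipping lets low-contrast motion reach the full-motion value, and the
    result (0..7 when clip == 7) fits in a 3-bit word.
    """
    return [[min(abs(c - p), clip) for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(curr, prev)]

print(basic_motion([[10, 12, 200]], [[10, 20, 50]]))  # [[0, 7, 7]]
```

Note that a small difference of 8 and a large difference of 150 both map to full motion, which is exactly the contrast compensation the clipping is meant to provide.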
After motion signal generation and compensation for contrast variations as noted above, it is conventional practice to additionally subject the motion signal to motion "spreading" in a motion signal processor. The term "spreading" refers to the process of combining with the motion signal the motion samples of pixels surrounding the pixel being interpolated. In effect, this process "spreads" or "expands" the motion detection area beyond the single pixel whose motion is being measured. The advantage of including a larger number of pixels in the motion determination is that the probability of detecting moving pixels is greatly increased by the larger motion detection "aperture" or "window" that spreading provides.
In practice, motion signal spreading may be either temporal or spatial. Temporal spreading is obtained by including the effects of the motion signals of preceding and/or following fields. Spatial spreading may be realized by including the effects of preceding and/or following motion samples vertically (line by line), horizontally (pixel by pixel), or in both directions. Temporal and spatial (horizontal and/or vertical) spreading components may also be combined.
As noted above, the motion signal processing spreads the motion signal temporally and/or spatially to take advantage of surrounding motion information. This reduces the probability that movement will be missed in the case, for example, where corresponding pixels of successive frames happen to have the same signal value, even though they represent different portions of a moving object.
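The text does not say how the surrounding motion samples are combined. The sketch below assumes a maximum over the neighborhood, a common choice for spreading but not necessarily the one used in any particular system. `spread_spatial` and `spread_temporal` are hypothetical helpers operating on motion arrays such as those produced by a clipped frame-difference detector.

```python
from typing import List, Sequence

Frame = List[List[int]]  # motion samples, one row per video line

def spread_spatial(motion: Frame, radius_v: int = 1, radius_h: int = 1) -> Frame:
    """Replace each motion sample by the maximum over a
    (2*radius_v+1) x (2*radius_h+1) window, clamped at the picture edges."""
    rows, cols = len(motion), len(motion[0])
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            out_row.append(max(
                motion[yy][xx]
                for yy in range(max(0, y - radius_v), min(rows, y + radius_v + 1))
                for xx in range(max(0, x - radius_h), min(cols, x + radius_h + 1))
            ))
        out.append(out_row)
    return out

def spread_temporal(fields: Sequence[Frame]) -> Frame:
    """Sample-wise maximum of the motion signals of several fields
    (e.g. preceding, current and following)."""
    return [[max(vals) for vals in zip(*rows)] for rows in zip(*fields)]

# A uniform bar moving horizontally: only its edges change value between
# frames, so the basic motion signal is zero in its interior.
edge_only = [[0, 7, 0, 0, 7, 0]]
print(spread_spatial(edge_only, radius_v=0))  # [[7, 7, 7, 7, 7, 7]]
```

The interior samples, whose frame difference is zero because corresponding pixels happen to have the same value, are flagged as moving once the edge motion is spread across them.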
After motion signal spreading, the final processing step in the conventional motion adaptive video processing system is to generate temporally and spatially interpolated video signals and combine them under the control of the processed motion signal. One way to do the combining is to apply the processed motion signal to the control input of what is commonly called a "soft switch" or "fader" circuit. The temporally interpolated and the spatially interpolated video signals are applied to respective signal inputs of the soft switch. Under the control of the motion signal, the soft switch selects the temporally interpolated signal when the motion signal is low (indicating a stationary image area) and selects the spatially interpolated signal when the motion signal is high (indicating a moving image area). For motion signal values between low and high, the soft switch "blends" or proportionally combines the spatially and temporally interpolated signals to form the video output signal. This blending provides a smooth transition between spatial and temporal signal selection and so reduces the tendency for visual artifacts to be produced during change-over from one interpolated signal to the other.
FIG. 1 exemplifies a typical embodiment of the above-described system. An input signal Yin to be interpolated is applied to a frame memory comprising a cascade connection of a 262H delay 10, a 1-H delay 12 and another 262H delay 14. An averager 16 averages the signals at the input and output of 1-H delay 12 to provide a spatially interpolated signal YS, which represents an average of pixels on the lines above and below the location of the pixel to be estimated. Another averager 18 averages the frame-delayed signal (the output of delay 14) and the undelayed input signal to provide a temporally interpolated signal YT. This signal represents the average of corresponding pixels of the prior field and a subsequent field. For stationary regions the temporally interpolated signal YT gives the best estimate of the value of the output signal Yest. For moving portions of an image it is necessary to use the spatially interpolated signal YS to avoid motion artifacts.
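The tap arrangement of FIG. 1 can be modeled in software as a line-delay buffer. The sketch below is hypothetical: `FrameMemoryTaps` and `interpolate` are illustrative names, and each "line" is simply a list of samples. It reads the undelayed, 262H, 263H and 525H taps and forms YS and YT as averagers 16 and 18 do.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Line = List[int]
Taps = Tuple[Line, Line, Line, Line]

class FrameMemoryTaps:
    """Model of the 262H + 1H + 262H cascade: each push() advances one line
    period and, once the memory is full, returns the taps
    (undelayed, 262H, 263H, 525H)."""

    def __init__(self, d1: int = 262, d2: int = 1, d3: int = 262):
        self.taps = (0, d1, d1 + d2, d1 + d2 + d3)
        self.buf: Deque[Line] = deque(maxlen=self.taps[-1] + 1)

    def push(self, line: Line) -> Optional[Taps]:
        self.buf.appendleft(line)  # buf[k] is now the line delayed by k H
        if len(self.buf) < self.buf.maxlen:
            return None            # memory not yet filled
        return tuple(self.buf[k] for k in self.taps)

def interpolate(taps: Taps) -> Tuple[List[float], List[float]]:
    y0, y262, y263, y525 = taps
    ys = [(a + b) / 2 for a, b in zip(y262, y263)]  # averager 16: lines above/below
    yt = [(a + b) / 2 for a, b in zip(y0, y525)]    # averager 18: next/prior field
    return ys, yt

mem = FrameMemoryTaps(d1=2, d2=1, d3=2)  # shortened delays for a quick demo
for n in range(6):
    taps = mem.push([n])
print(interpolate(taps))  # ([2.5], [2.5])
```

A deque with `maxlen` discards the oldest line automatically, which mirrors a fixed-length hardware delay.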
Signal selection is provided by a soft switch 20 comprising a subtractor 22 that subtracts YT from YS, a multiplier 24 that multiplies the subtractor difference by a motion signal MOT, and an adder 26 that adds signal YT to the multiplier output to produce the estimated or interpolated video output signal Yest. The motion signal MOT is produced by a subtraction and absolute value circuit 28, which subtracts the frame-delayed signal from the input signal and takes the absolute value of the difference to produce a basic motion indicating signal. This signal is then applied to a motion signal processor 30, which may perform additional processing such as motion spreading and which outputs the processed motion signal MOT.
The scaling of the multiplier in the soft switch is such that when the motion signal MOT represents maximum motion, the signal YS-YT is passed with unity gain and consequently the YT signal is then cancelled in the adder and the output signal Yest equals YS. Conversely, for stationary areas the motion signal MOT is zero and so the output estimated signal becomes the temporally interpolated signal YT. For motion between zero and full motion the output signal is a proportional blend of the temporally and spatially estimated signals YT and YS.
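The soft switch reduces to Yest = YT + (MOT / full) * (YS - YT). A minimal sketch, assuming MOT is the 3-bit signal with full motion represented by 7; `soft_switch` and `FULL_MOTION` are hypothetical names.

```python
FULL_MOTION = 7  # assumed full-scale value of the 3-bit motion signal MOT

def soft_switch(ys: float, yt: float, mot: int, full: int = FULL_MOTION) -> float:
    """Yest = YT + (MOT / full) * (YS - YT).

    (ys - yt) plays the role of subtractor 22, the scaling by mot / full that
    of multiplier 24, and adding yt that of adder 26. MOT is clamped to
    [0, full] so that full motion gives unity gain on (YS - YT).
    """
    mot = min(max(mot, 0), full)
    return yt + mot * (ys - yt) / full

print(soft_switch(110, 40, 0))  # 40.0  (stationary: YT)
print(soft_switch(110, 40, 3))  # 70.0  (partial motion: blend)
print(soft_switch(110, 40, 7))  # 110.0 (full motion: YS)
```

Multiplying by `mot` before dividing by `full` keeps the intermediate blend values exact for integer inputs.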
The system of FIG. 1, while effective for high quality video input signals, suffers from sensitivity to noise for lesser quality signals, especially for stationary images. For a stationary signal, such as a test pattern with additive noise, a high contrast horizontally oriented edge can result in a large difference between the values of YT and YS. At the same time, relatively low levels of noise will cause significant fluctuations in the motion signal MOT. The estimated signal Yest, and thus the image, will noticeably fluctuate in an undesired manner at such an edge. A further problem is that low contrast moving images may not produce a full motion signal. In this case, the soft switch will allow some of the temporally interpolated signal YT to pass, resulting in smearing and loss of fine detail.
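To make the noise problem concrete, the following sketch plugs hypothetical sample values for a stationary, high contrast horizontal edge into the soft-switch equation of FIG. 1, with MOT assumed to be on a 0 to 7 scale.

```python
def soft_switch(ys: float, yt: float, mot: int, full: int = 7) -> float:
    """Soft switch of FIG. 1 with MOT normalized to a full-scale value of 7."""
    mot = min(max(mot, 0), full)
    return yt + mot * (ys - yt) / full

# Hypothetical stationary horizontal edge: the lines above and below the
# missing line carry 200 and 40, while the prior and next fields both carry 200.
ys = (200 + 40) / 2   # 120.0, the spatial average smears the edge
yt = (200 + 200) / 2  # 200.0, exact for a still picture

# Noise alone makes the |frame difference|, and hence MOT, wander over 0..3.
for noise_mot in range(4):
    print(noise_mot, round(soft_switch(ys, yt, noise_mot), 1))
```

Running it shows Yest sagging from 200.0 to about 165.7 as the noise-driven MOT rises from 0 to 3: a disturbance of only 3 quantization steps moves the output by roughly 34 levels, which is the visible fluctuation at the edge described above.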
In an effort to overcome the problems of conventional motion adaptive systems, it has been proposed to use a different form of processing that does not employ motion signal detection and processing. One such approach is described by Hurst, Jr. in U.S. Pat. No. 5,046,164 entitled INTERSTITIAL LINE GENERATOR FOR AN INTERLACE TO NON-INTERLACE SCAN CONVERTER which issued Sep. 3, 1991. The Hurst, Jr. apparatus makes use of median filtering principles for spatial/temporal signal selection. In median filtering, plural video signals are compared and the signal having the median value is selected as the output. In the Hurst, Jr. system, a delay circuit provides plural video lines disposed about the location of the interstitial line to be generated. Comparison circuitry compares the relative values of the delayed video signals. The signals exhibiting maximum and minimum extremes are eliminated and the remaining signals are combined in predetermined proportions to provide a resultant interstitial line.
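A hypothetical sketch of the median-type selection principle: four candidate samples for the missing pixel (the pixels above and below in the current field and the co-sited pixels of the prior and next fields) are sorted, the extremes are discarded, and the remaining two are averaged. The choice of inputs, the equal weights, and the function names are assumptions for illustration, not the specific configuration of the Hurst, Jr. patent.

```python
from typing import Sequence

def trimmed_combine(samples: Sequence[float],
                    weights: Sequence[float] = (0.5, 0.5)) -> float:
    """Drop the largest and smallest samples, then combine the rest with
    fixed weights (which should sum to 1)."""
    if len(samples) != len(weights) + 2:
        raise ValueError("need exactly len(weights) + 2 samples")
    middle = sorted(samples)[1:-1]
    return sum(w * s for w, s in zip(weights, middle))

def interstitial_sample(above: float, below: float,
                        prior: float, nxt: float) -> float:
    """Hypothetical 4-input selector over spatial and temporal neighbors."""
    return trimmed_combine([above, below, prior, nxt])

def median3(a: float, b: float, c: float) -> float:
    """Plain 3-input median, the simplest median-filter selector."""
    return sorted((a, b, c))[1]

# Stationary edge: the prior and next fields agree, so their value survives.
print(interstitial_sample(200, 40, 200, 200))  # 200.0
# Motion: the temporal samples are the extremes and are discarded.
print(interstitial_sample(120, 118, 30, 250))  # 119.0
```

No motion signal appears anywhere: the choice between spatial and temporal information falls out of the ranking itself.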
The Hurst, Jr. system is elegantly simple and remarkably effective. Median filters have some very desirable characteristics for this application. One useful property is continuity; i.e., continuously changing inputs produce continuously changing outputs. This eliminates the need for the soft switches normally used in motion adaptive systems to avoid artificially abrupt changes. Another valuable property is that the gain is limited to unity; i.e., a change in any input never produces a larger change in the output. This latter property gives median filtering systems very good immunity to noise, since the output disturbance caused by noise can never exceed the amplitude of the noise itself.
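The unity-gain property can be spot-checked empirically for a 3-input median. The sketch below is a randomized check, not a proof; `max_gain` is a hypothetical helper that perturbs one input at a time and records the largest ratio of output change to input change.

```python
import random

def median3(a: float, b: float, c: float) -> float:
    return sorted((a, b, c))[1]

def max_gain(trials: int = 10_000, seed: int = 0) -> float:
    """Largest observed |change in output| / |change in one input|."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = [rng.uniform(0, 255) for _ in range(3)]
        i = rng.randrange(3)
        d = rng.choice((-1, 1)) * rng.uniform(1, 20)  # nonzero perturbation
        y = list(x)
        y[i] += d
        worst = max(worst, abs(median3(*y) - median3(*x)) / abs(d))
    return worst

print(max_gain())  # close to, and never meaningfully above, 1.0
```

The bound holds in general because any order statistic (minimum, median, maximum) of a set of inputs moves by at most the largest change applied to those inputs, and a fixed-weight combination of order statistics whose weights sum to 1 inherits the same bound.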
One disadvantage of such systems lies in the inability of the median filter system to perform any kind of motion "spreading", since there is no explicit motion signal in the system. Lacking this capability, non-motion adaptive systems may mis-interpolate pixels for certain types of motion (e.g., diagonal motion that results in no change in luminance level at a particular pixel location).