The present invention relates generally to processing video images and, in particular, to picture segmentation and superposition of real time motion pictures.
The problem of combining parts from different images to create a new superpositioned picture can be decomposed into the following steps: picture segmentation, positioning and scaling, light and spatial parameter matching and superposition of video image parts. As is well known, picture segmentation is a formidable problem.
Picture segmentation involves separating the picture part of interest (PPI) from other parts of an image. These latter parts are called the background (BG). The separated picture part of interest is then embedded into another picture, called the basic picture (BP).
At present, separation of a picture part of interest from a background is based on chroma-key, luma-key or code-key methods. However, these background keys generally require specific room, studio or lighting conditions, or a priori information about the position of the PPI in the given frame. They are therefore unsuitable for the many applications in which these standardized conditions cannot be met or are difficult to meet.
Much research has been carried out trying to improve picture segmentation. The following patents discuss various aspects of prior art methods in this field:
U.S. Pat. Nos. 3,961,133; 5,301,016; 5,491,517; 5,566,251; Japanese Patents 4-83480, 6-133221, 55-71363; and Great Britain Patent 1,503,612.
The following articles and books also deal with the subject:
Richard Brice, Multimedia and Virtual Reality Engineering, 1997, Newnes, 307;
Lynn Conway and Charles J. Cohen, "Video Mirroring and Iconic Gestures: Enhancing Basic Videophones to Provide Visual Coaching and Visual Control", IEEE Transactions On Consumer Electronics, vol. 44, No. 2, pp. 388-397, May 1998;
Andrew G. Tescher, "Multimedia is the Message", IEEE Signal Processing Magazine, vol. 16, No. 1, pp. 44-54, January 1999; and
Eugene Leonard, "Considerations Regarding The Use Of Digital Data To Generate Video Backgrounds", SMPTE Journal, vol. 87, pp. 499-504, August 1978.
The present invention relates to processing video images and, in particular, to picture segmentation and superposition of real time motion pictures. The images processed can have arbitrary, non-standardized backgrounds. The images can be video images with the picture part of interest moving in and out of focus. In some embodiments, the background used for superposition is generally a still image, while in others the background can be a moving image. The system can operate in real time, with the signals being processed pixel-by-pixel, line-by-line, and frame-by-frame. There are no processing interruptions and no loss of video signal.
The present invention describes a method for image processing of a frame. The method includes the step of separating a picture part of interest from the frame, where the frame has an arbitrary background. In another embodiment, the above separating step may further include the steps of receiving a background frame having an arbitrary background, receiving an input frame having a picture part of interest within an arbitrary background, and separating the picture part of interest from the arbitrary background using the input and background frames.
The method of the invention may use a background frame that is a still or moving image. The method may include a background frame or a picture part of interest that is out of focus.
In an embodiment of the present invention, the step of separating includes the step of spatially separating the picture part of interest and background of the input frame.
In another embodiment of the invention, the step of separating further includes the step of generating the difference between luminance and chrominance signal values of the input and background frames.
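The difference step can be illustrated with a minimal sketch. The text states only that a difference between the input and background frames is generated; turning that difference into a binary mask by thresholding, the function name `difference_mask`, the (Y, Cr, Cb) tuple layout and the threshold value are all assumptions made for illustration.

```python
def difference_mask(input_frame, background_frame, threshold=16):
    """Mark pixels whose Y, Cr or Cb value differs from the background frame.

    Each frame is a list of rows; each pixel is a (Y, Cr, Cb) tuple.
    The threshold is a hypothetical tuning value, not taken from the text.
    """
    mask = []
    for in_row, bg_row in zip(input_frame, background_frame):
        mask_row = []
        for (y, cr, cb), (bg_y, bg_cr, bg_cb) in zip(in_row, bg_row):
            # A pixel belongs to the mask if any component moved away
            # from the stored background by more than the threshold.
            differs = (abs(y - bg_y) > threshold
                       or abs(cr - bg_cr) > threshold
                       or abs(cb - bg_cb) > threshold)
            mask_row.append(1 if differs else 0)
        mask.append(mask_row)
    return mask
```

Pixels marked 1 would then be candidates for the picture part of interest; pixels marked 0 match the background frame.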
According to another embodiment, the step of separating further comprises the steps of filtering an input signal, estimating the pulse signal maxima of the filtered input signal, and determining the time difference between adjacent pulse signal maxima of the input signal. This is followed by comparing the time difference to a threshold value, and accepting a signal for use as part of a picture part of interest mask if the time difference is below the threshold value. The above steps generally are applied separately to the luminance, red chrominance and blue chrominance components of the input signal.
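The filter, maxima and spacing steps above can be sketched for a single scan line of one component. This is an illustration under stated assumptions, not the claimed circuit: a first-difference magnitude stands in for the high pass filter, the sample index stands in for time, and the names `high_pass`, `pulse_maxima`, `edge_spacing_mask`, `max_gap` and `min_height` are hypothetical.

```python
def high_pass(line):
    # Assumed stand-in for the high pass filter: magnitude of the
    # difference between neighboring samples.
    return [0] + [abs(line[i] - line[i - 1]) for i in range(1, len(line))]


def pulse_maxima(filtered, min_height=10):
    # Sample positions of local maxima in the filtered signal.
    peaks = []
    for i in range(1, len(filtered) - 1):
        if (filtered[i] >= min_height
                and filtered[i] > filtered[i - 1]
                and filtered[i] >= filtered[i + 1]):
            peaks.append(i)
    return peaks


def edge_spacing_mask(line, max_gap=4, min_height=10):
    # Accept the samples between two adjacent maxima when the spacing
    # between those maxima is below the threshold max_gap.
    mask = [0] * len(line)
    peaks = pulse_maxima(high_pass(line), min_height)
    for left, right in zip(peaks, peaks[1:]):
        if right - left < max_gap:
            for i in range(left, right + 1):
                mask[i] = 1
    return mask
```

In this reading, closely spaced edges are accepted into the mask while widely spaced edges are rejected. The same routine would be run once each on the luminance, red chrominance and blue chrominance components.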
According to one aspect of the invention, a system is taught for separating a picture part of interest of an input frame. The input frame has an arbitrary background. The system comprises a mask generating unit for generating a picture part of interest mask using the difference between chrominance and luminance signal values in the input frame and an arbitrary background frame. The system also includes a separator unit for separating a picture part of interest from the input frame using the picture part of interest mask.
In another embodiment, the mask generating unit further comprises a luminance compensation unit for compensating for changes in background luminance signal when going from frame to frame. The unit also includes a means for generating a picture part of interest by removing the compensated background luminance signal from a picture part of interest in an arbitrary background signal.
In yet another embodiment of the present invention, the above mask generating unit further comprises a means for generating a chrominance signal for a picture part of interest by subtracting out a background chrominance signal from the picture part of interest in an arbitrary background signal.
According to another embodiment, the above mask generating unit further comprises a background luminance frame memory for providing a background luminance signal, a first divider for dividing luminance signals from an input frame and a memorized background luminance signal (quotient A), the memorized background luminance being provided by the background luminance frame memory. The mask generating unit also includes a second switch receiving a luminance window from a window pulse generator unit and quotient A from the first divider. The mask generating unit further includes an averaging circuit for calculating an estimated light change coefficient K by averaging quotient A over the pixels of the luminance window. There is also a second divider for detecting changes in the input frame luminance signal by dividing an input frame luminance signal by the light change coefficient K (quotient B), and a first summer for subtracting the quotient B from the memorized background luminance signal.
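The arithmetic of this embodiment (quotient A, coefficient K, quotient B and the final subtraction) can be followed in a short sketch. The function name, the representation of the luminance window as a list of (row, column) positions, and the assumption that stored luminance values are nonzero are illustrative choices not found in the text.

```python
def compensated_luminance_difference(input_y, stored_bg_y, window):
    """Return (K, difference frame) for one luminance frame.

    input_y, stored_bg_y: lists of rows of luminance values (nonzero assumed).
    window: (row, col) positions assumed to show only background.
    """
    # Quotient A: input luminance divided by memorized background luminance.
    quotient_a = [input_y[r][c] / stored_bg_y[r][c] for r, c in window]
    # K: average of quotient A over the pixels of the luminance window.
    k = sum(quotient_a) / len(quotient_a)
    # Quotient B = input / K, subtracted from the memorized background.
    difference = [[bg - y / k for y, bg in zip(in_row, bg_row)]
                  for in_row, bg_row in zip(input_y, stored_bg_y)]
    return k, difference
```

For example, if the scene lighting doubles, K evaluates to 2 over a background-only window, unchanged background pixels yield a difference of zero, and only pixels covered by the picture part of interest remain nonzero.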
In still another embodiment the mask generating unit further comprises a background chrominance frame memory for providing a memorized background chrominance ratio signal, the memory being set up by a switch. It also includes a divider for dividing red and blue chrominance signals (quotient C), the chrominance signals received from a color unit, and a summer for subtracting the quotient C from the memorized background chrominance ratio signal.
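The chrominance path can be sketched in the same way. The function name is hypothetical, and the sketch assumes the blue chrominance value is nonzero at every pixel; the text does not say how a zero divisor is handled.

```python
def chrominance_ratio_difference(cr, cb, stored_ratio):
    """Subtract quotient C (Cr / Cb) from the memorized background ratio.

    cr, cb, stored_ratio: lists of rows of equal size.
    Assumes cb contains no zeros (an assumption of this sketch).
    """
    return [[stored - (r / b)
             for r, b, stored in zip(cr_row, cb_row, st_row)]
            for cr_row, cb_row, st_row in zip(cr, cb, stored_ratio)]
```

A result of zero indicates that the chrominance ratio at that pixel matches the stored background; a nonzero result indicates a change.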
In yet another embodiment of the present invention, the separator unit of the system comprises a time aligner for aligning color signals of the input frame, and a switch receiving the time aligned input frame color signals, and a picture part of interest mask from the mask generating unit. The switch generates a picture part of interest from the mask and the color signals.
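The action of the switch can be pictured as a per-pixel selection. In the sketch below the time aligner is not modeled (the mask and the color signals are assumed to be already aligned), and the function name and the `fill` value used outside the mask are assumptions.

```python
def separate_ppi(frame, mask, fill=(0, 0, 0)):
    """Pass color pixels where the mask is set; output `fill` elsewhere.

    frame: list of rows of color tuples; mask: list of rows of 0/1 flags.
    """
    return [[pixel if flag else fill
             for pixel, flag in zip(frame_row, mask_row)]
            for frame_row, mask_row in zip(frame, mask)]
```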
Another aspect of the invention teaches a system for separating a picture part of interest from an input frame having an arbitrary background where the system includes at least one high pass filter for determining the edges of an input signal of the picture part of interest within the arbitrary background, at least one pulse signal maximum estimator for determining the time maxima of the filtered input signal, and at least one maximum-to-maximum time determiner for determining the time difference between consecutive signal maxima. The system also includes at least one comparator for comparing the time difference with a predetermined threshold value, and an OR gate for generating a picture part of interest mask from signals received from the at least one comparator.
In a further embodiment, the preceding system processes the luminance, red chrominance, and blue chrominance signals of the input signal separately. Each of the signals is processed by its own high pass filter, pulse signal maximum estimator, time determiner, and comparator.
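The combination of the three comparator outputs by the OR gate can be sketched as follows. The function name and the representation of each comparator output as a list of 0/1 flags per scan line are assumptions.

```python
def or_gate(mask_y, mask_cr, mask_cb):
    """Combine per-component comparator outputs for one scan line.

    A sample enters the picture part of interest mask if any of the
    luminance, red chrominance or blue chrominance paths accepted it.
    """
    return [1 if (y or cr or cb) else 0
            for y, cr, cb in zip(mask_y, mask_cr, mask_cb)]
```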
The present invention also teaches a method for text detection for use in superpositioning a video image in a basic picture containing text. The method comprises the steps of storing signal data from a plurality of input frames, the data being selected based on predetermined criteria, the storage being effected on a pixel-by-pixel basis. The method further includes decoding a stored input frame by determining the number of consecutive stored frames that have text at a predetermined pixel position, and determining the number of pixels with text in a given frame and comparing the number to a predetermined criterion. Text is determined to exist if a sufficient number of pixels in enough consecutive frames have met the predetermined criteria.
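The two counts used by the method, consecutive frames per pixel and pixels per frame, can be sketched as below. The selection criteria are not specified in this passage, so the sketch starts from frames of 0/1 flags that are assumed to already mark pixels meeting those criteria; `detect_text`, `min_frames` and `min_pixels` are hypothetical names and values.

```python
def detect_text(frame_flags, min_frames=3, min_pixels=4):
    """Return (text_found, text_mask) for a sequence of flag frames.

    frame_flags: list of frames, each a list of rows of 0/1 flags.
    """
    rows, cols = len(frame_flags[0]), len(frame_flags[0][0])
    # Length of the current run of consecutive flagged frames per pixel.
    run = [[0] * cols for _ in range(rows)]
    for frame in frame_flags:
        for r in range(rows):
            for c in range(cols):
                run[r][c] = (run[r][c] + 1) if frame[r][c] else 0
    # A pixel counts as text if it stayed flagged long enough.
    text_mask = [[1 if run[r][c] >= min_frames else 0 for c in range(cols)]
                 for r in range(rows)]
    pixel_count = sum(sum(row) for row in text_mask)
    return pixel_count >= min_pixels, text_mask
```

Static text stays flagged across frames and passes both counts, whereas flags that flicker from frame to frame reset the per-pixel run and are rejected.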
The present invention further teaches a system for positioning a picture part of interest in a basic picture where the basic picture contains text. The system includes a text detector unit for generating a text mask from a luminance signal of a window mask of the basic picture, a scaler for scaling a separated picture part of interest, the scaling being controlled by a background/foreground controller, and a means for embedding the scaled picture part of interest in the text mask.
In yet another embodiment the text detector of the above system further comprises a frame storage unit for storing filtered signals on a pixel-by-pixel basis for each frame, a decoder for determining the number of consecutive stored frames which have text in corresponding pixel positions, and a counter for counting the number of pixels in each of the stored frames.
Another aspect of the present invention teaches a method for superpositioning a picture part of interest in a basic picture. The method includes the step of matching parameters between the basic picture and the picture part of interest where the parameters consist of at least one of the following: luminance, chrominance and spatial resolution.
In one embodiment of the invention, a system is taught for matching parameters of a separated picture part of interest and a basic picture when superpositioning the picture part of interest in the basic picture. The system comprises a luminance matching unit for matching luminance signals of the basic picture and the picture part of interest, a chrominance matching unit for matching chrominance signals of the basic picture and the picture part of interest, and a spatial resolution matching unit for spatially resolving the basic picture and the picture part of interest using the luminance and matched luminance signals of the basic picture and the picture part of interest.
According to another embodiment of the invention, the above system comprises a spatial resolution matching unit which further comprises a means for delaying a basic picture luminance signal by one time unit and by more than one time unit, and for comparing these delayed signals after subtracting out the original basic picture luminance. It also includes a means for time aligning a matched luminance signal, where the matching is effected for the basic picture and picture part of interest luminances, and a means for filtering a matched basic picture and picture part of interest luminance. It further includes at least one switch for receiving either the once delayed or the more than once delayed basic picture luminance, together with the time aligned and filtered matched luminance, for producing a matched spatially resolved basic picture and picture part of interest.
In another embodiment of the present invention, the spatial resolution matching unit comprises a first memory pixel for delaying a basic picture luminance signal by one time unit, a first summer for subtracting the once delayed luminance signal from the basic picture luminance signal, a second memory pixel connected to the first memory pixel for delaying a basic picture luminance signal by a second time unit, and a second summer for subtracting the twice delayed luminance signal from the basic picture luminance signal. It also includes a comparator for comparing the signal intensities of the differences between the once and twice delayed signals (S2) and the basic picture luminance signal (S1), a time aligner for aligning a luminance matched picture part of interest signal (S3), and two low pass filters, one for filtering the luminance matched picture part of interest signal (S4) when the basic picture is out of focus and the second for filtering a luminance matched picture part of interest signal (S5) when the picture part of interest is positioned in the background. The matching unit further includes a first switch for determining an acceptable luminance signal from among the filtered, aligned and compared signals (S1, S2, S3, S4). A controller is also present for controlling the positioning of the picture part of interest depending on whether the picture part of interest is to be placed in the foreground or background. Finally, the unit includes a second switch for producing a mask from the luminance signal data and foreground-background information received from the first switch, the controller and luminance matched picture part of interest (S5) from the luminance matching unit.
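The delay, subtract and compare portion of this unit can be sketched for one scan line of basic picture luminance. This is one possible reading only: the text does not state how the comparator result is interpreted, so using the ratio of the two peak differences as a focus indicator, the `ratio` threshold, and the names `delayed_differences` and `looks_out_of_focus` are all assumptions. The filters, time aligner, switches and controller are not modeled, and the line is assumed to hold at least three samples.

```python
def delayed_differences(y_line):
    """Return (d1, d2): luminance minus its once and twice delayed copies."""
    d1 = [y_line[i] - y_line[i - 1] for i in range(2, len(y_line))]
    d2 = [y_line[i] - y_line[i - 2] for i in range(2, len(y_line))]
    return d1, d2


def looks_out_of_focus(y_line, ratio=0.75):
    # Assumed comparator rule: a sharp edge changes fully within one
    # sample, so both differences peak at the same value; a blurred edge
    # spreads over several samples, so the one-sample peak is smaller.
    d1, d2 = delayed_differences(y_line)
    peak1 = max(abs(v) for v in d1)
    peak2 = max(abs(v) for v in d2)
    if peak2 == 0:
        return False  # flat line, no edges to judge
    return peak1 / peak2 < ratio
```

Under this reading, a positive result would select the low pass filtered picture part of interest signal so that its sharpness matches that of the basic picture.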
In the text below and the accompanying figures, the following abbreviations are used:
PPI: Picture part of interest;
BG: Background;
BP: Basic picture;
PPIABG: Picture part of interest in arbitrary background;
EP: Embedded picture;
LPF: Low pass filter;
HPF: High pass filter;
CA: Comparator;
SW: Switch;
R: Red component signal;
B: Blue component signal;
G: Green component signal;
Y: Luminance signal;
Cr: Red color difference (chrominance) signal (R-Y);
Cb: Blue color difference (chrominance) signal (B-Y);
SBGY: Stored background luminance frame;
SBG(Cr/Cb): Stored background chrominance ratio frame;
VD: Vertical drive;
HD: Horizontal drive; and
MP: Memory pixel.