1. Field of the Invention
This invention relates to analysis of video image sequences, more particularly to methods for processing saturated intervals in continuous video sequences.
2. Background of the Invention
Digital video processing has become commonplace, whether the processed video data is displayed in digital or analog format. Several techniques in this area rely upon information in the video sequence both before and after the current interval. Two examples of these techniques are compression-decompression and interlaced-to-progressive scan conversion.
Because of the reliance on information in other intervals, intervals with saturated illumination cause problems in these techniques. Luminance saturation can occur when a bright flash of light, such as a camera flash, is recorded during the interval. Intervals that suffer from this problem will be referred to as saturation intervals. The effect of a saturated interval on various processing sequences can be seen in the MPEG coding/decoding process.
Compression of video data allows information to be transferred more quickly or stored in less space. Of course, when it is in compressed form, some method of decompressing it to restore the full image is also necessary. In some instances an encoder that performs compression will have a counterpart decoder that is in the same system. However, in many cases the data is compressed by one system but not decompressed in the same system. Defined protocols determine the compression scheme, which allows anyone with a decoder compliant with that protocol to decompress the images. Examples of these types of protocols are MPEG-1 and MPEG-2, which will be used merely for discussion purposes.
MPEG standards refer to frames and fields. For simplicity of discussion, the unit of analysis will be intervals, with the understanding that an interval could be a frame, a field, a portion of a field, or some other, yet-to-be-defined, segment of a video sequence. The boundaries of the intervals must be set prior to performing any analysis. Accordingly, the MPEG discussion will use the term intervals.
MPEG (Moving Picture Experts Group) has several intended phases of implementation. MPEG-1 is compression for “Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbits/s.” MPEG-2 is a more general purpose scheme for “Generic Coding of Moving Pictures and Associated Audio.” The schemes rely on three different kinds of intervals, “I” or intracoded intervals, “B” or bi-directional intervals, and “P” or predicted intervals.
An I interval is encoded as a still image, not using any past history. P intervals are predicted from the most recently coded I or P interval. The MPEG encoding process forms blocks and then derives the coefficients of the discrete cosine transform (DCT) for those blocks. Compressed P intervals usually contain motion vectors and difference coefficients from the last I or P interval. Compressed B intervals are similar to predicted intervals, except that their differences are from the closest two I or P intervals, one before and one after. The sequence of these intervals over time is shown in FIG. 1.
The sequence of I, B and P intervals can be of several different configurations. One example would be IPPPPBBI. If the saturated interval occurs at an I interval, that I interval cannot be used effectively for predicting subsequent P or B intervals. The P interval will either require more encoding data, because it will be drastically different from the I interval, or be of lower quality. If a saturated interval occurs at a P interval, the B intervals will be of lower quality, as will subsequent P intervals.
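The dependency described above can be traced with a short sketch. It assumes the pattern is given in display order and applies only the rules stated in this discussion: an I interval has no reference, a P interval references the most recent I or P interval, and a B interval references the closest I or P interval on each side. The function names `references` and `affected_by` are hypothetical and are used only for illustration.

```python
def references(pattern):
    """Map each interval index to the indices it is predicted from."""
    anchors = [i for i, kind in enumerate(pattern) if kind in "IP"]
    refs = {}
    for i, kind in enumerate(pattern):
        before = [a for a in anchors if a < i]
        after = [a for a in anchors if a > i]
        if kind == "I":
            refs[i] = []                         # coded as a still image
        elif kind == "P":
            refs[i] = before[-1:]                # most recent I or P
        else:
            refs[i] = before[-1:] + after[:1]    # closest I or P on each side
    return refs


def affected_by(pattern, saturated):
    """Indices whose prediction depends, directly or not, on the saturated one."""
    refs = references(pattern)
    tainted = {saturated}
    changed = True
    while changed:
        changed = False
        for i, sources in refs.items():
            if i not in tainted and any(s in tainted for s in sources):
                tainted.add(i)
                changed = True
    return sorted(tainted - {saturated})
```

For the example pattern IPPPPBBI, a saturated first I interval affects every P and B interval up to the next I interval, while a saturated final P interval affects only the two B intervals that follow it.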
The relationship between these intervals in the compression sequence results in the difficulties with the saturated intervals. However, other techniques use the information relationships between the intervals for processing. Another example is the conversion of incoming data from interlaced format to progressively scanned format.
Most broadcast video arrives at the receiver in interlaced format. Interlaced format typically divides a frame of data into two fields, which are intervals in this discussion, one for the odd-numbered lines, and one for the even-numbered lines. The fields are not necessarily easily combined together, since they were sent with the intention of being displayed separately. Interlaced format relies upon the response of the eye to integrate the two fields together, so each field can be displayed separately within a very short time. There may be slight differences between the two fields due to motion of objects in the intervals.
For these reasons, among others, it is sometimes desirable to interpolate the lines missing from each field to constitute a frame that can be displayed in its entirety. One example is shown in FIG. 2. The sequence of fields shows that the picture element (pixel) X is to be interpolated in frame 1, field 2. Data can be used from the current field, from X's nearest neighbors, A, B, C, D, E and F. However, information can be used from the corresponding pixel locations from the previous field, field 1, frame 1, as well as the next field, field 1, frame 2, which are labeled X″ and X′, respectively.
In these instances, a saturated interval, such as those caused by photographic flashes, breaks the continuity of the analysis. If field 1, frame 1 is a saturated interval, that data would be useless to assist in interpolating X. Similarly, if the saturated interval occurs in field 1, frame 2, that data is useless for interpolation. The smaller amount of available data will result in an inaccurate interpolation of X, similar to the inaccurate coding and decoding of the I, P or B intervals due to the saturated interval.
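The loss of temporal data can be illustrated with a minimal sketch. The equal-weight averaging used here is an assumption made only for illustration and is not the interpolation method of any particular system. The hypothetical function `interpolate_x` takes the luminances of the spatial neighbors of X in the current field, plus the values at the corresponding location in the previous and next fields, either of which is passed as `None` when its field is saturated.

```python
def interpolate_x(neighbors, x_prev=None, x_next=None):
    """Estimate pixel X from spatial neighbors and usable temporal samples.

    neighbors: luminances of the nearest pixels in the current field.
    x_prev, x_next: luminance at the same location in the previous and
    next fields, or None when that field is a saturated interval.
    """
    spatial = sum(neighbors) / len(neighbors)
    temporal = [v for v in (x_prev, x_next) if v is not None]
    if not temporal:
        # Both adjacent fields are unusable: fall back to spatial data only.
        return spatial
    # Assumed weighting: equal weight to the spatial and temporal estimates.
    return (spatial + sum(temporal) / len(temporal)) / 2.0
```

When one adjacent field is saturated, the estimate rests on a single temporal sample; when both are saturated, it rests on the current field alone.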
Therefore, it would seem to be beneficial to develop a method to detect saturated intervals, and then, of course, to develop methods of processing them.
One method currently used to detect those saturated intervals is discussed in an article “Rapid Scene Analysis on Compressed Video,” published in IEEE Transactions on Circuits and Systems for Video Technology, Vol. 5, No. 6, December, 1995, pp. 533-544, by Yeo and Liu (Yeo and Liu).
The method of Yeo and Liu is to detect a situation in which the sum of the absolute or squared differences between two frames varies from the local mean of the sum of absolute or squared differences by more than a given threshold.
For a better understanding of this method, let L(x,y,t) be the luminance of a pixel at location (x,y) in the field occurring at time t. For interlaced material, the even field will have luminances defined only at locations L(x,2y,t), and the corresponding odd field will have luminances defined only at L(x,2y+1,t+1), with luminance values at other pixel locations, the interleaving rows, in the given fields being identically zero. Let w be the number of columns in the luminance frame, and 2h be the number of lines in the luminance frame.
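The field convention above can be sketched as follows, assuming a frame is represented as a list of 2h rows of w luminance values. The function name `split_fields` is hypothetical. Each returned field has the full frame dimensions, with the interleaving rows set to zero as described.

```python
def split_fields(frame):
    """Split a frame into even and odd fields with zeroed interleaving rows."""
    even = [row[:] if y % 2 == 0 else [0] * len(row) for y, row in enumerate(frame)]
    odd = [row[:] if y % 2 == 1 else [0] * len(row) for y, row in enumerate(frame)]
    return even, odd
```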
Using the above nomenclature, D(t) as used by Yeo and Liu is defined as

$$D(t) = \sum_{x,y} \left| L(x,y,t) - L(x,y,t+2) \right|^{\alpha} + \sum_{x,y} \left| L(x,y,t+1) - L(x,y,t+3) \right|^{\alpha}$$

where the absolute difference (α=1) or squared difference (α=2) between successive fields is accumulated over the entire frame.
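A minimal sketch of this definition follows, assuming `fields` is a list indexed by t in which each entry is a full-size array of luminance rows (interleaving rows zero). The names `field_difference` and `d_metric` are hypothetical.

```python
def field_difference(a, b, alpha=1):
    """Sum of |a - b| ** alpha over all pixel locations of two fields."""
    return sum(
        abs(pa - pb) ** alpha
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def d_metric(fields, t, alpha=1):
    """D(t): field t against field t+2, plus field t+1 against field t+3."""
    return (field_difference(fields[t], fields[t + 2], alpha)
            + field_difference(fields[t + 1], fields[t + 3], alpha))
```

Note that evaluating D(t) requires four fields, t through t+3, to be available.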
The method disclosed in the paper by Yeo and Liu then defines a detection scheme which searches for a [D(t), D(t+δ)] pair which are similar in magnitude, are the maximum of the values for a local window of width m+1, and differ from the average of the other m−1 values in that window by some defined margin. Note that the pair need not be contiguous in the sequence.
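One possible reading of this detection scheme is sketched below. It is not taken from the Yeo and Liu paper: the similarity test (a ratio threshold) and the way the margin is applied are assumptions, as are the function name `find_flash_pair` and its parameters. The input is the list of D values for one window of width m+1.

```python
def find_flash_pair(d_values, margin, similarity=0.8):
    """Return window indices (i, j) of a candidate pair, or None.

    The two largest values in the window form the candidate pair. The pair
    is accepted when the values are similar in magnitude (assumed test:
    smaller / larger >= similarity) and the smaller one exceeds the mean
    of the remaining values by more than `margin`.
    """
    if len(d_values) < 3:
        return None
    order = sorted(range(len(d_values)), key=lambda k: d_values[k], reverse=True)
    i, j = sorted(order[:2])
    hi = max(d_values[i], d_values[j])
    lo = min(d_values[i], d_values[j])
    others = [v for k, v in enumerate(d_values) if k not in (i, j)]
    baseline = sum(others) / len(others)
    if hi > 0 and lo / hi >= similarity and lo - baseline > margin:
        return (i, j)
    return None
```

Because the whole window must be collected before a decision can be made, a scheme of this kind delays detection by the window width.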
This method then prescribes a window of 10-15 frames for this purpose, but does not specify whether this window must be centered upon the frames of interest. Yeo and Liu also suggest the use of a metric that consists of the absolute difference between the sums of the pixel luminance values of successive frames as an alternative measure, with a similar detection scheme.
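The alternative metric can be sketched in a few lines, assuming each frame is a list of rows of luminance values. The function name `luminance_sum_difference` is hypothetical.

```python
def luminance_sum_difference(frame_a, frame_b):
    """Absolute difference between the total luminance of two frames."""
    total_a = sum(sum(row) for row in frame_a)
    total_b = sum(sum(row) for row in frame_b)
    return abs(total_a - total_b)
```

Unlike D(t), this measure needs only one running sum per frame rather than a pixel-by-pixel comparison.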
However, this technique has several limitations. Among these are the higher number of frames required, which reduces the ability for this technique to be implemented in real time, and increases the necessary processing and memory requirements. Additionally, Yeo and Liu describe no technique to solve the problems raised by the existence of these saturated intervals in any sort of processing, examples of which were discussed above.
Therefore, a method is needed to reliably detect saturated intervals in such a manner that the detection can be done in real time and with minimal processing overhead. Similarly, a method is needed to process these saturated intervals to mitigate their effects on any schemes that rely on relationships between video intervals.
One embodiment of the invention is a method allowing processing of saturated intervals in a sequence of video intervals. One embodiment of the method removes that saturated interval from the sequence and alters the sequence to accommodate the removal. The accommodation includes completely eliminating the interval, substituting a repeat of the previous interval for the removed interval, or interpolating a substitute interval for the removed interval.
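The three accommodations can be sketched as follows, assuming each interval is a list of rows of luminance values and that the index of the saturated interval is already known. The function name `accommodate_removal` and the mode names are hypothetical, and averaging the two neighboring intervals is only one assumed way of interpolating a substitute.

```python
def accommodate_removal(intervals, index, mode="repeat"):
    """Return a new sequence with the saturated interval at `index` handled."""
    if not 0 <= index < len(intervals):
        raise IndexError("index outside the sequence")
    out = list(intervals)
    if mode == "eliminate":
        # Remove the interval entirely; the sequence becomes one shorter.
        del out[index]
    elif mode == "repeat":
        if index == 0:
            raise ValueError("no previous interval to repeat")
        out[index] = out[index - 1]
    elif mode == "interpolate":
        if index == 0 or index == len(out) - 1:
            raise ValueError("interpolation needs a neighbor on each side")
        prev, nxt = out[index - 1], out[index + 1]
        # Assumed interpolation: pixel-wise average of the two neighbors.
        out[index] = [
            [(a + b) / 2 for a, b in zip(row_p, row_n)]
            for row_p, row_n in zip(prev, nxt)
        ]
    else:
        raise ValueError("unknown mode: " + mode)
    return out
```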
Another embodiment retains the interval and extracts information while limiting the effects of the saturation. One embodiment reduces the data rate for the saturated interval and increases the data rate for subsequent intervals. Another embodiment uses temporal prediction-specific processing to force the encoding of the saturated interval as a bi-directional, or B, interval, or avoids using the saturated interval as a prediction interval by encoding the next interval as an intra, or I, interval.
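A minimal sketch of these retention strategies follows. It assumes a planned sequence of interval types given as a string and a list of per-interval bit budgets. The function names, strategy names, and the `keep` fraction are hypothetical, and spreading the freed bits evenly over all subsequent intervals is an assumption made for illustration.

```python
def retype(pattern, saturated, strategy):
    """Adjust planned interval types around a saturated interval."""
    types = list(pattern)
    if strategy == "force_b":
        # Code the saturated interval as B so nothing is predicted from it.
        types[saturated] = "B"
    elif strategy == "intra_next":
        # Code the following interval as I so it needs no prediction source.
        if saturated + 1 < len(types):
            types[saturated + 1] = "I"
    else:
        raise ValueError("unknown strategy: " + strategy)
    return "".join(types)


def reallocate_bits(budgets, saturated, keep=0.5):
    """Cut the saturated interval's budget and give the rest to later ones."""
    out = [float(b) for b in budgets]
    following = len(out) - saturated - 1
    if following == 0:
        return out
    freed = out[saturated] * (1.0 - keep)
    out[saturated] -= freed
    for k in range(saturated + 1, len(out)):
        out[k] += freed / following
    return out
```

In this sketch the total bit budget is unchanged; only its distribution across the intervals shifts.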
Another embodiment relies upon known characteristics of the encoder and decoder in a closed system. The coefficients are manipulated to add an error value to the prediction value for whatever transform is applied, such as DCT or wavelet. The new coefficients are then clipped to allow only those within a predefined range to be sent to the decoder. This results in only the low-frequency coefficients being sent, while not altering the high-frequency coefficients. In this way the information is still present but the effects of the saturation have been limited.