During playback of moving images (that is, the sequential presentation, at appropriate points in time, of a sequence of still images, or frames), a viewer may sometimes observe an undesired brightness variation, ‘flicker’, which was not present in the depicted scene. As used herein, a scene is a region of space which is visible or partially visible in the field of view of an imaging apparatus for recording moving images. Although the whole scene may not be visible at a given moment, it can be covered successively during a shot by panning. Flicker may be caused by a light source having intensity oscillations that are fast enough to be imperceptible to the human eye. However, recording involves sampling this oscillation at the frame rate of the imaging apparatus, and the sampling may give rise to a lower, visibly perceptible frequency. FIG. 1 illustrates how samples (shown as circles) of a high-frequency signal can be interpreted as coming from a low-frequency signal and vice versa; this phenomenon is referred to as aliasing.
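The aliasing effect can be illustrated numerically. The following is a minimal sketch, assuming an ideal sinusoidal intensity oscillation and instantaneous sampling at a constant frame rate; the function name and the example frequencies are chosen for illustration only and do not appear in the disclosure.

```python
def alias_frequency(signal_hz: float, frame_rate_hz: float) -> float:
    """Apparent frequency of a sinusoid of signal_hz sampled at frame_rate_hz."""
    # Fold the oscillation frequency into the band [0, frame_rate_hz / 2],
    # which is the only band a sequence of frames can represent.
    remainder = signal_hz % frame_rate_hz
    return min(remainder, frame_rate_hz - remainder)


# Illustrative values (assumptions): a lamp whose intensity oscillates at
# 100 Hz or 120 Hz, recorded at 30 or 25 frames per second.
print(alias_frequency(100.0, 30.0))  # 10.0, a clearly visible 10 Hz flicker
print(alias_frequency(120.0, 25.0))  # 5.0
```

Under these assumptions, an oscillation far too fast to be seen directly appears in the recording as a slow brightness variation of a few hertz.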
One may distinguish different kinds of flicker. In a gray-scale video sequence, flicker is an unintentional—and usually periodic—variation of the single channel of the image signal. Such variation may affect the whole frame or only a sub-region, which may correspond to a region of space having a particular illumination. When colour video technology is used, an oscillating white light source may affect the recorded video sequence in a different way than an oscillating coloured light source. As will be explained in the next few paragraphs, the precise interpretation of flicker in terms of image components depends on the precise colour video format used.
Firstly, if the video sequence is encoded in terms of linear primary colour components, such as RGB, the undesired oscillation will be present in all components in the case of a white light source. If the oscillating light source is coloured, it will contribute an oscillating term to each colour component in proportion to the composition of the colour of the light source; for example, an oscillating red light source will contribute predominantly to the R component of an RGB signal and less to the G and B components.
Secondly, several widespread colour video formats are based on a three-dimensional YCbCr colour space. Such a video format comprises one luma channel Y (encoding the luminance component, or brightness, of a pixel) and two chroma channels Cb, Cr (encoding the chrominance components of a pixel in terms of the deviation from white). The luma component corresponds to the single image channel of gray-scale video; hence if a YCbCr colour video signal is to be reproduced by a gray-scale receiver, then channels Cb, Cr can simply be ignored. The precise definition of the image components (as regards constants, scaling, offset etc.) may vary between different particular video formats, but generally there exists an unambiguous transformation (sometimes a linear transformation) between a primary colour format and a YCbCr format. In particular, all three primary-colour components make a positive contribution to the luminance, such as via the linear relationship Y=ρR+γG+βB, wherein the relative values of coefficients ρ>0, γ>0, β>0 have been determined in accordance with a standard white colour. Thus, whether the light source causing flicker is white or coloured, the flicker will manifest itself as a variation in the luminance component. On the other hand, a coloured light source may also cause oscillation of the Cb and/or the Cr component.
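The relationship between primary-colour flicker and the Y, Cb and Cr channels can be sketched as follows. The coefficient values and chroma scaling are those of ITU-R BT.601 and are an assumption made for illustration; the text above only requires the three coefficients to be positive. The function name and pixel values are likewise hypothetical.

```python
# Assumed coefficients (ITU-R BT.601); any positive triple fits the text.
RHO, GAMMA, BETA = 0.299, 0.587, 0.114


def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert linear R, G, B in [0, 1] to (Y, Cb, Cr) without offsets."""
    y = RHO * r + GAMMA * g + BETA * b
    cb = (b - y) / 1.772  # blue-difference chroma (BT.601 scaling)
    cr = (r - y) / 1.402  # red-difference chroma (BT.601 scaling)
    return y, cb, cr


# White flicker: all three components rise and fall together.
for level in (0.4, 0.5, 0.6):
    print("white", rgb_to_ycbcr(level, level, level))

# Red flicker: only the R component oscillates.
for red in (0.4, 0.5, 0.6):
    print("red  ", rgb_to_ycbcr(red, 0.4, 0.4))
```

In this sketch the white oscillation changes Y while Cb and Cr stay at approximately zero, whereas the red oscillation changes Y and also moves Cr and Cb.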
Thirdly, there exist further colour video formats based on the triple of hue, saturation and lightness, notably the HSL, HSV, HLS, HIS and HSB formats. Generally, a transformation to and from the RGB format accompanies each video format of this kind. Flicker, at least white flicker, will be detectable in the lightness/value/brightness/intensity channel (L or V), which will not be distinguished from luminance in the rest of this disclosure.
The discussion in the previous paragraphs intentionally does not distinguish between analogue and digital formats since, for the purposes of this disclosure, the latter may be regarded as quantised versions of the former. Likewise, some video formats may exist in a gamma-compressed or partially gamma-compressed version, such as the R′G′B′ and the Y′CbCr formats, in addition to the linear version. However, it is immaterial for the understanding of the present invention whether the video format includes such compression.
Because viewers may find flicker disturbing or unpleasant, there has been an interest in the field of video processing in detecting and correcting it. As regards detection, many state-of-the-art methods are based on the Fourier transformation, which decomposes a signal into a linear combination of components having different frequencies including zero. On the basis of the relative importance of different frequencies (as expressed by the Fourier coefficients) it can be established whether flicker is present or not. A detection method according to this principle is shown in EP 1 324 598; this method includes discrete Fourier transformation of averages of the image signals. As recognised by those skilled in the art of signal processing, algorithms involving Fourier transformation have the following drawbacks:
    they cannot be applied to non-stationary signals, such as video signals in which the frame rate may vary over time due to non-equidistant sampling;
    they do not resolve non-sinusoidal flicker in a signal well, since the energy of the fundamental frequency is in part lost to higher harmonics; and
    they may be computationally complex.
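The general Fourier-based principle can be sketched as below. This is not the method of EP 1 324 598; it is a minimal illustration that assumes one average luminance value per frame, a constant frame rate, and a hypothetical decision rule (`ratio_threshold`) comparing the strongest non-zero frequency component with the zero-frequency component.

```python
import cmath
import math


def dft_magnitudes(samples: list[float]) -> list[float]:
    """Normalised DFT magnitudes for bins 0 .. n/2 (direct O(n^2) evaluation)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        mags.append(abs(coeff) / n)
    return mags


def dominant_flicker(frame_means, frame_rate_hz, ratio_threshold=0.02):
    """Return the strongest oscillation frequency in Hz, or None if it is weak."""
    mags = dft_magnitudes(frame_means)
    k_peak = max(range(1, len(mags)), key=lambda k: mags[k])
    # Hypothetical rule: flag flicker when the peak is large relative to the mean.
    if mags[0] > 0 and mags[k_peak] / mags[0] > ratio_threshold:
        return k_peak * frame_rate_hz / len(frame_means)
    return None


# Two seconds at 30 frames per second with a 5 Hz brightness oscillation.
means = [100 + 10 * math.sin(2 * math.pi * 5 * i / 30) for i in range(60)]
print(dominant_flicker(means, 30.0))
print(dominant_flicker([100.0] * 60, 30.0))
```

The direct evaluation costs on the order of n² operations per window, and the frequency computed from the bin index is only meaningful if the frames are equidistant in time, which relates to two of the drawbacks listed above.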
Other approaches to detection may be based on computing the statistical variance. For example, the method disclosed in US 2007/036213 applies a lower threshold condition on the variance in order to determine when flicker reduction is necessary. Partly because an increase in variance can have other sources than flicker, such detection methods are known to produce a large percentage of false alarms.
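The false-alarm problem can be seen in a small sketch. This is not the method of US 2007/036213; it assumes one average luminance value per frame and a hypothetical threshold value, and simply tests whether the variance of those averages exceeds the threshold.

```python
from statistics import pvariance


def variance_exceeds(frame_means: list[float], threshold: float) -> bool:
    """True when the variance of per-frame averages is above the threshold."""
    return pvariance(frame_means) > threshold


THRESHOLD = 25.0  # hypothetical value, chosen for illustration

flickering = [100, 110, 100, 90] * 5       # periodic brightness oscillation
fade_in = [80 + 2 * i for i in range(20)]  # gradual, intentional brightening
steady = [100] * 20                        # constant brightness

print(variance_exceeds(flickering, THRESHOLD))  # True
print(variance_exceeds(fade_in, THRESHOLD))     # True, although no flicker is present
print(variance_exceeds(steady, THRESHOLD))      # False
```

The fade-in sequence exceeds the threshold even though it contains no oscillation, which is the kind of false alarm referred to above.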
Several available methods for suppressing or removing flicker are based on correction of each frame in a flickering sequence against a reference frame. More precisely, a cumulative distribution function (CDF) or, by another name, a cumulative histogram is generated for the frame to be corrected and a reference CDF is generated for the reference frame. The pixel values are then adjusted in order that the CDF for the corrected frame is approximately equal to that of the reference frame. In general, it is not necessary for the reference frame to be identical (apart from the brightening or darkening caused by flicker) to the frame to be corrected, but it should preferably depict a similar scene with respect to background, lighting, etc. The method disclosed in U.S. Pat. No. 5,793,886 provides a representative example. To generate the reference CDF, the method computes CDFs for both an earlier and a later frame in the sequence, and then interpolates these in accordance with the position of the frame which is to be corrected.
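The CDF-based correction can be sketched as follows. This is an illustration of the general technique, not a reproduction of the method of U.S. Pat. No. 5,793,886; it assumes 8-bit pixel values given as a flat list, linear interpolation between the two neighbouring CDFs, and hypothetical function names.

```python
from bisect import bisect_left

LEVELS = 256  # assumed 8-bit pixel values


def cdf(pixels: list[int]) -> list[float]:
    """Cumulative histogram: fraction of pixels with value <= each level."""
    hist = [0] * LEVELS
    for p in pixels:
        hist[p] += 1
    out, running = [], 0
    for count in hist:
        running += count
        out.append(running / len(pixels))
    return out


def interpolate_cdf(cdf_earlier, cdf_later, weight_later: float):
    """Reference CDF between two frames; weight_later in [0, 1] is the position."""
    return [a + weight_later * (b - a) for a, b in zip(cdf_earlier, cdf_later)]


def match_to_reference(pixels: list[int], reference_cdf) -> list[int]:
    """Remap pixel values so that the frame's CDF approximates reference_cdf."""
    source_cdf = cdf(pixels)
    # For each level, pick the lowest reference level whose CDF reaches the
    # source CDF at that level; clamp to guard against rounding at the top.
    mapping = [min(bisect_left(reference_cdf, source_cdf[v]), LEVELS - 1)
               for v in range(LEVELS)]
    return [mapping[p] for p in pixels]


earlier = [100, 120, 140, 160]   # frame before the flickering frame
later = [100, 120, 140, 160]     # frame after it
darkened = [90, 110, 130, 150]   # frame to be corrected
reference = interpolate_cdf(cdf(earlier), cdf(later), 0.5)
print(match_to_reference(darkened, reference))  # [100, 120, 140, 160]
```

Note that building the reference requires the later frame, so a sketch of this kind can only run once that frame has been received.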
With consumers' increased access to wideband Internet connections, not only voice-over-IP technology but also video calls and video conferencing have proliferated in recent years. Since audio and video data are here transmitted as a stream of packets, both the sending and the receiving party are required to handle the data on a real-time basis and not as finite batches. Annoying image flicker may occur in video calls just as in any kind of moving images, but available methods (see above) for detecting and resolving flicker are often ill-suited. Most importantly, many existing methods, in addition to the one referred to above, necessitate knowledge of both preceding and subsequent frames in the stream. Such non-causal processing methods cannot be applied to the real-time case without buffering frames, which delays transmission of the data stream. The minimum buffer length is the maximal expected duration of a flickering portion of the video sequence plus the processing (correction) time per frame plus one reference frame at the end of the flickering portion. In the case of state-of-the-art Internet communications, in which a certain delay for concealing network jitter already exists, most users would find an additional delay to be unacceptable. Buffering would therefore be a significant drawback.
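The buffer length stated above can be put into numbers. The sketch below expresses all three terms in seconds and treats the trailing reference frame as one frame interval; the function name and the example values are assumptions for illustration only.

```python
def minimum_buffer_seconds(max_flicker_s: float,
                           processing_s_per_frame: float,
                           frame_rate_hz: float) -> float:
    """Flicker duration + per-frame processing time + one reference frame."""
    reference_frame_s = 1.0 / frame_rate_hz
    return max_flicker_s + processing_s_per_frame + reference_frame_s


# Hypothetical values: 2 s of flicker, 5 ms correction time, 30 frames per second.
delay = minimum_buffer_seconds(2.0, 0.005, 30.0)
print(f"{delay:.3f} s of added delay")
```

With these assumed values the added delay is dominated by the expected flicker duration, on top of whatever jitter-concealment delay the connection already has.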
While means for detecting and/or suppressing image flicker in live broadcasting are known in the art, most such devices cannot be integrated in consumer products because of their high degree of sophistication. Similarly, methods directed to reducing flicker during recording may presuppose access to advanced optical hardware, such as adaptive image sensors and shutter arrangements that can be regulated. A provider of, e.g., video call services cannot presume that such hardware features are available, but is forced to accept image data from whatever devices the users of the service operate. Finally, the sheer complexity of certain methods makes them inapplicable to video calls. On a personal computer under normal load, a reasonably accurate Fourier-based detection method may engage an inconveniently large portion of the CPU capacity; at least, computationally complex methods imply a risk of forcing the video call system into making an ad hoc quality reduction, such as a drop in frame rate, a reduction of image size, etc.