Layer extraction has long been a research topic in the art of computer vision. Recent work has shown that the foreground layer can be accurately and efficiently extracted (i.e., in near real time) from a binocular stereo video, for example, in a teleconferencing scenario. In one application, such foreground layer extraction is used to perform high quality live background substitution. The success of the binocular approach arises from a probabilistic fusion of multiple cues, i.e., stereo, color, and contrast cues.
In most real-world visual communication scenarios, however, e.g., teleconferencing or instant messaging, most users have only a single web camera in operation. What is needed is quality foreground layer extraction using such a single web camera. For a typical scene (e.g., with a non-static “moving” background) automatic foreground layer extraction is still a monumental challenge in the current state of the art. But in a special case, in which the background is known and stationary, it would be useful to obtain high quality, real-time foreground extraction (or background removal) from a single camera.
To address this problem, the most efficient approach is background subtraction. Background subtraction detects foreground objects as the difference between the current image and a pre-existing, known background image. However, complex issues remain in such background subtraction. First, the threshold in background subtraction is very sensitive to noise and background illuminance changes: a larger threshold detects fewer foreground pixels, and a smaller threshold detects more. Second, foreground color and background color may by chance be very similar, resulting in holes in the detected foreground object. More sophisticated techniques have been proposed to overcome these problems, but the results are still error-prone and not accurate enough for high quality live foreground extraction.
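The threshold sensitivity and the "hole" problem described above can be illustrated with a minimal sketch. It assumes a per-pixel Euclidean color distance and a single global threshold; the function name `subtract_background`, the toy images, and the threshold values are hypothetical and chosen only for illustration.

```python
import numpy as np

def subtract_background(frame, background, threshold):
    """Return a boolean mask that is True where a pixel is labeled foreground."""
    # Assumption: per-pixel Euclidean color distance to the known background image.
    diff = frame.astype(np.float64) - background.astype(np.float64)
    distance = np.sqrt((diff ** 2).sum(axis=-1))
    return distance > threshold

# Toy 4x4 RGB scene: flat gray background with a 2x2 object in the current frame.
background = np.full((4, 4, 3), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = (200, 60, 60)   # object color clearly different from the background
frame[1, 1] = (104, 100, 100)     # object pixel that happens to resemble the background

low_mask = subtract_background(frame, background, threshold=2.0)
high_mask = subtract_background(frame, background, threshold=10.0)

print(low_mask.sum(), high_mask.sum())
```

With the lower threshold all four object pixels are detected; with the higher threshold the pixel whose color resembles the background is missed, leaving a hole in the detected object. In a real video the lower threshold would also admit noise, which this toy scene does not model.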
Recent interactive image and video segmentation techniques have demonstrated the effectiveness of a color/contrast-based model. Color/contrast-based models consider both color similarity to manually obtained foreground/background color models and contrast (or edge) strength along the segmentation boundary. The final foreground layer is globally determined using a min-cut algorithm. But background subtraction, even when using color and contrast cues, is still insufficient for correct foreground extraction.
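A color/contrast-based model of this kind is commonly expressed as an energy with a per-pixel color term and a contrast-weighted boundary term. The following is a rough sketch on a one-dimensional row of pixels, not the formulation of any particular technique. It assumes absolute-difference color costs and an exponential contrast weight, and it minimizes the energy by brute-force enumeration in place of a min-cut algorithm, which is only feasible for a handful of pixels. The names `energy`, `smoothness`, and `beta` are hypothetical.

```python
import itertools
import numpy as np

def energy(labels, fg_cost, bg_cost, pixels, smoothness=2.0, beta=0.01):
    """Color term plus contrast-weighted boundary term for a 1-D row of pixels."""
    labels = np.asarray(labels)
    # Color term: each pixel pays the cost of the label assigned to it (1 = foreground).
    data = np.where(labels == 1, fg_cost, bg_cost).sum()
    # Contrast term: a label change between neighbors is cheap across a strong
    # edge and expensive inside a smooth region.
    changes = labels[1:] != labels[:-1]
    weights = np.exp(-beta * np.diff(pixels) ** 2)
    return data + smoothness * (changes * weights).sum()

# Toy intensities: three dark background pixels followed by three bright foreground pixels.
pixels = np.array([10.0, 12.0, 11.0, 200.0, 205.0, 198.0])
# Assumption: the color models are single reference intensities (200 and 10).
fg_cost = np.abs(pixels - 200.0) / 100.0
bg_cost = np.abs(pixels - 10.0) / 100.0

# Brute-force stand-in for min-cut: evaluate all 2^6 labelings and keep the cheapest.
best = min(itertools.product([0, 1], repeat=len(pixels)),
           key=lambda lab: energy(lab, fg_cost, bg_cost, pixels))
print(best)
```

The cheapest labeling places the single label change at the strong intensity edge between the third and fourth pixels, which is the behavior the contrast term is meant to encourage.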
A straightforward improvement is to combine the two techniques above: foreground and background color models are built from background subtraction, and the above color/contrast-based model is then applied. Because the background image is already known and stationary, the background color model can be represented as a mixture of a global color model and a more accurate per-pixel color model. This combination can produce a more accurate segmentation result, and is referred to herein as the (conventional) “basic model.”
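The mixture of a global and a per-pixel background color model can be sketched as follows. This is an illustrative assumption, not the basic model's actual formulation: the global model is reduced to a single isotropic Gaussian over all background colors, the per-pixel model is a narrow Gaussian centered on the known background color at each pixel, and the names `alpha`, `global_var`, and `pixel_var` are hypothetical parameters.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Isotropic Gaussian density over the last axis (the color channels)."""
    d = x.shape[-1]
    sq = ((x - mean) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / var) / (2.0 * np.pi * var) ** (d / 2.0)

def background_likelihood(frame, background, alpha=0.5,
                          global_var=900.0, pixel_var=25.0):
    """Mix a global background color model with a per-pixel one."""
    frame = frame.astype(np.float64)
    background = background.astype(np.float64)
    # Assumption: the global model is one Gaussian fitted to all background colors.
    global_mean = background.reshape(-1, background.shape[-1]).mean(axis=0)
    p_global = gaussian_pdf(frame, global_mean, global_var)
    # Per-pixel model: a narrow Gaussian around the known background color at that pixel.
    p_pixel = gaussian_pdf(frame, background, pixel_var)
    return alpha * p_global + (1.0 - alpha) * p_pixel

# Toy 2x4 RGB background: dark left half, bright right half.
background = np.zeros((2, 4, 3), dtype=np.uint8)
background[:, :2] = 20
background[:, 2:] = 220
frame = background.copy()
frame[0, 0] = (220, 220, 220)  # a bright color appears where the background is dark

likelihood = background_likelihood(frame, background)
print(likelihood[0, 0] < likelihood[0, 1])
```

The changed pixel has a color that exists elsewhere in the background, so a purely global model could not distinguish it; the per-pixel component gives it a much lower background likelihood than its unchanged neighbor.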
However, there are still problems in the basic model. Since the basic model considers both color and contrast simultaneously, the final segmentation boundary is inevitably “snapped” or attracted to high contrast edges in a cluttered background. Though this kind of error may be small around the boundary or occur in only some frames, the resulting flickering artifact in the running video can be very distracting and unpleasant in the final composite video.