Matting and compositing are frequently used in image editing, 3D photography, and film production. Matting separates a foreground region from an input image by estimating a color F and an opacity α for each pixel in the image. Compositing blends the extracted foreground into an output image, using the matte, to represent a novel scene.
The opacity measures the ‘coverage’ of the foreground region at a pixel, due to either partial spatial coverage or partial temporal coverage, i.e., motion blur. The set of all opacity values is called the alpha matte, the alpha channel, or simply a matte.
Matting is described generally by Smith et al., “Blue screen matting,” Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, ACM Press, pp. 259-268, and U.S. Pat. No. 4,100,569, “Comprehensive electronic compositing system,” issued to Vlahos on Jul. 11, 1978.
Conventional matting, which is referred to as blue screen matting, requires a background with a known, constant color. If a digital camera is used, then a green matte is preferred.
Blue screen matting is the predominant technique in the film and broadcast industry. For example, broadcast studios use blue matting for presenting weather reports. The background is a blue screen, and the foreground region includes the weatherman standing in front of the blue screen. The foreground is extracted, and then superimposed onto a weather map so that it appears that the weatherman is actually standing in front of the map.
However, blue screen matting is costly and not readily available to casual users. Even production studios would prefer a lower-cost and less intrusive alternative.
Rotoscoping permits non-intrusive matting, Fleischer 1917, “Method of producing moving picture cartoons,” U.S. Pat. No. 1,242,674. Rotoscoping involves the manual drawing of a matte boundary on individual frames of a movie.
Ideally, one would like to extract a high-quality matte from an image or video with an arbitrary, i.e., unknown, background. This process is known as natural image matting.
Recently, there has been substantial progress in this area, see Ruzon et al., “Alpha estimation in natural images,” CVPR, vol. 1, pp. 18-25, 2000, Hillman et al., “Alpha channel estimation in high resolution images and image sequences,” Proceedings of IEEE CVPR 2001, IEEE Computer Society, vol. 1, pp. 1063-1068, 2001, Chuang et al., “A Bayesian approach to digital matting,” Proceedings of IEEE CVPR 2001, IEEE Computer Society, vol. 2, pp. 264-271, 2001, Chuang et al., “Video matting of complex scenes,” ACM Trans. on Graphics 21, 3, pp. 243-248, July, 2002, and Sun et al., “Poisson matting,” ACM Trans. on Graphics, August 2004.
Unfortunately, all of those methods require substantial manual intervention, which becomes prohibitive for long image sequences and for non-professional users.
The difficulty arises because matting from a single image is fundamentally under-constrained. The matting problem considers the input image as a composite of a foreground layer F and a background layer B, combined using linear blending of radiance values for a pinhole camera:
I_P[x, y] = αF + (1 − α)B,   (1)
where αF is the pre-multiplied image of the foreground regions against a black background, and B is the image of the opaque background in the absence of the foreground.
Matting is the inverse problem of solving for the unknown values of variables (α, F_r, F_g, F_b, B_r, B_g, B_b) given the composite image pixel values (I_Pr, I_Pg, I_Pb). The ‘P’ subscript denotes that Equation (1) holds only for a pinhole camera, i.e., where the entire scene is in focus. One can approximate a pinhole camera with a very small aperture. Blue screen matting is easier to solve because the background color B is known.
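The under-constrained nature of Equation (1) can be illustrated with a minimal Python sketch. The function name `composite`, the example colors, and the convention that color values lie in the range 0 to 1 are assumptions made only for illustration. Each pixel supplies three equations, one per color channel, but has seven unknowns, so different combinations of α, F, and B can yield the same observed pixel.

```python
# Illustrative sketch of Equation (1) for a single RGB pixel.
# The name `composite` and the example colors are hypothetical.

def composite(alpha, fg, bg):
    """Blend foreground color fg over background color bg with opacity alpha."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))

# Half-transparent red foreground over a blue background.
pixel_a = composite(0.5, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

# Fully opaque purple foreground over a green background.
pixel_b = composite(1.0, (0.5, 0.0, 0.5), (0.0, 1.0, 0.0))

# Both explanations produce the same observed color, so the pixel value
# alone cannot tell them apart.
print(pixel_a, pixel_b)
```

If the background color B is known, as in blue screen matting, three of the seven unknowns are removed, which is why that case is easier to solve.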
It is desired to perform matting using non-intrusive techniques. That is, the scene does not need to be modified. It is also desired to perform the matting automatically. Furthermore, it is desired to provide matting for ‘rich’ natural images, i.e., images with a lot of fine, detailed structure, such as outdoor scenes.
Most natural image matting methods require manually defined trimaps to determine the distribution of color in the foreground and background regions. A trimap segments an image into background, foreground and unknown pixels. Using the trimaps, those methods estimate likely values of the foreground and background colors of unknown pixels, and use the colors to solve the matting Equation (1).
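The last step described above, solving Equation (1) once foreground and background colors have been estimated, can be sketched as follows. This is only an illustration under stated assumptions: the helper names `solve_alpha` and `alpha_for_pixel`, the integer label encoding of the trimap, and the use of a least-squares projection are all hypothetical choices, and the cited methods estimate F and B with their own statistical models.

```python
# Illustrative sketch: recover alpha for one pixel after F and B are estimated.
# All names and the label encoding below are hypothetical.

BACKGROUND, FOREGROUND, UNKNOWN = 0, 1, 2  # assumed trimap labels

def solve_alpha(pixel, fg, bg, eps=1e-8):
    """Least-squares alpha for pixel = alpha*fg + (1 - alpha)*bg, clamped to [0, 1]."""
    diff = [f - b for f, b in zip(fg, bg)]
    denom = sum(d * d for d in diff)
    if denom < eps:
        # F and B are (nearly) the same color, so alpha cannot be determined.
        return None
    numer = sum((p - b) * d for p, b, d in zip(pixel, bg, diff))
    return min(1.0, max(0.0, numer / denom))

def alpha_for_pixel(label, pixel, fg_estimate, bg_estimate):
    """Known trimap regions get alpha directly; unknown pixels use Equation (1)."""
    if label == BACKGROUND:
        return 0.0
    if label == FOREGROUND:
        return 1.0
    return solve_alpha(pixel, fg_estimate, bg_estimate)

# A purple pixel in the unknown region, with red and blue color estimates.
alpha = alpha_for_pixel(UNKNOWN, (0.5, 0.0, 0.5), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))
print(alpha)
```

In this sketch the denominator vanishes when the estimated foreground and background colors coincide, in which case no opacity can be recovered from the pixel color alone.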
Bayesian matting, and its extension to image sequences, produce the best results in many applications. However, those methods require manually defined trimaps for key frames. This is tedious for long image sequences.
It is desired to provide a method that does not require user intervention, and that can operate in real-time as an image sequence is acquired.
The prior art estimation of the color distributions works only when the foreground and background are sufficiently different in a neighborhood of an unknown pixel.
It is desired to provide a method that can extract a matte where the foreground and background pixels have substantially similar color distributions.
The Poisson matting of Sun et al. 2004 solves a Poisson equation for the matte by assuming that the foreground and background are slowly varying. Their method interacts closely with the user by beginning from a manually constructed trimap. They also provide ‘painting’ tools to correct errors in the matte.
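The reasoning behind the Poisson formulation can be sketched from Equation (1). The derivation below is a simplified outline, written per color channel and assuming that F and B differ at the pixel; it is not a restatement of the full method of Sun et al.

```latex
\begin{align}
I &= \alpha F + (1-\alpha) B \\
\nabla I &= (F - B)\,\nabla\alpha + \alpha\,\nabla F + (1-\alpha)\,\nabla B \\
\nabla\alpha &\approx \frac{\nabla I}{F - B}
  \quad\text{(if } \nabla F \text{ and } \nabla B \text{ are negligible)} \\
\Delta\alpha &\approx \operatorname{div}\!\left(\frac{\nabla I}{F - B}\right)
\end{align}
```

The last line is a Poisson equation for α in the unknown region. The approximation degrades where the foreground or background is highly textured, because the dropped gradient terms are then no longer small.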
A method that acquires pixel-aligned images has been successfully used in other computer graphics and computer vision applications, such as high-dynamic range (HDR) imaging, Debevec and Malik, “Recovering high dynamic range radiance maps from photographs,” Proceedings of the 24th annual conference on Computer graphics and interactive techniques, ACM Press/Addison-Wesley Publishing Co., pp. 369-378, and Branzoi, “Adaptive dynamic range imaging: Optical control of pixel exposures over space and time,” Proceedings of the International Conference on Computer Vision (ICCV), 2003.
Another system illuminates a scene with visible light and infrared light. Images of the scene are acquired via a beam splitter. The beam splitter directs the visible light to a visible light camera and the infrared light to an infrared camera. That system extracts high-quality mattes from an environment with controlled illumination, Debevec et al., “A lighting reproduction approach to live action compositing,” ACM Trans. on Graphics 21, 3, pp. 547-556, July 2002. Similar systems have been used in film production. However, flooding the background with artificial light is impossible for large natural outdoor scenes illuminated by ambient light.
An unassisted, natural video matting system is described by Zitnick et al., “High-quality video view interpolation using a layered representation,” ACM Trans. on Graphics 23, 3, pp. 600-608, 2004. They acquire videos with a horizontal row of eight cameras spaced over about two meters. They measure depth discrepancies from stereo disparity using sophisticated region processing, and then construct a trimap from the depth discrepancies. The actual matting is determined by the Bayesian matting of Chuang et al. However, that method has the view-dependent problems that are unavoidable with stereo cameras, e.g., reflections, specular highlights, and occlusions. It is desired to avoid view-dependent problems.