Alpha matting refers to the problem of softly extracting a foreground object from an image. In contrast to binary segmentation, where each pixel is classified as either fully foreground or fully background, alpha matting recognizes the existence of “mixed” pixels. A major reason for such mixed pixels is the limited resolution of cameras, where light from both the foreground object and the background contributes to the incoming light of a CCD element. Other reasons can be motion blur and (semi-)transparencies in the object itself. Alpha matting, and thus the soft extraction of objects from a still image or a video sequence, is a fundamental problem in computer vision in general and movie post-production in particular.
The mixing coefficient is typically called “alpha”. It is defined to be between 0 and 1, i.e., 0% and 100%, and describes the fraction to which light from the foreground object contributed to the incoming light on an image sensor element, i.e., to an image pixel. An alpha matting algorithm tries to estimate this alpha coefficient, as well as the unmixed foreground and background colors. Each (unmixed) color is defined by three parameters, e.g. R, G, and B values in the case of the RGB color space. For each pixel, alpha matting hence needs to determine seven unknowns (alpha plus three foreground and three background color components) from only three knowns (the color components of the observed pixel). The problem is thus ill-posed and requires additional constraints.
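The mixing described above is commonly written as the compositing equation, observed = alpha * F + (1 - alpha) * B, applied per color channel. The following minimal Python sketch illustrates it and shows why the problem is ill-posed: two different combinations of alpha, foreground, and background yield the same observed color. The function name `composite` and all color values are assumptions chosen for illustration.

```python
def composite(alpha, fg, bg):
    """Mix an unmixed foreground and background color with weight alpha.

    Each color is a 3-tuple (e.g. R, G, B); alpha is the foreground fraction.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


# Hypothetical example values: a pure red foreground over a pure blue background.
observed = composite(0.25, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0))

# A different alpha and a different foreground color explain the same pixel,
# so the three observed values alone cannot determine the seven unknowns.
alternative = composite(0.5, (0.5, 0.0, 0.5), (0.0, 0.0, 1.0))

print(observed, alternative)
```

Both calls produce the same color, which is why additional constraints such as user input or smoothness assumptions are needed.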
Many algorithms for estimating alpha mattes have been developed in recent years. Their computational complexity is usually very high, often preventing their application in professional post-production of high-resolution images. However, the achievable results are usually much more visually appealing than the results of a binary segmentation.
Wang et al.: “Image and Video Matting: A Survey”, Foundations and Trends in Computer Graphics and Vision, Vol. 3 (2007), pp. 97-175, provides a good overview of the state of the art of alpha matting as of 2007. A number of different approaches exist today, and significant progress has been made in recent years. Generally, a distinction is made between two fundamental approaches to solving the matting problem, namely color sampling based methods and propagation (affinity) based methods.
Most of these algorithms assume that a trimap is provided in addition to the input image or sequences thereof. The trimap indicates three different types of regions: known foreground, known background, and an unknown region for which alpha values shall be estimated.
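As a rough illustration of how a trimap constrains the problem, the sketch below fixes alpha to 1 in known foreground, to 0 in known background, and leaves it open in the unknown region. The label encoding (255, 0, 128) and the helper names are assumptions made for this example; the text does not prescribe a particular representation.

```python
# Assumed label encoding for this sketch only.
FG, BG, UNKNOWN = 255, 0, 128


def initial_alpha(trimap):
    """Return per-pixel alpha: 1.0 for FG, 0.0 for BG, None where unknown."""
    known = {FG: 1.0, BG: 0.0}
    return [[known.get(label) for label in row] for row in trimap]


def unknown_pixels(trimap):
    """List the (row, column) positions whose alpha still has to be estimated."""
    return [
        (y, x)
        for y, row in enumerate(trimap)
        for x, label in enumerate(row)
        if label == UNKNOWN
    ]


# Hypothetical 2x4 trimap: background on the left, foreground on the right.
trimap = [
    [BG, BG, UNKNOWN, FG],
    [BG, UNKNOWN, FG, FG],
]

print(initial_alpha(trimap))
print(unknown_pixels(trimap))
```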
Color sampling based methods try to explain an observed color in the unknown region with the help of known pixels from nearby foreground and background regions. They make the assumption that the true unmixed colors that produced the observed color of the unknown pixel can be found more or less nearby in image space. A further distinction is made between parametric and non-parametric versions. The former fit a parametric statistical model, e.g. a Gaussian or a mixture of Gaussians, to the color distribution of known close-by image regions. The latter directly use pairs of individual samples to estimate alpha values. Recent algorithms show a trend towards non-parametric approaches, as it seems to be difficult to build adequate models, especially for highly textured image areas.
In the second category, propagation-based methods try to estimate the alpha values based on affinities between neighboring pixels. Pixels in the unknown region with high affinity should receive similar values. If the input image adheres to certain constraints, the algorithm may exactly recover the ground truth alpha matte. An important example is the color line model, which was used by A. Levin et al.: “A Closed-Form Solution to Natural Image Matting”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 30 (2008), pp. 228-242, to derive a closed-form solution based on the now widely used matting Laplacian. This closed-form solution, however, requires finding a global optimum over all pixels in the unknown region, which is computationally expensive. Furthermore, textured images as well as broad unknown areas still tend to be challenging.
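For orientation, the color line model and the resulting quadratic cost can be summarized as follows. The notation (window w, coefficients a and b, matrix L) is chosen here for illustration, and the cited paper should be consulted for the exact definitions.

```latex
% Color line model: within a small image window w, alpha is assumed to be
% approximately an affine function of the observed color I_i (a 3-vector).
\alpha_i \approx a^{\top} I_i + b, \qquad \forall i \in w

% Eliminating a and b from the corresponding least-squares cost leaves a
% quadratic form in the vector of all alpha values, where L is the sparse,
% symmetric matting Laplacian.
J(\alpha) = \alpha^{\top} L \, \alpha
```

Minimizing this quadratic form subject to the known foreground and background values amounts to solving one large sparse linear system over the unknown pixels, which is the source of the computational expense noted above.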
Latest developments in the art combine the two fundamental approaches. In a first stage, a sampling-based matting algorithm is used to get a good initial estimate of the alpha matte. In a second stage, the results of the first stage are refined by a propagation-based optimization of the alpha matte (e.g. using the matting Laplacian). Two recent representatives of this class of algorithms are described by E. S. L. Gastal et al.: “Shared Sampling for Real-Time Alpha Matting”, Computer Graphics Forum, Vol. 29 (2010), pp. 575-584, and K. He et al.: “A Global Sampling Method for Alpha Matting”, Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'11) (2011), pp. 2049-2056. As can be seen from the benchmark provided by C. Rhemann et al.: “A Perceptually Motivated Online Benchmark For Image Matting”, Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR'09) (2009), pp. 1826-1833, they belong to the top-performing algorithms.
In the color-sampling stage, multiple pairs of foreground (FG) and background (BG) samples are evaluated for each pixel in the unknown region with the help of a cost function. The sample pair with the lowest cost is deemed to be the pair that is best suited to estimate the alpha value of the candidate pixel. Designing a cost function that indeed selects the sample pair that best explains the true alpha value of the unknown pixel is an art and the subject of much current research.
Most of the cost functions of recent matting algorithms combine spatial and colorimetric costs to evaluate the suitability of a sample pair. In principle, the smaller the image-space distance of the sampled pixel to the unknown pixel, the better: a spatially close candidate is more likely to be a good candidate than one further away. Furthermore, a pair of FG/BG samples should model the unknown pixel's color well as a linear mixture of the two sample colors. The smaller the deviation of the observed color from the line connecting the collected sample colors, the better.
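The two cost components described above can be sketched in a few lines of Python. This is a minimal illustration, not the cost function of any cited algorithm: the alpha estimate projects the observed color onto the line through the two sample colors, the colorimetric cost is the distance to that line segment, and the spatial cost is the sum of image-space distances to the samples. The function names and the weights `w_color` and `w_space` are hypothetical.

```python
import math


def estimate_alpha(c, f, b):
    """Project observed color c onto the line through b and f, clamped to [0, 1]."""
    fb = [fi - bi for fi, bi in zip(f, b)]
    denom = sum(d * d for d in fb)
    if denom == 0.0:
        return 0.0  # degenerate pair: both samples have the same color
    num = sum((ci - bi) * d for ci, bi, d in zip(c, b, fb))
    return min(1.0, max(0.0, num / denom))


def color_cost(c, f, b):
    """Deviation of c from the best linear mixture of f and b."""
    alpha = estimate_alpha(c, f, b)
    mix = [alpha * fi + (1.0 - alpha) * bi for fi, bi in zip(f, b)]
    return math.dist(c, mix)


def pair_cost(pixel, c, fg_sample, bg_sample, w_color=1.0, w_space=0.01):
    """Weighted sum of colorimetric and spatial cost (weights are assumptions)."""
    (f_pos, f), (b_pos, b) = fg_sample, bg_sample
    spatial = math.dist(pixel, f_pos) + math.dist(pixel, b_pos)
    return w_color * color_cost(c, f, b) + w_space * spatial


def best_pair(pixel, c, fg_samples, bg_samples):
    """Return the FG/BG sample pair with the lowest cost for this pixel."""
    pairs = ((fs, bs) for fs in fg_samples for bs in bg_samples)
    return min(pairs, key=lambda p: pair_cost(pixel, c, p[0], p[1]))


# Hypothetical data: each sample is (position, color).
pixel, c = (5, 5), (0.5, 0.0, 0.5)
fg_samples = [((5, 7), (1.0, 0.0, 0.0)), ((5, 6), (0.0, 1.0, 0.0))]
bg_samples = [((5, 3), (0.0, 0.0, 1.0))]

fg_best, bg_best = best_pair(pixel, c, fg_samples, bg_samples)
alpha = estimate_alpha(c, fg_best[1], bg_best[1])
print(fg_best, bg_best, alpha)
```

In this example the red foreground sample wins even though the green one is spatially closer, because only the red/blue pair explains the observed color as a linear mixture. Note that the weighted sum adds a color-space distance to an image-space distance, so the result depends strongly on the chosen weights.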
In general, the cost functions are designed in a somewhat “ad hoc” fashion. Typically, they combine unrelated physical quantities. In the work by E. S. L. Gastal et al., the cost function is defined as a product of an estimated probability and several unnormalized distances in color space and image space, all of which are raised to some power. In the work by K. He et al., the cost function is merely a weighted sum of one unnormalized distance in color space and two normalized distances in image space. In both cases, the parameters that control the contribution of the individual costs are usually determined experimentally by comparing results with ground-truth data, as available for example from C. Rhemann et al.