The matting technique originated in the film industry as early as the late nineteenth century as a way to replace the background in which a subject is located. The traditional approach captured the subject directly against a single uniformly illuminated and uniformly colored background [See section titled “REFERENCES” R1, R2], and then replaced the background during post-processing. However, those steps required a special hardware setup, and the subject could be captured only in a controlled environment. Therefore, digital matting, a process of extracting and compositing foreground and background objects directly from media, has become a significant technology for general image/film editing and production.
In digital matting, a matte is represented by the variable α, which defines the opacity of the subject/foreground at each pixel. The observed image is represented as a convex combination of foreground and background layers. Most matting processes restrict α to the interval [0,1] for each pixel. The matting equation can be written as

I = αF + (1 − α)B,  Equation (1)

where I is the observed image, and F and B are the foreground and background colors, respectively. This compositional model is well known, but it is severely under-constrained. For a single-channel image, one must estimate three unknown values α, F, B at each pixel where only one value I is known. Setting constraints on the alpha values of certain pixels simplifies the problem. The constrained matting problem can be viewed as supervised image segmentation where certain α values are known. The known (training/ground-truth) α values are represented using a trimap or a set of scribbles. A trimap, as the name indicates, specifies foreground and background pixels in addition to unknown pixels for which α, F, B need to be estimated. Scribbles are a sparse representation of a trimap, where the user provides samples of foreground and background pixels in the form of scribbles, using a brush of fixed width. Even though trimaps are harder to produce manually than scribbles, trimaps can be produced automatically from an initial bounding box provided by object detection processes, such as [See section titled “REFERENCES” R3].
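As an illustration only, the compositional model of Equation (1) and its under-constrained nature can be sketched for a single pixel as follows. The function name composite and the tuple-per-pixel layout are assumptions made for this sketch and are not part of any referenced process.

```python
def composite(alpha, fg, bg):
    """Blend one pixel per Equation (1): I = alpha*F + (1 - alpha)*B.

    fg and bg are tuples holding one value per color channel
    (a hypothetical layout chosen for this sketch).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in the interval [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


# Two different (alpha, F, B) triples for a single-channel pixel ...
observed_a = composite(0.5, (1.0,), (0.25,))
observed_b = composite(0.625, (1.0,), (0.0,))

# ... yield the same observed value I, so I alone cannot determine them.
print(observed_a, observed_b)
```

Both calls produce the same observed value of 0.625, which is why additional constraints such as a trimap or scribbles are needed before α, F, B can be estimated.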
There are various approaches to digital matting. Poisson matting [See section titled “REFERENCES” R7] assumes that the foreground and background colors are locally smooth, so that the gradient of the matte is locally proportional to the gradient of the image. The matte is estimated by solving a Poisson equation with Dirichlet boundary conditions extracted from the trimap. Random walk matting [See section titled “REFERENCES” R8] defines the affinity between neighboring pixels as a Gaussian of the norm of their distance in color space. Neighboring pixels with similar colors have higher affinities than those with dissimilar colors. The matte value for a pixel can be viewed as the probability that a random walker starting from this pixel will reach a foreground pixel without passing through a background pixel. Closed-form weights [See section titled “REFERENCES” R4, R5] assume a color line model, where local foreground colors lie on a straight line in RGB space, and local background colors also lie on a line, though not necessarily the same line as the foreground. Under this assumption, the matte is shown to be a linear combination of the color components of neighboring pixels. Unlike in random walk matting, the affinities depend on the mean and variance of the local color channels. Robust matting [See section titled “REFERENCES” R9] improves the robustness of the process to the trimap by sampling only a few representative foreground and background pixels.
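The random walk formulation described above can be sketched, purely for illustration, on a single row of pixels. This is not the implementation of [R8]: the one-dimensional neighborhood, the parameter sigma, the iterative update, and the function names affinity and random_walk_alpha are all assumptions made for this sketch. A practical process would typically operate on a two-dimensional pixel grid and solve a sparse linear system.

```python
import math


def affinity(c1, c2, sigma=0.5):
    """Gaussian of the color-space distance between two pixels.

    sigma is a hypothetical tuning parameter for this sketch.
    """
    dist_sq = sum((a - b) ** 2 for a, b in zip(c1, c2))
    return math.exp(-dist_sq / (sigma ** 2))


def random_walk_alpha(colors, seeds, sigma=0.5, iters=500):
    """Estimate alpha along a 1-D row of pixels.

    colors: list of color tuples, one per pixel.
    seeds:  dict mapping pixel index to a known alpha
            (1.0 for foreground, 0.0 for background), as a trimap would.
    Each unknown alpha is repeatedly replaced by the affinity-weighted
    average of its neighbors, which converges to the probability that a
    random walker reaches a foreground seed before a background seed.
    """
    n = len(colors)
    alpha = [seeds.get(i, 0.5) for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            if i in seeds:
                continue  # known pixels stay fixed
            nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
            weights = [affinity(colors[i], colors[j], sigma) for j in nbrs]
            total = sum(weights)
            if total > 0.0:
                alpha[i] = sum(w * alpha[j] for w, j in zip(weights, nbrs)) / total
    return alpha


# Reddish pixels on the left, bluish pixels on the right; only the ends are labeled.
row = [(1.0, 0.0, 0.0), (0.95, 0.0, 0.0), (0.1, 0.0, 0.9), (0.0, 0.0, 1.0)]
alpha = random_walk_alpha(row, seeds={0: 1.0, 3: 0.0})
print(alpha)
```

In this sketch the second pixel, whose color is close to the foreground seed, receives an alpha near 1, and the third pixel, whose color is close to the background seed, receives an alpha near 0, because the affinity across the red/blue boundary is much smaller than the affinities within each region.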
Rhemann et al. [See section titled “REFERENCES” R10] show that Levin et al.'s Closed Form Matting approach [See section titled “REFERENCES” R4, R5] outperforms other processes in most cases. Therefore, the closed-form formula [See section titled “REFERENCES” R4, R5] has commonly been extended to other applications where a compositional model is applicable. For example, Hsu et al. [See section titled “REFERENCES” R11] formulate white balance as a matting problem in chromaticity space. The relative contributions of two light sources are estimated at each pixel using the matting Laplacian. These estimates are later used to neutralize and relight the scene by controlling the light contribution from each source. The haze removal approach in [See section titled “REFERENCES” R12] estimates the medium transmission map using the matting model. The transmission map is an exponential function of scene depth, and hence a depth map can also be derived. Despite its wide usage, Levin et al.'s approach [See section titled “REFERENCES” R4] assumes a simple color line model. However, natural images do not always satisfy the color line model. Singaraju et al. [See section titled “REFERENCES” R13] analyzed the severity of ill-posedness for different scenarios of the color line model, and presented a new point-color model to address those limitations. As should be apparent, an unfulfilled need exists for embodiments that do not rely on such a strong assumption, that can be easily extended to incorporate multiple features in addition to color, and that therefore yield more accurate matting results.