It is often desirable to extract an object from an image, for example, to combine the object with a different background. To facilitate removal of a foreground object, an image matte, or matte, can be used to extract the particular foreground object in the image. Because a pixel color in an image may be the color of the foreground object, the color of the background, or some combination of foreground and background color, an image matte can include alpha values that indicate the percentage of foreground color that exists at each pixel. For example, a pixel may have a combination of foreground and background color when a camera pixel sensor receives light from both the foreground object and the background. Typically, pixels around the edges of objects and pixels in regions corresponding to hair, fur, and motion blur have a combination of foreground and background color. The alpha values can then be used to extract the foreground image from the background.
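As an illustration of how alpha values can be used once a matte exists, the following minimal sketch applies the standard linear compositing model, in which each output pixel is alpha times the foreground color plus (1 - alpha) times the new background color. The function names, the nested-list image layout, and the [0, 1] color range are assumptions made for this example and are not taken from the text above.

```python
from typing import List, Tuple

Color = Tuple[float, ...]  # RGB, each channel assumed to lie in [0, 1]
Image = List[List[Color]]  # rows of pixels (hypothetical layout)


def composite_pixel(alpha: float, fg: Color, bg: Color) -> Color:
    """Blend one pixel: alpha * foreground + (1 - alpha) * background."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def composite(matte: List[List[float]], foreground: Image, background: Image) -> Image:
    """Place the matted foreground over a new background, pixel by pixel."""
    return [
        [composite_pixel(a, f, b) for a, f, b in zip(alpha_row, fg_row, bg_row)]
        for alpha_row, fg_row, bg_row in zip(matte, foreground, background)
    ]


# A 1x3 example: pure foreground, a half-and-half edge pixel, pure background.
red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
result = composite([[1.0, 0.5, 0.0]], [[red, red, red]], [[blue, blue, blue]])
print(result)
```

In this sketch the middle pixel, with an alpha value of 0.5, receives an equal mix of the two colors, which mirrors the blended edge pixels described above.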
Because determining alpha values for each pixel can be time-consuming, trimaps can be used to reduce the number of pixels for which an alpha value is determined. Generally, a trimap indicates regions of the image that are pure foreground, pure background, and unknown combinations of blended foreground and background colors. Accordingly, alpha values may be determined only for pixels in the regions of unknown color combinations.
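The saving that a trimap provides can be sketched as follows. Pixels labeled pure foreground or pure background receive alpha values of 1 and 0 directly, and a per-pixel estimator is invoked only for unknown pixels. The label encoding (0, 128, 255), the function name, and the estimator callback are hypothetical choices for this example; the text does not specify how a trimap is stored.

```python
from typing import Callable, List

# Hypothetical label encoding for the three trimap regions.
BACKGROUND, UNKNOWN, FOREGROUND = 0, 128, 255


def matte_from_trimap(
    trimap: List[List[int]],
    estimate_alpha: Callable[[int, int], float],
) -> List[List[float]]:
    """Build a matte, calling the estimator only for unknown pixels."""
    matte = []
    for y, row in enumerate(trimap):
        out_row = []
        for x, label in enumerate(row):
            if label == FOREGROUND:
                out_row.append(1.0)  # pure foreground: alpha is known
            elif label == BACKGROUND:
                out_row.append(0.0)  # pure background: alpha is known
            else:
                out_row.append(estimate_alpha(x, y))  # unknown: must be estimated
        matte.append(out_row)
    return matte


# Usage: a placeholder estimator that records where it was called.
calls = []


def placeholder_estimator(x: int, y: int) -> float:
    calls.append((x, y))
    return 0.5  # stand-in value, not a real matting result


trimap = [[BACKGROUND, UNKNOWN, FOREGROUND],
          [BACKGROUND, UNKNOWN, FOREGROUND]]
matte = matte_from_trimap(trimap, placeholder_estimator)
print(matte, "estimator calls:", len(calls))
```

Here the estimator runs for two of the six pixels, which is the reduction in work that the trimap is meant to provide.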
While trimaps can reduce the number of alpha values to determine for an image, generating accurate alpha values for pixels can be difficult, particularly in instances in which the foreground object occupies only a portion of the pixel (e.g., a hair strand), a transparent object exists (e.g., glass or smoke), or an optical blur exists (e.g., an object that is not in focus). Existing matting approaches used to separate foreground and background largely rely on colors to determine alpha values associated with pixels. For example, propagation-based methods fill in the unknown region between known foreground and background regions by propagating from those regions using known foreground and background color values. As another example, sampling-based methods take color samples from known foreground and background regions and use the samples as candidates for the foreground and background colors at a pixel in an unknown region. Still other methods first use a sampling method and then feed the results into a propagation method. Some existing methods have utilized deep learning to estimate alpha values for generating mattes. One deep learning method used to predict alpha values utilizes results from the propagation-based method and normalized RGB colors. Another deep learning method trains a network to learn a trimap for one or more individuals in a portrait. The trimap can then be used in conjunction with the propagation-based method to predict alpha values.
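To make the sampling-based idea concrete, the following sketch shows one common way such a method might score candidate colors. For each candidate foreground and background pair, the unknown pixel is projected onto the line between the two colors to obtain an alpha value, and the pair that best reconstructs the observed color is kept. This is an illustrative simplification under the linear compositing model, not the procedure of any particular published method, and all names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Color = Sequence[float]  # RGB, each channel assumed to lie in [0, 1]


def estimate_alpha(pixel: Color, fg: Color, bg: Color) -> Optional[float]:
    """Project the pixel onto the line from bg to fg and clamp to [0, 1]."""
    diff = [f - b for f, b in zip(fg, bg)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return None  # identical samples carry no information about alpha
    num = sum((p - b) * d for p, b, d in zip(pixel, bg, diff))
    return min(1.0, max(0.0, num / denom))


def best_sample_pair(
    pixel: Color, fg_samples: List[Color], bg_samples: List[Color]
) -> Optional[Tuple[float, float, Color, Color]]:
    """Return (error, alpha, fg, bg) for the pair that best explains the pixel."""
    best = None
    for fg in fg_samples:
        for bg in bg_samples:
            alpha = estimate_alpha(pixel, fg, bg)
            if alpha is None:
                continue
            # Reconstruct the pixel from the candidate pair and measure the misfit.
            recon = [alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg)]
            err = sum((p - r) ** 2 for p, r in zip(pixel, recon))
            if best is None or err < best[0]:
                best = (err, alpha, fg, bg)
    return best


# Usage: a purple pixel explained by red/green foreground and blue background samples.
choice = best_sample_pair((0.5, 0.0, 0.5),
                          [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
                          [(0.0, 0.0, 1.0)])
print(choice)
```

In this example the red and blue pair reconstructs the purple pixel exactly with an alpha value of 0.5, so it is preferred over the green and blue pair.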
As described, each of these existing methods relies primarily on color to determine alpha values. Because of this reliance on color, such methods fail to accurately determine alpha values when foreground and background colors and/or textures are similar to each other, and they have difficulty with edges, making images with fur or hair hard to segment. Thus, such methods are ineffective at producing accurate mattes for typical everyday scenes with similar foreground and background colors and/or textures.
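One way to see why similar foreground and background colors are a problem for color-based estimation is a small numerical sketch. Assuming a single-channel (grayscale) pixel and the linear compositing model, alpha can be recovered as (observed - background) / (foreground - background), so any error in the observed value is divided by the color difference. The function names and the noise value below are hypothetical and chosen only for illustration.

```python
def alpha_from_gray(observed: float, fg: float, bg: float) -> float:
    """Invert observed = alpha * fg + (1 - alpha) * bg for one channel."""
    return (observed - bg) / (fg - bg)


def alpha_error(true_alpha: float, fg: float, bg: float, noise: float) -> float:
    """Absolute error in recovered alpha when the observation is off by `noise`."""
    observed = true_alpha * fg + (1.0 - true_alpha) * bg + noise
    return abs(alpha_from_gray(observed, fg, bg) - true_alpha)


# The same small observation error, with distinct versus similar colors.
distinct = alpha_error(0.5, fg=1.0, bg=0.0, noise=0.02)
similar = alpha_error(0.5, fg=0.55, bg=0.45, noise=0.02)
print(f"distinct colors: {distinct:.3f}, similar colors: {similar:.3f}")
```

With these assumed values, the same observation error of 0.02 produces an alpha error of about 0.02 when the colors are far apart and about 0.2 when they differ by only 0.1, a tenfold amplification.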