Field of the Invention
The present invention relates to an image processing apparatus that identifies a target area and a background area in an image, and also relates to an image processing method.
Description of the Related Art
Broadly classified, there are two types of techniques for extracting a target area in an image: object segmentation and alpha matting. These techniques are used to, for example, recognize, understand and retrieve an image, generate a synthesized image, and refocus a subject.
Object segmentation is a technique to cut a target area out of a background area and generate a binary segmented image. Typical methods include the chroma key method and the background difference method, which are standard methods used by filmmakers. There is also a method that requires no specific background. In that method, a user designates a range in which an object is included, or paints a part of a background and a part of the object, and a target area is extracted based on color information (see BOYKOV Y. and JOLLY, Interactive graph cuts for optimal boundary and region segmentation of objects in N-D images, Proceedings of IEEE Int. Conf. on Computer Vision, 2001; ROTHER et al., Grabcut—Interactive Foreground Extraction Using Iterated Graph Cuts, ACM Trans. Graph., vol. 23, No. 3, 2004, pp. 309-314).
Object segmentation, which divides an image into two areas, i.e., the target area and the background area, has difficulty extracting a target area precisely from an image that includes an object with a complicated shape, such as hair or fur, or an object with a translucent portion.
FIGS. 1A and 1B illustrate an example of a binary segmented image generated by object segmentation. FIG. 1A is an example of an input image and FIG. 1B is an example of a result of binary segmentation. An extracted target area is illustrated by white color and a background area is illustrated by black color. A boundary between the target area and the background area is illustrated by a solid line and a contour of an actual object is illustrated by a dotted line.
Alpha matting is a technique to estimate transparency (hereafter, referred to also as “alpha”) of a pixel located on a boundary between the target area and the background area, and thereby extract the target area more precisely. Basically, in a matting process, a ternary trimap image is generated (“ternary” corresponds to an area that is obviously the target, an area that is obviously the background, and an area that is not defined to belong to either of the target area and the background area), alpha of each pixel located in the undefined area is estimated, and an alpha image is generated. Hereinafter, an area that is obviously the target is referred to also as a “defined foreground area,” an area that is obviously the background is referred to also as a “defined background area,” and an area that is not defined to belong to either of the target area and the background area is referred to also as an “undefined area.”
Alpha expresses a probability that a pixel belongs to a foreground or a background and may also be considered as a ratio of composition of colors of the foreground and the background. If alpha of a pixel located in the defined foreground area is set to 1 and alpha of a pixel located in the defined background area is set to 0, alpha of a pixel in the undefined area is a value between 0 and 1, which means that the foreground shields a part of the background at that pixel. Several alpha estimation methods have been proposed (see L Grady, T Schiwietz, S Aharon, Random Walks for Interactive Alpha-Matting, Proceedings of VIIP, 2005; Yung-Yu C, et al., A Bayesian Approach to Digital Matting, Proceedings of IEEE Computer Vision and Pattern Recognition, Vol. II, 264-271, 2001).
Typically, an input image is separated into a foreground color and a background color based on the assumption that each pixel of a given input image is a linear mixture of a foreground color and a background color, and alpha in an undefined area is estimated in accordance with the separation result. Every alpha estimation method requires input of a trimap. Further, precision in alpha estimation is greatly influenced by the quality of the trimap.
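As an illustration of the linear mixture model described above, the following minimal Python sketch composes a pixel from a foreground color, a background color, and alpha, and recovers alpha for the simplified case in which both colors are assumed to be known. The function names and the least-squares projection are assumptions made for illustration only; they are not taken from the cited methods, which must also estimate the foreground and background colors.

```python
def composite(alpha, foreground, background):
    """Mix two colors: pixel = alpha * F + (1 - alpha) * B, per channel."""
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))


def estimate_alpha(pixel, foreground, background):
    """Recover alpha when F and B are assumed known (hypothetical helper).

    Projects (pixel - B) onto (F - B) in the least-squares sense and
    clamps the result to the range [0, 1].
    """
    numerator = sum((p - b) * (f - b)
                    for p, f, b in zip(pixel, foreground, background))
    denominator = sum((f - b) ** 2 for f, b in zip(foreground, background))
    if denominator == 0:
        raise ValueError("foreground and background colors are identical")
    return min(1.0, max(0.0, numerator / denominator))


# Example: a pixel that is 25% foreground and 75% background.
fg, bg = (200, 100, 0), (0, 100, 200)
mixed = composite(0.25, fg, bg)
recovered = estimate_alpha(mixed, fg, bg)
```

When the foreground and background colors are close to each other, the denominator becomes small and the estimate becomes unstable, which is one reason a good trimap matters.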
FIGS. 2A to 2C illustrate an example of a trimap image and an example of alpha estimation. FIG. 2A is an example of an input image. In an example of a trimap of FIG. 2B, a defined foreground area is illustrated by white color, a defined background area is illustrated by black color, and an undefined area is illustrated by gray color. As a reference, a contour of an object is illustrated by a dotted line. FIG. 2C is an example of an alpha image generated by alpha estimation. Alpha of the defined foreground area in FIG. 2B is 1 and the area is illustrated by white color. Alpha of the defined background area in FIG. 2B is 0 and the area is illustrated by black color. An alpha estimation result of the undefined area in FIG. 2B is a value between 0 and 1, and the area is illustrated by gray color that is a neutral color of white and black.
The trimap may be generated manually or automatically. In manual generation, a user operates, for example, a paint tool to set an area considered to be a foreground as a defined foreground, set an area considered to be a background as a defined background, and set an area located near a contour of a target area as an undefined area, thereby painting a ternary trimap image. Regarding automatic generation, the following methods have mainly been proposed.
In automatic generation based on a binary segmented image generated by object segmentation, an undefined area of predetermined width is set on a boundary between a target area and a background area extracted by segmentation. Then a trimap image that consists of pixels belonging to the foreground area, pixels belonging to the background area, and pixels belonging to the undefined area is generated.
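The uniform-width approach described above can be sketched as follows. This is a minimal illustration in plain Python, assuming a binary mask given as rows of 0/1 values; the label values, the function name trimap_from_mask, and the square neighborhood are hypothetical choices, not details from the related art.

```python
# Hypothetical label values for the three trimap classes.
BACKGROUND, UNDEFINED, FOREGROUND = 0, 128, 255


def trimap_from_mask(mask, band_radius):
    """Build a trimap from a binary mask (rows of 0/1 values).

    A pixel becomes UNDEFINED when any pixel inside a square window of
    the given radius carries the opposite label, which produces an
    undefined band of uniform width along the segmentation boundary.
    """
    height, width = len(mask), len(mask[0])
    trimap = []
    for y in range(height):
        ys = range(max(0, y - band_radius), min(height, y + band_radius + 1))
        row = []
        for x in range(width):
            xs = range(max(0, x - band_radius), min(width, x + band_radius + 1))
            label = bool(mask[y][x])
            near_boundary = any(bool(mask[ny][nx]) != label
                                for ny in ys for nx in xs)
            if near_boundary:
                row.append(UNDEFINED)
            else:
                row.append(FOREGROUND if label else BACKGROUND)
        trimap.append(row)
    return trimap


# Example: one image row whose boundary lies between columns 2 and 3.
trimap = trimap_from_mask([[0, 0, 0, 1, 1, 1, 1]], band_radius=1)
```

Because band_radius is a single constant, the band has the same width everywhere along the boundary, regardless of how far the segmentation boundary is from the true contour.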
In automatic generation based on both object segmentation and alpha matting, a trimap is first generated by the above automatic generation method and alpha is then estimated. Then the trimap is updated by setting, as an undefined pixel, each pixel whose alpha is between 0 and 1 and which is located near a predetermined elliptical range set by an edge flow vector. It is then determined whether the trimaps before and after the update are the same or substantially the same (i.e., whether the trimap has converged). If the trimap has not converged, the above series of processes, i.e., alpha estimation, update of the trimap, and convergence determination, is repeated (see Japanese Patent Laid-Open No. 2010-66802).
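The repeated cycle of alpha estimation, trimap update, and convergence determination can be outlined as below. This is only a structural sketch: estimate_alpha and update_trimap are hypothetical callables supplied by the caller, the edge flow vector and elliptical range are not modeled, and max_iterations and tolerance are assumed parameters that do not appear in the cited publication.

```python
def refine_trimap(image, trimap, estimate_alpha, update_trimap,
                  max_iterations=10, tolerance=0):
    """Repeat alpha estimation and trimap update until the trimap converges.

    Returns (trimap, alpha, iterations). The trimap is treated as
    converged when at most `tolerance` pixels differ before and after
    an update ("the same or substantially the same").
    """
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    for iteration in range(1, max_iterations + 1):
        alpha = estimate_alpha(image, trimap)      # full matting pass
        updated = update_trimap(trimap, alpha)     # re-mark undefined pixels
        changed = sum(old != new
                      for old_row, new_row in zip(trimap, updated)
                      for old, new in zip(old_row, new_row))
        trimap = updated
        if changed <= tolerance:                   # convergence determination
            break
    return trimap, alpha, iteration
```

Each pass of the loop runs a full alpha estimation, so the total cost grows with the number of iterations needed before the trimap stops changing.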
In manual generation of a trimap image, a user needs to correctly set the thickness of the boundary area along the contour of the target area, which requires considerable skill with tools. Further, the user often needs to repeat trial and error with objects having complicated shapes, which makes the input operation difficult and complicated.
In automatic generation of the trimap image based on binary segmentation by object segmentation, objects with complicated shapes cannot be extracted with high precision. Therefore, an undefined area set with a uniform width based on the segmentation result, and the defined foreground area and the defined background area derived therefrom, may not be correct. If the width of the undefined area is uniformly narrow, an area near the true contour of the object is not necessarily included in the undefined area. If the width of the undefined area is uniformly large, the true contour of the object is included in the undefined area. In that case, however, an error in color estimation of a pixel in the undefined area and an error in alpha estimation spread widely. An outer edge portion of an object (i.e., a target) is not necessarily uniform in feature. If an undefined area is set in an outer edge portion having nonuniform features, correct alpha estimation becomes difficult.
FIGS. 3A to 3C illustrate an example of a trimap image generated based on binary segmentation by object segmentation. FIG. 3A is an example of an input image. In an example of binary segmentation of FIG. 3B, an extracted target area is illustrated by white color, a background area is illustrated by black color, a boundary between the target area and the background area is illustrated by a solid line, and a contour of an actual object is illustrated by a dotted line. It turns out that the boundary between the object and the background area obtained by binary segmentation differs from the contour of the actual object. In an example of a trimap of FIG. 3C, a defined foreground area is illustrated by white color, a defined background area is illustrated by black color, and an undefined area is illustrated by gray color. In an alpha estimation process using this trimap, transparency is estimated only for pixels in the gray area. Therefore, an area surrounded by an ellipse 301 that should have been a part of the object, i.e., an area near the true boundary, is included in the black defined background area and correct alpha estimation therefor is not possible.
In automatic generation of a trimap image based on both object segmentation and alpha matting, after the generation of an initial trimap by object segmentation, update of the trimap by matting and a matting process using the updated trimap are repeated. Such an operation imposes a large processing load, which makes it difficult to perform alpha estimation with high precision.
In recent years, estimation of a distance, i.e., a depth in an image has become possible by, for example, using a distance measurement sensor mounted on a monocular camera, or using a multi-viewpoint image acquired from a plurality of cameras. Especially in stereo photographing, for example, distance estimation may be performed using images in which the same object is included, distance estimation values may be compared while changing reference images, and distance reliability may be calculated using the distance estimation values (i.e., if the estimated values are close to each other, distance reliability is high and, if the estimated values are not close to each other, distance reliability is low). Such distance information, i.e., the distance value or distance reliability, is used for image processing as useful information.
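The notion of distance reliability described above can be illustrated with a minimal sketch. The source states only that reliability is high when the estimated values are close to each other and low otherwise; the particular score used here (one minus the spread of the estimates divided by a tolerance, clamped at zero) and the names distance_reliability and tolerance are assumptions made for illustration.

```python
def distance_reliability(estimates, tolerance=1.0):
    """Score how well distance estimates for the same pixel agree.

    `estimates` holds distance values for one pixel, each obtained with
    a different reference image. The score is 1.0 when all estimates
    coincide and falls to 0.0 once their spread reaches `tolerance`
    (a hypothetical parameter, in the same unit as the distances).
    """
    if not estimates:
        raise ValueError("at least one distance estimate is required")
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    spread = max(estimates) - min(estimates)
    return max(0.0, 1.0 - spread / tolerance)


# Example: close estimates give a high score, scattered ones a low score.
high = distance_reliability([2.0, 2.0, 2.0])
low = distance_reliability([1.0, 3.0])
```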
FIGS. 4A and 4B illustrate examples of an input image and a range image. FIG. 4A is an example of an input image, in which a periphery of the subject is unclear and has partially low brightness. FIG. 4B is an example of a result of distance estimation of the example of the input image, in which a distance is illustrated by a degree of brightness. It is considered that the distance to an imaging apparatus is shorter as the degree of brightness is higher. A portion in which distance estimation is not performed or in which the distance is infinity is illustrated by black color. It is understood from this example that correct distance estimation is difficult in an area near a boundary between a subject and a background area. From this tendency, area estimation near the true boundary becomes possible. In this example, in order to make the feature of the distance information understandable, a subject portion, a background portion, a boundary portion between the subject and the background, and a portion of the subject with low brightness are illustrated each having a uniform distance value. However, since an actual distance estimation image includes a great deal of noise having a distance value different in every pixel, area estimation in an area near the boundary is insufficient with only the distance information.
The present invention generates a trimap with a nonuniform width depending on features of an image, based on boundary information between a target area and the remaining areas and on distance information of those areas, and enables extraction of a target area in a simple and highly precise manner.