1. Field of the Invention
The present invention relates to an image processing technique and, more particularly, to a technique of more suitably extracting a foreground region from an image including the foreground region and a background region.
2. Description of the Related Art
Conventionally, techniques of extracting a predetermined region (target region) from an image (also referred to as segmentation techniques) have been studied and applied for purposes such as image synthesis or refocusing on a specific region in video editing. The target region is the object to be extracted and will also be referred to as the foreground hereinafter. The target region extraction processing will also be referred to as foreground/background separation processing hereinafter.
For the target region extraction, the background subtraction method and the chroma-key method are well known as methods based on the color information of an image. In the background subtraction method, an image including only the background, without the target region, is captured in advance. An image including the target region is then compared with the background-only image, and the difference between them is calculated, thereby extracting the target region. The chroma-key method is a standard method in the movie industry, in which the background region is set to a predetermined color, and the target region is extracted on the assumption that the colors of the target region do not include the background color. However, both the background subtraction method and the chroma-key method are usable only in an environment where the background is easy to control.
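As an illustrative sketch only (the function name and threshold value below are hypothetical, not taken from any cited method), the background subtraction method amounts to a per-pixel comparison against the pre-captured background:

```python
def extract_foreground(image, background, threshold=30):
    """Background subtraction: mark a pixel as foreground (True) when
    its absolute difference from the pre-captured background image
    exceeds a threshold."""
    mask = []
    for row, bg_row in zip(image, background):
        mask.append([abs(p - b) > threshold for p, b in zip(row, bg_row)])
    return mask

# Grayscale background captured in advance, and an image in which an
# object (value 200) has entered the scene.
background = [[50, 52], [51, 49]]
image      = [[50, 200], [51, 49]]

print(extract_foreground(image, background))
# only the pixel occupied by the object is marked as foreground
```

The chroma-key method is analogous, except that the comparison is made against a single predetermined background color rather than a pre-captured background image.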
On the other hand, as a method that needs no specific background, a method called GRABCUT has been proposed, in which coarse information including the target region in an image is manually given in advance, and the target region is separated from an image having an arbitrary background (Rother et al., Grabcut: Interactive Foreground Extraction Using Iterated Graph Cuts, ACM Trans. Graph., vol. 23, no. 3, 2004, pp. 309-314 (non-patent literature 1)). In this technique, color clustering is performed inside and outside the target region, and graph parameters are calculated based on the color information of the individual pixels and clusters. The graph parameters include the similarities between neighboring pixels and the foreground likelihood and background likelihood of each pixel. Minimization of the energy function of the graph is then solved globally to extract the target region. In recent years, estimation of distance information (depth information) for each pixel of an image has been enabled by equipping a monocular camera with a range image sensor. The distance information is used as useful information for target region extraction in addition to color information.
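The graph parameters described above can be illustrated with a simplified sketch. In the actual GRABCUT formulation the likelihoods are derived from Gaussian mixture models fitted to the color clusters; the fixed `beta`, `gamma`, and single cluster mean below are simplifying assumptions for illustration only:

```python
import math

def pairwise_weight(color_i, color_j, beta=0.01, gamma=50.0):
    """Similarity between neighboring pixels: large when the two
    colors are close, so the graph cut avoids separating them."""
    dist2 = sum((a - b) ** 2 for a, b in zip(color_i, color_j))
    return gamma * math.exp(-beta * dist2)

def unary_cost(color, cluster_mean, beta=0.01):
    """Foreground (or background) likelihood of a pixel expressed as a
    cost: small when the pixel color is close to the cluster mean."""
    dist2 = sum((a - b) ** 2 for a, b in zip(color, cluster_mean))
    return beta * dist2

# Two nearly identical neighbors keep a strong link ...
print(pairwise_weight((200, 30, 30), (198, 32, 31)))
# ... and a pixel close to the foreground cluster mean receives a low
# foreground cost.
print(unary_cost((200, 30, 30), (205, 28, 33)))
```

Minimizing the sum of such unary costs and pairwise weights over the whole pixel graph is what is meant by globally solving the energy function.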
Current target region extraction methods using color information and distance information can roughly be classified into three types. The first method extends the feature dimensions from color information to distance information and applies color processing (C. Dal Mutto et al., Scene Segmentation by Color and Depth Information and its Applications, STDay 2010, 2010 (non-patent literature 2)). In this method, a feature amount is first extracted from the color information, and another feature amount is extracted from the distance information. These feature amounts are normalized, and conventional color processing is then applied.
The second method performs processing using color information and processing using distance information, and weights the processing results (H. He et al., Graphcut-based Interactive Segmentation Using Colour and Depth Cues, Australasian Conference on Robotics and Automation (ACRA 2010), 2010 (non-patent literature 3) and Japanese Patent Laid-Open No. 2008-547097 (patent literature 1)). In this method, first, a rectangular region including the target region is designated. Next, clustering processing by color information is performed inside and outside the designated region to set parameters of an energy function, and clustering processing by distance information is likewise performed to set parameters of an energy function. The parameters obtained from the color information and those obtained from the distance information are then combined by weighting. Finally, the energy function minimization problem is solved, thereby separating the foreground and the background.
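The weighting step of the second method can be sketched as follows; the function name and the weight values are hypothetical, not taken from the cited literature:

```python
def combine_unary(color_cost, depth_cost, w_color=0.6, w_depth=0.4):
    """Weighted combination of the energy-function parameter obtained
    from color clustering and the one obtained from distance
    clustering, for a single pixel."""
    return w_color * color_cost + w_depth * depth_cost

# A pixel whose color resembles the background (high color cost) but
# whose distance clearly matches the foreground (low depth cost)
# receives an intermediate combined cost.
print(combine_unary(color_cost=1.0, depth_cost=0.1))
```

The combined parameters then enter the energy function minimization in place of the color-only parameters.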
The third method performs processing using color information and processing using distance information, and uses one processing result to control the other (F. Calderero and F. Marques, Hierarchical Fusion of Color and Depth Information at Partition Level by Cooperative Region Merging, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2009), 2009 (non-patent literature 4) and Japanese Patent Laid-Open No. 2009-282979 (patent literature 2)). In this method, clustering processing by color information is performed first, and clustering processing by distance information is then performed. Finally, the processing result of the distance information is used to control the number of color-information clusters or to rearrange the pixels within those clusters.
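One possible reading of the third method can be sketched as follows; the splitting rule and the `gap` threshold below are illustrative assumptions, not the procedure of the cited literature. Here the distance values control whether one color cluster should be split:

```python
def split_cluster_by_depth(pixels, depths, gap=0.5):
    """Split one color cluster when its pixels fall into clearly
    separated depth groups, so that the distance information controls
    the number of clusters and the assignment of their pixels."""
    order = sorted(range(len(pixels)), key=lambda i: depths[i])
    groups, current = [], [order[0]]
    for prev, cur in zip(order, order[1:]):
        if depths[cur] - depths[prev] > gap:
            groups.append(current)
            current = []
        current.append(cur)
    groups.append(current)
    return [[pixels[i] for i in g] for g in groups]

# Four pixels of similar color but two distinct depths (in meters)
# are split into two clusters.
print(split_cluster_by_depth(['p0', 'p1', 'p2', 'p3'], [1.0, 1.1, 3.0, 3.2]))
```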
In the target region extraction technique using only color information, however, a target region extraction error occurs when the target region includes a texture, or the background includes a region of a color close to the foreground color. This problem also arises when an image is globally analyzed, as in the above-described GRABCUT technique.
FIG. 1 is a view illustrating an example of region extraction. An image 100a is an example of an input image that is a processing target. The foreground of the image 100a, which is the extraction target, includes a texture pattern in a partial region, and that partial region has a color close to the color of the background. This is a very difficult situation for target region extraction. It is very hard to correctly set the parameters of the energy function (for example, the similarity between pixels and the foreground likelihood and background likelihood of each pixel) for a portion of the target region where a texture structure exists or the color is not uniform. As a result, it is difficult to appropriately extract the target region (foreground region) by GRABCUT processing using only color information.
In the image 100a, the dotted line indicates a manually designated region in GRABCUT processing. That is, the region outside the rectangle indicated by the dotted line is the background region, and the region inside it is the undetermined region. Each pixel in the undetermined region is judged as either a foreground pixel or a background pixel.
An image 100c is an example of the extraction result obtained using the above-described GRABCUT technique. A portion judged as the target region is shown in the colors of the input image, and a portion judged as the background is shown in white. As shown in FIG. 1, the target region is partially cut away. A likely reason for this is that the parameters of the energy function are set by processing based solely on color information. A pixel in a region that should properly be part of the target region but has a color close to the background color tends to be more similar to the neighboring background pixels than to the foreground pixels. Hence, such a region is readily misjudged as the background when the energy function minimization problem is solved.
On the other hand, the conventional target region extraction techniques that use distance information in addition to an input image treat the given distance information as correct, giving it perfect credence. Since even inaccurate or erroneous distance information is used with complete reliability, the foreground separation accuracy is not necessarily improved. In addition, since both processing by color information and processing by distance information are executed, the processing time increases.
An image 100b is an example of the result obtained by performing distance estimation for the image 100a. In the image 100b, the distance is expressed by brightness: the higher the brightness, the shorter (closer) the distance to the imaging device. A portion where the distance cannot be estimated is expressed by the lowest brightness (black). In this example, to simplify the explanation of the characteristics of distance information, the target region portion, the boundary portion between the target region and the background, the background portion, and the low-luminance portion of the target region are each represented by a uniform distance value. An actual distance-estimated image, however, contains many noise components with distance values that vary from pixel to pixel. As can be seen from this example, the target region as a whole has close distance values independently of its structure, whereas it is difficult to estimate the correct distance near the boundary between the target region and the background region or in low-luminance portions.
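The per-pixel noise and estimation failures described above suggest simple pre-processing before the distance information is used; the 3-tap median filter and validity mask below are generic illustrations, not part of any cited method:

```python
def median3(row):
    """3-tap median filter along a row: suppresses isolated per-pixel
    noise in an estimated distance map while keeping edges."""
    out = list(row)
    for i in range(1, len(row) - 1):
        out[i] = sorted(row[i - 1:i + 2])[1]
    return out

def valid_mask(row, invalid=0):
    """Mark pixels where distance estimation succeeded; failed pixels
    are encoded here as the lowest brightness (0)."""
    return [d != invalid for d in row]

depth_row = [10, 90, 11, 12, 0]   # 90 is a noise spike, 0 means estimation failed
print(median3(depth_row))
print(valid_mask(depth_row))
```

A mask of this kind allows the distance term to be ignored or down-weighted where the estimate is unreliable, instead of giving it perfect credence.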