1. Field of the Invention
The present invention relates to an image extraction apparatus and method for extracting a target subject from a background image and a subject image. More particularly, the present invention is directed to a method and apparatus for appropriately generating a mask used for extracting a target subject.
2. Related Art
Conventionally, as general techniques for realizing image extraction, a chromakey method using a specific color background, a videomatte method for generating a key signal by performing a histogram process, difference (or differential) process, contour enhancement or contour tracking process of an image signal (The Television Society Technical Report, Vol. 12, pp. 29-34, 1988), and the like are known.
A technique for performing image extraction based on the difference from the background image is a state-of-the-art one, and for example, Japanese Patent Laid-Open No. 4-216181 discloses a technique for detecting or extracting a target object in a plurality of specific regions in an image by setting a mask image (i.e., a specific processing region) in difference data between the background image and the image to be processed.
Furthermore, Japanese Patent Publication No. 7-16250 discloses a technique for obtaining color-converted data of an original image including a background using a color model of the object to be extracted, and the existence probability distribution of the object to be extracted from brightness difference data between the background image and the original image.
In the difference method from the background image, the luminance level or color component difference between the pixels of the background image and the subject image is normally expressed by a predetermined evaluation function, and the evaluation function is subjected to a thresholding process to extract a region having a difference level equal to or higher than an initial value. As the evaluation function, the correlation between blocks having individual points as centers and a predetermined size (Rosenfeld, A. and Kak, A. C., Digital Picture Processing (2nd ed.), Academic Press, 1982), normalized principal component features (Journal of the Institute of Electronics, Information and Communication Engineers, Vol. J74-D-II, pp. 1731-1740), a weighted sum value of a standard deviation and a difference value (Journal of the Television Society, Vol. 45, pp. 1270-1276, 1991), a local histogram distance associated with hue and luminance level (Journal of the Television Society, Vol. 49, pp. 673-680, 1995), and the like are used.
Japanese Patent Laid-Open No. 4-328689 and Japanese Patent Publication No. 7-31248 disclose a method of extracting a moving object alone by extracting motion vectors or inter-frame difference data from moving images. Japanese Patent Publication Nos. 7-66446, 6-14358, and 4-48030 disclose a method of extracting a moving object based on the difference from the background image. Furthermore, a method of extracting the binocular disparity distribution (i.e., the distance distribution from image sensing means) from images obtained at different right and left view point positions using a binocular image sensing system, and segmenting an object from the background on the basis of the disparity distribution (1995 Information System Society Meeting of the Society of Electronics, Information and Communication Engineers, p. 138), or the like is known.
However, of the above-mentioned prior arts, the chromakey method suffers from the following problems:
i: this method cannot be used outdoors due to serious background limitations, and
ii: color omission occurs.
Also, the videomatte method suffers from the following problems:
i: the contour designation must be manually and accurately performed in units of pixels, and
ii: such operation requires much labor and skill.
Furthermore, the difference method from the background image is normally difficult to realize due to the following problems:
i: the background is hard to distinguish from the subject in a partial region of the subject including a portion similar to the background,
ii: the difference method is readily influenced by variations in image sensing condition between the background image and subject image,
iii: a shadow portion formed by the subject is hard to remove, and
iv: in order to faithfully extract the boundary line between the background and subject, the background image and subject image must have considerably different image characteristics (pixel values and the like) in the vicinity of the boundary therebetween.
The technique disclosed in Japanese Patent Publication No. 7-16250 is not suitable for image extraction of an arbitrary unknown object since it requires a color model for the object to be extracted.
In either the method of extracting a moving object from moving images or the method of extracting a subject from the disparity distribution, it is generally hard to extract a subject with high precision independently of the contrast in the boundary portion between the subject and background.
It is an object of the present invention to provide an image extraction apparatus and method, which can stably extract a subject image in which the background and subject have no distinct difference between their image characteristics.
It is another object of the present invention to provide an image extraction apparatus and method which can obtain a large area of a subject region before region growing by a small number of processing steps, and can extract details of a contour shape.
It is still another object of the present invention to provide an image extraction apparatus and method which can execute a process for equalizing the contour line of a mask after region growing with that of an actual subject without being influenced by the background pattern near the contour line of the subject.
It is still another object of the present invention to provide an image extraction apparatus and method which can stably grow an initial mask only in a subject region independently of variations in the region growing condition, i.e., the tolerance value of a feature difference from a neighboring region.
It is still another object of the present invention to provide an image extraction apparatus and method which can suppress variations in edge intensity distribution caused by a difference in image sensing conditions between the background image and the subject image, noise, or the like, and can accurately extract the contour shape of the subject and the edge of a background portion present in the subject region.
It is still another object of the present invention to provide an image extraction apparatus and method which can stably extract a subject image even when the edge intensity serving as a boundary between the subject and background is small and the subject includes a relatively thin shape.
It is still another object of the present invention to provide an image extraction apparatus and method which can stably extract the contour shape of a subject without being influenced by the edge distribution of a background portion present in the vicinity of the subject.
It is still another object of the present invention to provide an image extraction apparatus and method which can automatically retrieve an incomplete partial shape after region growing on the basis of the condition of shape continuity, and can smooth shape data.
It is still another object of the present invention to provide an image extraction apparatus and method which can stably extract a subject image independently of any specific difference between the image characteristics of the background and subject without being influenced by the background pattern.
It is still another object of the present invention to provide an image extraction apparatus and method which can stably and accurately extract a subject image upon executing extraction based on region growing.
It is still another object of the present invention to provide an image extraction apparatus and method which can obtain an extracted image with stably high precision independently of any specific difference between the image characteristics of the background and subject upon executing extraction based on the difference from the background image.
It is still another object of the present invention to provide an image extraction apparatus and method which can extract a subject on the basis of region growing that can faithfully reconstruct the contour shape of the object to be extracted.
It is still another object of the present invention to provide an image extraction apparatus and method which can extract a region closest to a subject while suppressing unlimited region growing.
It is still another object of the present invention to provide an image extraction apparatus and method which can obtain stably high extraction precision even for a subject having a complicated contour shape by suppressing region growing across an edge and region growing from an edge.
It is still another object of the present invention to provide an image extraction apparatus and method which can obtain stably high extraction performance even in the presence of noise such as a shadow present outside a subject (in the background) or an unclear portion of the contour of the subject.
It is still another object of the present invention to provide an image extraction apparatus and method which can realize region growing that can satisfactorily approximate the outer shape of the extracted subject to a correct subject shape even when the shape of a partial region extracted in advance does not match the contour shape of the subject.
It is still another object of the present invention to provide an image extraction apparatus and method which can realize automatic extraction of a specific subject from moving images with high precision.
It is still another object of the present invention to provide an image extraction apparatus and method which can realize automatic extraction of a specific subject with high precision using a plurality of images obtained from different view points.
In order to achieve the above objects, according to the present invention, there is provided an image extraction method for extracting, from a first image that records both a background and an object to be extracted, image data of the object using a mask, comprising:
the first step of generating an initial mask for extracting an image of the object on the basis of difference data between the first image and a second image that records the background alone;
the second step of growing a region of the generated initial mask on the basis of a similarity between features of a first region of the first image corresponding to the initial mask, and a second region in the vicinity of the first region; and
the third step of extracting the image data of the object from the first image on the basis of the grown mask region.
According to the image extraction method, subject extraction that eliminates the influence of noise and variations in image sensing condition, and that automatically removes any light shadow portion, can be realized. Also, a subject region that includes a region having image characteristics similar to those of the background image can be extracted.
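The three steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the claimed implementation: images are single-channel lists of rows, the similarity feature is the plain pixel value, growing is 4-connected, and the names `initial_mask`, `grow_mask`, `extract`, `threshold`, and `tolerance` are hypothetical.

```python
from collections import deque


def initial_mask(subject, background, threshold):
    """First step: binarize the absolute difference between the two images."""
    return [[abs(s - b) >= threshold for s, b in zip(srow, brow)]
            for srow, brow in zip(subject, background)]


def grow_mask(image, mask, tolerance):
    """Second step: add 4-connected neighbors whose value is close to a mask pixel."""
    h, w = len(image), len(image[0])
    grown = [row[:] for row in mask]
    queue = deque((y, x) for y in range(h) for x in range(w) if mask[y][x])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not grown[ny][nx]:
                # Assumed similarity measure: absolute pixel value difference.
                if abs(image[ny][nx] - image[y][x]) <= tolerance:
                    grown[ny][nx] = True
                    queue.append((ny, nx))
    return grown


def extract(image, mask, fill=0):
    """Third step: keep pixels under the grown mask, fill the rest."""
    return [[v if m else fill for v, m in zip(irow, mrow)]
            for irow, mrow in zip(image, mask)]


background = [[10, 10, 10, 50, 10]]
subject = [[10, 80, 70, 60, 10]]   # the pixel valued 60 resembles the background (50)
seed = initial_mask(subject, background, threshold=30)
grown = grow_mask(subject, seed, tolerance=15)
print(extract(subject, grown))     # prints [[0, 80, 70, 60, 0]]
```

In this toy input, the pixel valued 60 is missed by the difference-based initial mask because it resembles the background at the same position, and is recovered by region growing because it resembles its neighbor inside the mask.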
In order to achieve the above objects, according to the present invention, there is provided an image extraction method comprising:
the partial region extraction step of extracting a partial region as a portion of a subject to be extracted from an input image;
the region growing step of growing the extracted partial region, using the extracted partial region as a seed, by thresholding a similarity to a neighboring region, the threshold value being set on the basis of a feature distribution at individual points of the input image; and
the extraction step of extracting an image of the subject on the basis of the region after region-growing.
According to the image extraction method, a subject image can be extracted with stably high precision independently of variations in parameters used in similarity evaluation, a shadow in the background, and complexity of the image pattern of the subject upon executing extraction based on region growing.
In order to achieve the above objects, according to the present invention, there is provided an image extraction apparatus for extracting, from a first image including both a background and an object to be extracted, image data of the object using a mask, comprising:
temporary storage means for receiving and temporarily storing the first image and a second image that records the background;
initial mask generating means for generating an initial mask of an extraction region on the basis of difference data between the stored first and second images;
region growing means for growing a region of the initial mask on the basis of a feature similarity to a neighboring region; and
first image extraction means for extracting the image data of the object from the first image on the basis of the grown mask region.
According to the image extraction apparatus, upon extraction of an initial mask, the influence of noise and variations in image sensing condition can be eliminated, and any light shadow portion can be automatically removed. Also, a subject region can be stably and automatically extracted independently of the presence/absence of a region similar to a background image in the subject.
In order to achieve the above objects, according to the present invention, there is provided an image extraction apparatus comprising:
partial region extraction means for extracting a partial region as a portion of a subject to be extracted from an input image;
region growing means for growing the extracted partial region, using the extracted partial region as a seed, by thresholding a similarity to a neighboring region, the threshold value being set on the basis of a feature distribution at individual points of the input image; and
extraction means for extracting an image of the subject on the basis of the region after region growing.
According to the image extraction apparatus, a subject image can be extracted with stably high precision independently of variations in parameters used in similarity evaluation, a shadow in the background, and complexity of the image pattern of the subject upon executing extraction based on region growing.
According to a preferred aspect of the present invention, the first step includes the step of using as the initial mask a binary image region obtained by a binarization process of difference data representing a difference between image data of the first and second images using a predetermined threshold value. The details of the subject shape can be extracted in a process before region growing while eliminating the influence of noise and the like.
According to a preferred aspect of the present invention, the difference data represents a brightness difference between the first and second images.
According to a preferred aspect of the present invention, the difference data represents a color difference between the first and second images.
According to a preferred aspect of the present invention, the first step comprises:
the step of obtaining a first binary image region by a binarization process of data representing a brightness difference between the first and second images using a predetermined threshold value;
the step of obtaining a second binary image region by a binarization process of data representing a color difference between the first and second images using a predetermined threshold value; and
the step of generating the initial mask by combining the first and second binary image regions.
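As an illustration of the steps above, the following sketch binarizes brightness difference data and color difference data separately and then combines the two binary regions. The text does not state how the regions are combined; a logical OR is assumed here, and the function names and threshold values are hypothetical.

```python
def binarize(diff, threshold):
    """Mark pixels whose difference value reaches the threshold."""
    return [[d >= threshold for d in row] for row in diff]


def combine_masks(mask_a, mask_b):
    """Assumed combination rule: a pixel belongs to the initial mask
    if either binary region contains it (logical OR)."""
    return [[a or b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


brightness_diff = [[5, 40, 3]]
color_diff = [[2, 1, 25]]
mask = combine_masks(binarize(brightness_diff, 30), binarize(color_diff, 20))
print(mask)  # prints [[False, True, True]]
```

With an OR combination, a pixel that differs from the background in color but not in brightness (the last pixel above) is still captured in the initial mask.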
According to a preferred aspect of the present invention, the second step includes the step of judging, based on brightness and hue similarities between the first and second regions, whether a pixel in the second region is to be incorporated in the first region, and growing the mask region upon incorporating the pixel.
According to a preferred aspect of the present invention, the second step comprises:
the step of respectively extracting first and second edge intensity images from the first and second images;
the step of calculating an edge density on the basis of data representing a difference between the first and second edge intensity images; and
the step of suppressing growing of the mask when the calculated edge density is not more than a predetermined threshold value in a growing direction. Even when the region growing condition is relaxed or roughly set, region growing outside the subject can be suppressed, and high-precision subject extraction can be realized. Also, even when the initial mask region includes a region other than the subject (e.g., a shadow portion), growing from such region can be suppressed.
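The edge density test above might be sketched as follows. The text does not define how the edge density is computed; this sketch assumes it is the fraction of pixels in a small square window of the edge difference image whose value reaches an edge threshold. All names, the window radius, and the threshold values are hypothetical.

```python
def edge_difference(edge_subject, edge_background):
    """Absolute difference between the two edge intensity images."""
    return [[abs(s - b) for s, b in zip(srow, brow)]
            for srow, brow in zip(edge_subject, edge_background)]


def edge_density(edge_diff, y, x, radius, edge_threshold):
    """Assumed definition: fraction of window pixels that count as edges."""
    h, w = len(edge_diff), len(edge_diff[0])
    count = total = 0
    for yy in range(max(0, y - radius), min(h, y + radius + 1)):
        for xx in range(max(0, x - radius), min(w, x + radius + 1)):
            total += 1
            if edge_diff[yy][xx] >= edge_threshold:
                count += 1
    return count / total


def may_grow(edge_diff, y, x, radius=1, edge_threshold=0.5, density_threshold=0.2):
    """Growing toward (y, x) is suppressed when the density is not more
    than the density threshold."""
    return edge_density(edge_diff, y, x, radius, edge_threshold) > density_threshold


edge_diff = [[0.0, 0.0, 0.9, 0.8],
             [0.0, 0.0, 0.7, 0.9],
             [0.0, 0.0, 0.8, 0.9]]
print(may_grow(edge_diff, 1, 0))  # prints False (no edges nearby)
print(may_grow(edge_diff, 1, 2))  # prints True (dense edges nearby)
```

Because background edges largely cancel in the edge difference image, a low density in the growing direction suggests that the direction leads out of the subject, which is why growth is suppressed there.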
According to a preferred aspect of the present invention, the first step comprises:
the step of normalizing the difference data representing the difference between the first and second images, and generating the initial mask on the basis of normalized brightness difference data. In object extraction, the influence of slight variations in image sensing condition (white balance characteristics, illumination characteristics, exposure condition, and the like) between the first and second images can be suppressed.
According to a preferred aspect of the present invention, the first step comprises:
the step of extracting first and second edge intensity images representing edge intensities of the first and second images, respectively; and
the step of normalizing both the first and second edge intensity images using a predetermined normalization coefficient when the first edge intensity image is an image having a small number of edges, the normalization coefficient being a maximum intensity value of the first edge intensity image. For this reason, even when the first and second images suffer slight variations in image sensing condition (white balance characteristics, illumination characteristics, exposure condition, and the like), edge intensity variations can be prevented from being amplified. In this manner, the probability of background edge data being left in a region outside a subject in edge difference data can be made very low.
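A minimal sketch of this normalization, assuming edge intensity images stored as lists of rows, is given below. The function name is hypothetical, and the zero-peak guard is an added assumption for images without any edges.

```python
def normalize_by_global_max(edge_subject, edge_background):
    """Divide both edge intensity images by the maximum intensity of the
    first (subject) edge intensity image."""
    peak = max(max(row) for row in edge_subject)
    if peak == 0:
        # Assumed guard: nothing to normalize when the subject image has no edges.
        return [row[:] for row in edge_subject], [row[:] for row in edge_background]
    norm_subject = [[v / peak for v in row] for row in edge_subject]
    norm_background = [[v / peak for v in row] for row in edge_background]
    return norm_subject, norm_background


subject_edges = [[0, 4], [8, 2]]
background_edges = [[0, 2], [4, 4]]
ns, nb = normalize_by_global_max(subject_edges, background_edges)
print(ns)  # prints [[0.0, 0.5], [1.0, 0.25]]
print(nb)  # prints [[0.0, 0.25], [0.5, 0.5]]
```

Using one common coefficient for both images keeps their relative edge intensities unchanged, so the subsequent edge difference is not distorted by the normalization itself.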
According to a preferred aspect of the present invention, the first step comprises:
the step of extracting first and second edge intensity images representing edge intensities of the first and second images, respectively; and
the step of normalizing both the first and second edge intensity images using a maximum edge intensity value within a predetermined size region having a predetermined point of the first edge intensity image as a center when the first edge intensity image is an image having many edges. Accordingly, when the subject has a fine partial shape, the contour shape of details can be stably extracted even when the edge intensity is low, and noise amplification in a low-contrast partial region in the vicinity of the subject can be suppressed upon normalization.
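The local normalization can be sketched in the same way. The sketch below assumes a square window of hypothetical half-width `radius` centered on each point, and leaves a point at zero when its window contains no edges; neither detail is specified in the text.

```python
def normalize_by_local_max(edge_subject, edge_background, radius):
    """Divide both images at each point by the maximum of the first
    (subject) edge intensity image within a window centered on that point."""
    h, w = len(edge_subject), len(edge_subject[0])
    out_subject = [[0.0] * w for _ in range(h)]
    out_background = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            local_peak = max(
                edge_subject[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1)))
            if local_peak > 0:  # assumed guard for edge-free windows
                out_subject[y][x] = edge_subject[y][x] / local_peak
                out_background[y][x] = edge_background[y][x] / local_peak
    return out_subject, out_background


subject_edges = [[8, 0, 0, 0, 2]]      # a strong edge and a weak, distant edge
background_edges = [[4, 0, 0, 0, 1]]
ns, nb = normalize_by_local_max(subject_edges, background_edges, radius=1)
print(ns)  # prints [[1.0, 0.0, 0.0, 0.0, 1.0]]
print(nb)  # prints [[0.5, 0.0, 0.0, 0.0, 0.5]]
```

In this example the weak edge of intensity 2 is scaled to 1.0 because it is the strongest edge in its own neighborhood, which illustrates how low-intensity details of the contour can survive a later thresholding step.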
According to a preferred aspect of the present invention, the second step includes the step of comparing differences between brightness and hue values of the first and second regions with predetermined threshold values, and determining that the second region is similar to the first region when the differences are smaller than the predetermined threshold values. Accordingly, when the contour shape is incomplete (e.g., it includes discontinuous uneven portions different from the actual shape) as a result of region growing, correction of such shape can be performed while automatically considering the image feature's continuity and shape continuity in the subject.
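The similarity determination can be illustrated with the sketch below. It assumes each region is summarized by a (brightness, hue) pair with hue in degrees, and it treats hue as a circular quantity so that 350 and 10 degrees are 20 degrees apart; the circular treatment, the names, and the threshold values are assumptions rather than details given in the text.

```python
def hue_difference(h1, h2):
    """Smallest angular distance between two hues, in degrees (assumed circular)."""
    d = abs(h1 - h2) % 360.0
    return min(d, 360.0 - d)


def is_similar(region_a, region_b, brightness_tol, hue_tol):
    """Regions are similar when both differences are smaller than their thresholds."""
    (b1, h1), (b2, h2) = region_a, region_b
    return abs(b1 - b2) < brightness_tol and hue_difference(h1, h2) < hue_tol


print(is_similar((100, 350), (105, 10), brightness_tol=10, hue_tol=30))  # prints True
print(is_similar((100, 350), (130, 10), brightness_tol=10, hue_tol=30))  # prints False
```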
According to a preferred aspect of the present invention, the second step further comprises the fourth step of shaping a contour line of the grown mask, and the fourth step comprises:
the step of detecting the contour line of the grown mask;
the step of generating an edge intensity image representing a difference between the first and second images;
the step of setting a region having a predetermined width in a direction perpendicular to an extending direction of the contour line in the edge intensity image;
the step of selecting a plurality of pixels of the edge intensity images in the region of the predetermined width as contour point candidates; and
the step of selecting one contour point on the basis of continuity between a pixel on the contour line and the plurality of contour point candidates, thereby shaping the contour line of the mask. Accordingly, when the contour shape is incomplete (e.g., it includes discontinuous uneven portions different from the actual shape) as a result of region growing, correction of such shape can be performed while automatically considering the image feature continuity and shape continuity in the subject.
According to a preferred aspect of the present invention, the continuity is determined by inspecting pixel value continuity.
According to a preferred aspect of the present invention, the continuity is determined by inspecting shape continuity.
According to a preferred aspect of the present invention, the continuity is determined by inspecting continuity with a pixel present inside the contour line.
According to a preferred aspect of the present invention, the continuity is determined by weighting and evaluating pixel value continuity and shape continuity.
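One way to picture the weighted evaluation is the sketch below, which selects one contour point from several candidates. The cost function is an assumption: pixel value continuity is taken as the absolute value difference from the preceding contour pixel, shape continuity as the Euclidean distance from it, and the two are summed with hypothetical weights `w_value` and `w_shape`.

```python
import math


def select_contour_point(prev_point, prev_value, candidates, w_value=1.0, w_shape=1.0):
    """Return the candidate position with the lowest weighted discontinuity.

    candidates is a list of ((y, x), pixel_value) tuples taken from the
    band of predetermined width around the current contour line.
    """
    def cost(candidate):
        (y, x), value = candidate
        value_term = abs(value - prev_value)                            # pixel value continuity
        shape_term = math.hypot(y - prev_point[0], x - prev_point[1])   # shape continuity
        return w_value * value_term + w_shape * shape_term

    return min(candidates, key=cost)[0]


candidates = [((5, 6), 160), ((6, 6), 102), ((9, 9), 100)]
print(select_contour_point((5, 5), 100, candidates))               # prints (6, 6)
print(select_contour_point((5, 5), 100, candidates, w_shape=0.0))  # prints (9, 9)
```

Setting `w_shape` to zero reduces the rule to pixel value continuity alone, which in this example selects a distant point; the weighting is what keeps the shaped contour both photometrically and geometrically continuous.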
According to a preferred aspect of the present invention, the fourth step further includes the step of smoothing the shaped contour line.
According to a preferred aspect of the present invention, the fourth step comprises:
the active contour shaping step of recursively executing a process for deforming or moving a contour shape of the mask to minimize a predetermined evaluation function on the basis of the initial mask or a contour of the grown mask, and image data of the first image. Accordingly, the shape of a non-grown region that remains as a result of region growing can be corrected and retrieved.
According to a preferred aspect of the present invention, the active contour shaping step comprises:
generating a contour line by performing an active contour shaping process on the data of the initial mask, and performing an active contour shaping process of the image data of the first image on the basis of the generated contour line. Hence, the contour shape of the subject can be normally extracted without being influenced by the background pattern.
According to a preferred aspect of the present invention, the partial region extraction step includes the step of extracting the partial region on the basis of a difference between a background image excluding the subject, and a subject image including the subject. Consequently, the extracted image can be obtained with stably high precision independently of any specific difference between the image characteristics of the background and subject in subject extraction based on the difference from the background image and region growing.
According to a preferred aspect of the present invention, the feature distribution is an edge distribution of the subject. As a result, the contour shape of a subject can be faithfully reconstructed by suppressing unlimited growing in the vicinity of an edge upon executing region growing.
According to a preferred aspect of the present invention, the feature distribution is a distribution within a maximum growing range set based on the partial region. Accordingly, region growing that can eliminate the influence of noise, shadows, and illumination conditions, and can roughly obtain the subject shape can be realized inside a partial region and a region in the vicinity of the partial region.
According to a preferred aspect of the present invention, the threshold value is set to assume a value that suppresses growing of the region at an edge position as compared to a non-edge position. So, region growing outside an edge, and region growing having an edge as a start point can be suppressed, and the contour shape of a subject after region growing can be stabilized.
According to a preferred aspect of the present invention, the threshold value is set to assume a value that promotes growing of the region in a region within the maximum growing range, and to assume a value that suppresses growing of the region outside the maximum growing range. Hence, extraction faithful to the subject shape can be realized even in a partial region having a low-contrast boundary from the background, and a partial region with a shadow.
According to a preferred aspect of the present invention, the maximum growing range is obtained as an output when a shape of the partial region is smoothed using a smoothing filter having a predetermined size. Accordingly, even when the shape of a partial region extracted in advance has a missing portion or protruding portion, and has a large local difference from the subject shape, region growing that can relax the influence of such difference can be realized.
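The relation between the maximum growing range and the position-dependent threshold value can be sketched as follows. The sketch assumes a box (mean) filter as the smoothing filter, takes the range as the set of points where the smoothed output exceeds a hypothetical `level`, and uses three hypothetical tolerance values: a large one inside the range, a smaller one at edge positions, and zero outside the range.

```python
def box_smooth(mask, radius):
    """Mean of a 0/1 mask over a square window clipped at the image border."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(window) / len(window)
    return out


def max_growing_range(mask, radius, level=0.0):
    """Assumed rule: the range is where the smoothed partial region exceeds level."""
    return [[v > level for v in row] for row in box_smooth(mask, radius)]


def tolerance_map(in_range, on_edge, inside_tol, edge_tol, outside_tol):
    """Per-point similarity tolerance used as the region growing threshold."""
    return [[outside_tol if not r else (edge_tol if e else inside_tol)
             for r, e in zip(range_row, edge_row)]
            for range_row, edge_row in zip(in_range, on_edge)]


partial_region = [[0, 0, 1, 0, 0, 0]]
edges = [[False, False, False, True, False, False]]
growing_range = max_growing_range(partial_region, radius=1)
print(growing_range)
# prints [[False, True, True, True, False, False]]
print(tolerance_map(growing_range, edges, inside_tol=20, edge_tol=5, outside_tol=0))
# prints [[0, 20, 20, 5, 0, 0]]
```

A larger tolerance makes a neighboring point easier to incorporate, so this map promotes growing inside the range, slows it at edge positions, and stops it outside the range.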
According to a preferred aspect of the present invention, the input image includes time-serial images, and the partial region extraction step includes the step of extracting the partial region on the basis of difference data between image frames at different times of the input image. As a consequence, a subject that moves in an image can be automatically extracted with high precision based on the distribution of motion vectors.
According to a preferred aspect of the present invention, the input image includes a plurality of images from a plurality of different view point positions, and the partial region extraction step includes the step of extracting the partial region on the basis of a disparity distribution between the input images. Accordingly, a specific subject can be automatically extracted with high precision based on the distribution of subject distances.
Other features and advantages of the present invention will be apparent from the following description taken in conjunction with the accompanying drawings, in which like reference characters designate the same or similar parts throughout the figures thereof.