Generally, when an image containing a specific object domain such as a target object or a human face (an image for extraction) is extracted from an obtained image, the image extraction method differs depending on whether the obtained image is printed on paper or has been converted to electronic form.
For example, when an obtained image is printed on paper as a photograph, etc., an image for extraction can be extracted from the obtained image by cutting the target image out of the base image (photograph, etc.) using scissors, a cutter knife, etc. (image extraction method (1)).
With an image which has been electronically converted by an image obtaining device such as a CCD (charge coupled device) camera or scanner device, the image for extraction can be extracted by performing image processing (image extraction processing) on the image obtained by the image obtaining device (hereinafter referred to as the "base image") using a computer, etc. (image extraction method (2)).
With image extraction method (1), the operations of actually cutting out the image for extraction using the scissors, cutter knife, etc. involve great effort, and experience is necessary to cut out the image for extraction from the base image in such a way that the target object is arranged in a balanced manner.
With image extraction method (2), in contrast, the image for extraction is extracted from the base image using image extraction software running on a personal computer, etc.
In image extraction method (2), generally, the base image is displayed on a display device such as a monitor, and the operator specifies a desired image for extraction by indicating coordinates using a coordinate input device such as a mouse. Consequently, although the operator must become accustomed to using the software, less experience is required than in the case of image extraction method (1), and it is easy to cut out the image for extraction from the base image in such a way that the target object is arranged in a balanced manner.
Further, in image extraction method (2), one way to identify whether an image for extraction from the base image is an image containing a desired specified object domain is for the operator to perform this identification using the mouse, etc., while viewing the base image on the display device. Another method of identifying images for extraction which has been proposed is to identify images for extraction by means of a predetermined calculation method.
One example of a method of identifying images for extraction by calculation is template matching. In template matching, feature patterns possessed by objects (specific object domains) to be extracted are stored in memory in advance, and then a difference between a stored feature pattern and a corresponding feature pattern of a specified object domain of the base image is calculated to obtain an evaluation quantity. If the evaluation quantity is a value within a predetermined range, the feature pattern of the specified object domain is judged to be equivalent to the stored feature pattern, and thus an image for extraction is identified in the base image.
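The template matching described above can be illustrated by the following minimal sketch. It assumes grayscale images held as nested lists and uses a sum of absolute differences as the evaluation quantity; the names `sad`, `match_template`, and `max_difference`, and the sample pixel values, are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def sad(base: Image, template: Image, top: int, left: int) -> int:
    """Sum of absolute differences between the template and one base-image window."""
    return sum(
        abs(base[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def match_template(
    base: Image, template: Image, max_difference: int
) -> Optional[Tuple[int, int, int]]:
    """Return (top, left, score) of the best window, or None if it is out of range."""
    th, tw = len(template), len(template[0])
    best = None
    # Compare the stored pattern against every position of the base image.
    for top in range(len(base) - th + 1):
        for left in range(len(base[0]) - tw + 1):
            score = sad(base, template, top, left)
            if best is None or score < best[2]:
                best = (top, left, score)
    # Accept the match only if the evaluation quantity is within the allowed range.
    if best is not None and best[2] <= max_difference:
        return best
    return None


# Illustrative data (assumed values, not taken from any real image).
base_image = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 9, 0, 0],
    [0, 0, 0, 0, 0],
]
stored_pattern = [[9, 9], [8, 9]]
result = match_template(base_image, stored_pattern, max_difference=4)
print(result)
```

Here the predetermined range is modelled as a simple upper bound on the difference; other evaluation quantities, such as normalized correlation, could be substituted without changing the overall structure.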
After an image for extraction has been identified in the base image as above, the specified object domain within it must then be specified. This is done by setting initial values based on the shapes of the feature patterns used in template matching to identify the image for extraction, and then applying a dynamic contour model, dynamic grid model, etc. to the image for extraction.
For example, when the foregoing specified object domain is a human face, by using a probability density function derived from a color distribution of human faces, skin areas of faces can be separated from the base image.
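As a rough illustration of separating skin areas with a probability density function, the sketch below fits independent Gaussian densities to the brightness-normalized color of sample skin pixels and thresholds the density at each pixel. The choice of chromaticity coordinates, the independent-Gaussian form, the function names, the sample colors, and the threshold are all assumptions made for this example; the text above does not specify them.

```python
import math
from typing import List, Tuple

RGB = Tuple[int, int, int]


def chromaticity(pixel: RGB) -> Tuple[float, float]:
    """Normalize out brightness: (r, g) = (R, G) / (R + G + B)."""
    total = sum(pixel) or 1
    return pixel[0] / total, pixel[1] / total


def fit_skin_model(samples: List[RGB]):
    """Fit the mean and variance of each chromaticity channel (assumed independent)."""
    points = [chromaticity(p) for p in samples]
    n = len(points)
    means = [sum(p[i] for p in points) / n for i in range(2)]
    variances = [
        max(sum((p[i] - means[i]) ** 2 for p in points) / n, 1e-6)  # floor avoids zero variance
        for i in range(2)
    ]
    return means, variances


def skin_density(pixel: RGB, model) -> float:
    """Evaluate the fitted density at one pixel."""
    means, variances = model
    density = 1.0
    for value, mean, var in zip(chromaticity(pixel), means, variances):
        density *= math.exp(-((value - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
    return density


def skin_mask(image: List[List[RGB]], model, threshold: float) -> List[List[bool]]:
    """Mark pixels whose density is at least the threshold as skin."""
    return [[skin_density(p, model) >= threshold for p in row] for row in image]


# Illustrative sample colors (assumed values).
skin_samples = [(200, 150, 120), (210, 160, 130), (190, 140, 115), (205, 155, 125)]
model = fit_skin_model(skin_samples)
mask = skin_mask([[(202, 152, 122), (30, 60, 200)]], model, threshold=1.0)
print(mask)
```

Because the model is fitted to colors observed under particular lighting, pixels captured under very different illumination would fall outside the fitted density.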
Further, since a moving image is an image series made up of still images arranged in a time sequence, in order to follow a specified object domain in a moving image, the foregoing method is applied, namely, an image for extraction is first identified in the base image, and then a specified object domain is specified in the image for extraction.
Specifically, first, the moving image is displayed on the display device as a series of still images. These images are treated as base images. Then, images for extraction are identified in the first base image in the series using an image tool (a coordinate input device such as a mouse) or by template matching.
Here, when using an image tool such as a mouse to identify images for extraction, a domain indicated using the image tool is stored in memory as a feature pattern, as are the position and size of the indicated domain. When identifying images for extraction by template matching, on the other hand, the size of the feature pattern used to determine the identified domain is stored in memory, as is the position of the domain of the base image corresponding to this feature pattern.
Then, for the second image of the image series, a plurality of combinations of feature pattern size and position, each altered slightly from those stored for the first image, are prepared, and evaluation values are calculated by comparing each of these combinations with the second image. This yields a plurality of evaluation values, and the size and position of the combination giving the best evaluation value, i.e., the one for which the stored feature pattern and the corresponding portion of the second image are the most similar, are used to extract from the second image the image for extraction, which contains a specified object domain.
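One step of this following process might be sketched as below, assuming square domains, grayscale frames held as nested lists, a nearest-neighbour resize of the stored pattern, and a mean absolute difference as the evaluation value. The names `resize`, `mean_abs_diff`, and `track_step`, and the perturbation parameters `shift` and `scale_step`, are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]
State = Tuple[int, int, int]  # (top, left, size) of a square domain


def resize(pattern: Image, size: int) -> Image:
    """Nearest-neighbour resize of a square pattern, sampling at cell centers."""
    n = len(pattern)
    idx = [(2 * k + 1) * n // (2 * size) for k in range(size)]
    return [[pattern[r][c] for c in idx] for r in idx]


def mean_abs_diff(frame: Image, pattern: Image, top: int, left: int) -> float:
    """Evaluation value: lower means the pattern and the frame window are more similar."""
    size = len(pattern)
    total = sum(
        abs(frame[top + r][left + c] - pattern[r][c])
        for r in range(size)
        for c in range(size)
    )
    return total / (size * size)


def track_step(frame: Image, pattern: Image, previous: State,
               shift: int = 1, scale_step: int = 1) -> State:
    """Try sizes and positions slightly altered from the previous state; keep the best."""
    top0, left0, size0 = previous
    best_state, best_score = previous, float("inf")
    for size in range(max(1, size0 - scale_step), size0 + scale_step + 1):
        candidate = resize(pattern, size)
        for top in range(top0 - shift, top0 + shift + 1):
            for left in range(left0 - shift, left0 + shift + 1):
                # Skip candidates that would fall outside the frame.
                if top < 0 or left < 0 or top + size > len(frame) or left + size > len(frame[0]):
                    continue
                score = mean_abs_diff(frame, candidate, top, left)
                if score < best_score:
                    best_state, best_score = (top, left, size), score
    return best_state


# Illustrative data: the pattern moved one row down between frames (assumed values).
stored_pattern = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
frame = [[0] * 8 for _ in range(8)]
for r in range(4):
    for c in range(4):
        frame[3 + r][2 + c] = stored_pattern[r][c]
new_state = track_step(frame, stored_pattern, previous=(2, 2, 4))
print(new_state)
```

Calling `track_step` on each subsequent frame with the state returned for the previous frame follows the domain through the series.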
The foregoing processing is then performed on the third image of the series, and by performing this processing in turn for each subsequent image of the series, it is possible to follow a specified object domain in the moving image. Generally, in this sequence of processing, the stored feature pattern is replaced with a new feature pattern from time to time.
Accordingly, when the specified object domain to be followed in the moving image is a human face, if the feature pattern stored in advance is a probability density function derived, as described above, from a color distribution of human faces, skin areas of human faces can be separated from the base images. In other words, a human face can be followed in a moving image.
However, when using a personal computer and image processing software to extract a target object, i.e., an image for extraction containing therein a specified object domain, from an electronically converted base image obtained by a CCD camera, etc., cutting out a portion of the base image so that it contains a specified object domain, and, moreover, so that the target object is arranged in a balanced manner therein, requires some amount of experience, just as in the case of image extraction method (1) above using a photograph and scissors.
For example, when the specified object domain of an image for extraction is positioned on an edge of the screen, in order to cut out the image for extraction in such a way that the specified object domain is arranged in a balanced manner within the image for extraction, it is necessary to first cut out the image for extraction, and then to change its position so that the specified object domain is in the center of the image for extraction. Thus the operations of image extraction are very complicated.
Further, when using a method such as template matching to identify images for extraction from a base image, a difference must be calculated between the stored feature patterns and each position of the base image, thus necessitating a large quantity of calculations.
Moreover, when the size of the feature patterns contained in the base image is unknown, it is necessary to prepare stored feature patterns in a range of sizes from small to large, and to calculate differences between each of these and each position of the base image. This further increases the quantity of calculations.
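The growth in the quantity of calculations can be estimated with a short sketch. It assumes square feature patterns, an exhaustive search over every position, and one difference per pattern pixel; the function name `difference_count` and the image and pattern sizes are illustrative assumptions only.

```python
from typing import Iterable


def difference_count(width: int, height: int, sizes: Iterable[int]) -> int:
    """Count pixel differences for exhaustive matching with square patterns of each size."""
    total = 0
    for s in sizes:
        if s > min(width, height):
            continue  # a pattern larger than the image cannot be placed
        positions = (width - s + 1) * (height - s + 1)
        total += positions * s * s  # s * s differences at each position
    return total


# Illustrative sizes (assumed): a 640 x 480 base image.
single_size = difference_count(640, 480, [32])
many_sizes = difference_count(640, 480, range(16, 65, 8))
print(single_size, many_sizes)
```

Even a single 32 x 32 pattern requires hundreds of millions of pixel differences under these assumptions, and each additional pattern size adds a comparable term.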
In addition, when a feature pattern contained in the base image has been deformed or rotated, or when the illumination conditions under which a feature pattern was stored differ greatly from those under which it actually appears in the base image, the template matching method is unable to satisfactorily identify images for extraction from the base image.
Further, when specifying a specified object domain in an image for extraction, the dynamic contour model or dynamic grid model is used, but the dynamic contour model has the following problems. Namely, since what is extracted is the contour (outline) of a specified object domain, a domain surrounded by such a contour is generally considered to be a specified object domain. However, since the evaluation value obtained by calculating a difference may converge to a local minimum, a contour other than that of the target object may be extracted.
With the dynamic grid model, too, like the dynamic contour model, a domain may be extracted in error if the evaluation value arrives at a local solution.
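The way an iterative minimization can arrive at a local solution is shown by the toy sketch below. It is not a dynamic contour or dynamic grid model; it merely applies a greedy descent to a one-dimensional list of assumed evaluation values, with the hypothetical name `greedy_descent`, to show that the result depends on the initial value.

```python
from typing import List


def greedy_descent(energy: List[int], start: int) -> int:
    """Move to the lower neighbour until neither neighbour is lower."""
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(energy)]
        best = min(neighbours, key=lambda j: energy[j])
        if energy[best] >= energy[i]:
            return i  # no neighbour improves the value: a (possibly local) minimum
        i = best


# Assumed evaluation values with a local minimum at index 1 and the global minimum at index 5.
energy = [5, 3, 4, 6, 2, 0, 1]
from_left = greedy_descent(energy, start=0)
from_middle = greedy_descent(energy, start=3)
print(from_left, from_middle)
```

Starting from index 0, the search stops at the local minimum and never reaches the lowest value, which corresponds to a domain being extracted in error.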
Moreover, both models are forms of processing which require many repeated computations, and necessitate a large quantity of calculations. In particular, the dynamic grid model requires a very large quantity of calculations for each computation, and thus necessitates an even larger quantity of calculations than the dynamic contour model.
Further, when the specified object domain to be specified in the image for extraction is a human face, a probability density function is first derived from a color distribution of human faces, and then applied to the base image to separate skin areas of faces therefrom, but when the illumination conditions at the time of derivation of the probability density function differ greatly from the illumination conditions when extracting the specified object domain from the base image, the specified object domain cannot be accurately specified from the base image.
Further, conventional processing for following a specified object domain in a moving image has the same problems as the processing with still images discussed above.