The present invention relates to a method of extracting a target object from an image sensed by an image sensing apparatus, a method of cutting out the object, a database structure used in extraction and a method of creating the database, and an image sensing apparatus or an image sensing system that can obtain object information using these methods. The present invention also relates to a storage medium which provides a program and data to the image sensing apparatus or image sensing system or stores the database.
Pattern recognition techniques are used to discriminate the presence/absence of a specific object in an image, or to search a database for an image including a specific object and extract that image. Methods of applying pattern recognition for these purposes include the following.
More specifically, in the first method, an image is segmented into a plurality of regions in advance and cutting processing is performed so that only a specific region to be recognized remains. Thereafter, similarity with a standard pattern is calculated using various methods.
In the second method, a template prepared in advance is scanned to calculate the degree of matching (correlation coefficient) at the respective positions to search for a position where the calculated value becomes equal to or larger than a predetermined threshold value (Japanese Patent Laid-Open No. 6-168331).
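The scan in the second method can be sketched as follows. This is a minimal illustration, assuming grayscale images held as NumPy arrays and using the zero-mean normalized correlation coefficient as the degree of matching; the function name `match_template` and the `threshold` default are hypothetical and are not taken from the cited publication. The exhaustive double loop also makes the cost discussed below concrete: every template size and hue variant would require another full scan.

```python
import numpy as np


def match_template(image, template, threshold=0.9):
    """Slide `template` over `image` and return (row, col, r) for every
    position whose correlation coefficient r is >= threshold.

    Hypothetical sketch: exhaustive scan, no pyramid or FFT acceleration.
    """
    image = np.asarray(image, dtype=float)
    template = np.asarray(template, dtype=float)
    th, tw = template.shape
    t = template - template.mean()          # zero-mean template
    t_norm = np.sqrt((t * t).sum())
    hits = []
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            w = image[y:y + th, x:x + tw]
            w = w - w.mean()                # zero-mean window
            denom = np.sqrt((w * w).sum()) * t_norm
            if denom == 0:
                continue                    # flat window or template: r undefined
            r = float((w * t).sum() / denom)
            if r >= threshold:
                hits.append((y, x, r))
    return hits
```

For example, embedding a 2×2 template in a blank 8×8 image at row 2, column 3 and scanning with `threshold=0.99` reports only that position.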
Furthermore, in the third method, when an image database is created, the regions of the constituent elements in each image and the names of those elements are input, so that an image having a predetermined feature can be searched for at high speed (Japanese Patent Laid-Open No. 5-242160).
However, in the first and second methods, since the position or size of a specific object in an image or the hue or the like that reflects the illumination condition is not known in advance, the following problems are posed.
First, since similarity must be calculated using a plurality of standard patterns (images representing identical objects having different sizes, positions, hues, and the like), a considerably large calculation amount and long calculation time are required.
Second, it is generally difficult to find and cut out a specific region having a feature close to that of a standard pattern for the same reason as in the first problem.
Third, the template size can be set in advance under only very limited image generation conditions. When the image generation conditions are not known, the same problem as the first problem is posed. Therefore, a very long calculation time is required for discriminating the presence/absence of a specific object, searching for an image including a specific object, and the like.
In the third method, in order to input regions of constituting elements and their names in an image, input interfaces such as a keyboard, mouse, and the like are required, and when a database of images actually sensed by an image sensing means is to be created, such search data must be created after the image sensing operation.
Furthermore, an application that searches a database of images sensed by an image sensing means for images including the object intended as the main object of the scene cannot be realized by conventional image processing methods, which do not use any information obtained upon image sensing.
As a general technique for extracting (cutting) an image, a chromakey technique using a specific color background, a videomat technique for generating a key signal by image processing (histogram processing, difference, differential processing, edge emphasis, edge tracking, and the like) (Television Society technical report, vol. 12, pp. 29–34, 1988), and the like are known.
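A hard chromakey can be sketched in a few lines. This is a simplified illustration only, assuming RGB arrays and a pure green backing color; the names `chroma_key_mask`, `key_color`, and `tol` are hypothetical, and practical keyers generate a soft key signal rather than the binary mask used here.

```python
import numpy as np


def chroma_key_mask(rgb, key_color=(0, 255, 0), tol=60.0):
    """Return a boolean foreground mask: True where the pixel color is
    farther than `tol` (Euclidean RGB distance) from the backing color.

    Hypothetical sketch of a hard key; no spill suppression or soft alpha.
    """
    rgb = np.asarray(rgb, dtype=float)
    key = np.asarray(key_color, dtype=float)
    dist = np.linalg.norm(rgb - key, axis=-1)   # per-pixel color distance
    return dist > tol
```

The sketch also shows the color omission problem: an object pixel whose color is close to the backing color, such as (10, 240, 10), falls within `tol` and is keyed out as background.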
As another apparatus for extracting a specific region from an image, in a technique disclosed in Japanese Patent Publication No. 6-9062, a differential value obtained by a spatial filter is binarized to detect a boundary line, connected regions broken up by the boundary line are labeled, and regions with an identical label are extracted.
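The boundary-and-labeling approach can be sketched as below. This is a minimal illustration, not the spatial filter of the cited publication: it assumes a grayscale NumPy image, uses central differences (`np.gradient`) as the differential filter, binarizes the gradient magnitude at a hypothetical `threshold`, and labels the remaining pixels with 4-connectivity. The function names are hypothetical.

```python
from collections import deque

import numpy as np


def boundary_map(image, threshold):
    """Binarize the gradient magnitude (central differences) into a boundary map."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                 # derivatives along rows, columns
    return np.hypot(gx, gy) >= threshold


def label_regions(boundary):
    """4-connected labeling of non-boundary pixels; boundary pixels keep label 0."""
    h, w = boundary.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for sy in range(h):
        for sx in range(w):
            if boundary[sy, sx] or labels[sy, sx]:
                continue
            count += 1                        # start a new connected region
            labels[sy, sx] = count
            queue = deque([(sy, sx)])
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not boundary[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = count
                        queue.append((ny, nx))
    return labels, count
```

A region with label `k` is then extracted as the mask `labels == k`. Because the boundary is only a thresholded derivative, a textured object produces many spurious boundary pixels and fragments into many labels.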
A technique for performing image extraction based on the difference from the background image is a classical technique, and recently, Japanese Patent Laid-Open No. 4-216181 discloses a technique for extracting or detecting target objects in a plurality of specific regions in an image by setting a plurality of masks (i.e., specific processing regions) in the difference data between the background image and the image to be processed.
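Background differencing evaluated inside several masks can be sketched as follows. This is an assumption-laden illustration: grayscale NumPy arrays, an absolute-difference test against a fixed `diff_threshold`, and a `min_pixels` count per mask are all hypothetical choices, not details from Japanese Patent Laid-Open No. 4-216181.

```python
import numpy as np


def detect_in_masks(background, current, masks, diff_threshold=30.0, min_pixels=1):
    """For each boolean mask (one processing region), return
    (object_present, object_mask) from the background difference.

    Hypothetical sketch: fixed global threshold, no illumination compensation.
    """
    diff = np.abs(np.asarray(current, dtype=float) - np.asarray(background, dtype=float))
    changed = diff > diff_threshold            # pixels that differ from the background
    results = []
    for mask in masks:
        obj = changed & np.asarray(mask, dtype=bool)   # restrict to this region
        results.append((int(obj.sum()) >= min_pixels, obj))
    return results
```

Because nothing in the sketch models the image sensing conditions, a uniform brightness change larger than `diff_threshold` between the two exposures marks every pixel as changed, and every mask then reports an object.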
In a method associated with Japanese Patent Publication No. 7-16250, a color model of the object to be extracted is used to obtain the probability distribution of the occurrence of that object on the basis of the color-converted data of the current image including the background and the difference data between the lightness levels of the background image and the current image.
As one of techniques for extracting a specific object image by extracting the outer contour line of the object from an image, a so-called active contour method (M. Kass et al., “Snakes: Active Contour Models,” International Journal of Computer Vision, Vol. 1, pp. 321–331, 1987) is known.
In the above-mentioned technique, an initial contour which is appropriately set to surround an object moves and deforms (changes its shape), and finally converges to the outer shape of the object. In the active contour method, the following processing is typically performed. More specifically, a contour line shape u(s) that minimizes an evaluation function given by equation (1) below is calculated with respect to a contour line u(s)=(x(s), y(s)) expressed using a parameter s that describes the coordinates of each point:

$$E = \int_0^1 \left[ E_1(V(s)) + w_0 E_0(V(s)) \right] ds \qquad (1)$$

where

$$E_1(V(s)) = \alpha(s) \left| \frac{du}{ds} \right|^2 + \beta(s) \left| \frac{d^2 u}{ds^2} \right|^2 \qquad (2)$$

$$E_0(V(s)) = -\left| \nabla I(u(s)) \right|^2 \qquad (3)$$

and I(u(s)) represents the luminance level on u(s), and α(s), β(s), and w0 are appropriately set by the user. In the technique (active contour method) for obtaining the contour line of a specific object by minimizing the above-mentioned evaluation function defined for a contour line, setting methods described in Japanese Patent Laid-Open Nos. 6-138137, 6-251148, 6-282652, and the like are known as the setting method of an initial contour.
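A discrete version of the evaluation function (1) to (3) and a simple minimizer can be sketched as below. Several assumptions apply: the contour is a closed polygon of integer (row, col) points, the derivatives of u(s) are replaced by first and second finite differences, α, β, and w0 are constants rather than functions of s, and |∇I|² is computed with `np.gradient`. The minimizer is a greedy neighborhood search in the style of Williams and Shah, swapped in for illustration; it is not the variational solution of Kass et al. All function and parameter names are hypothetical.

```python
import numpy as np


def snake_energy(pts, grad_sq, alpha=0.5, beta=0.5, w0=1.0):
    """Discrete E = sum_i [E1 + w0 * E0] for a closed contour.

    pts: (N, 2) integer array of (row, col) points; grad_sq: |grad I|^2 per pixel.
    """
    nxt = np.roll(pts, -1, axis=0)
    prv = np.roll(pts, 1, axis=0)
    d1 = nxt - pts                      # approximates du/ds
    d2 = prv - 2 * pts + nxt            # approximates d^2u/ds^2
    e1 = alpha * (d1 ** 2).sum(axis=1) + beta * (d2 ** 2).sum(axis=1)
    e0 = -grad_sq[pts[:, 0], pts[:, 1]]  # E0 = -|grad I(u(s))|^2
    return float((e1 + w0 * e0).sum())


def greedy_snake(image, pts, iters=50, **weights):
    """Move each point to the 3x3 neighbor that most lowers the total
    energy; stop when no point moves or after `iters` sweeps."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)
    grad_sq = gx ** 2 + gy ** 2
    h, w = img.shape
    pts = np.array(pts, dtype=int)      # work on a copy
    for _ in range(iters):
        moved = False
        for i in range(len(pts)):
            origin = pts[i].copy()
            best_p, best_e = origin, snake_energy(pts, grad_sq, **weights)
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    y, x = origin[0] + dy, origin[1] + dx
                    if not (0 <= y < h and 0 <= x < w):
                        continue        # keep candidates inside the image
                    pts[i] = (y, x)
                    e = snake_energy(pts, grad_sq, **weights)
                    if e < best_e:
                        best_p, best_e = np.array([y, x]), e
            pts[i] = best_p
            if not np.array_equal(best_p, origin):
                moved = True
        if not moved:
            break
    return pts
```

Only strictly lower energies are accepted, so the total energy never increases and the loop terminates. On a flat image the E0 term vanishes and the contour simply shrinks under the internal energy E1, which is why the initial contour must be set close to, and around, the object.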
The chromakey technique cannot be used outdoors due to strict limitations on the background, and also suffers a problem of color omission. In the videomat technique, the user must accurately perform contour designation in units of pixels, thus requiring much labor and skill.
The technique using the difference from the background image cannot normally be applied when an image of the background alone, without the specific object, cannot be sensed (e.g., because the object is huge), and it places a heavy load on the user.
Since no image sensing conditions (camera parameters and external conditions such as illumination) are taken into consideration, discrimination errors of the region to be extracted from the difference data become very large unless the background image and the image including the object to be extracted are obtained under the same image sensing conditions and at the same fixed position. Also, the technique described in Japanese Patent Publication No. 7-16250 is not suitable for extracting an image of an unknown object since it requires a color model of the object to be extracted.
Of the initial contour setting methods of the above-mentioned active contour method, in Japanese Patent Laid-Open No. 6-138137, an object region in motion is detected on the basis of the inter-frame difference, and a contour line is detected on the basis of contour extraction (searching for the maximum gradient edge of a changed region) in the vicinity of the detected region. Therefore, this method cannot be applied to a still object in an arbitrary background.
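Detection of a moving region by inter-frame difference can be sketched as follows, assuming two grayscale frames as NumPy arrays; the name `moving_region` and the `threshold` default are hypothetical, and the contour search in the vicinity of the region that the cited method performs afterwards is omitted.

```python
import numpy as np


def moving_region(prev_frame, cur_frame, threshold=20.0):
    """Return the bounding box (top, left, bottom, right) of pixels whose
    inter-frame difference exceeds `threshold`, or None if nothing changed.

    Hypothetical sketch: no noise filtering or morphological cleanup.
    """
    diff = np.abs(np.asarray(cur_frame, dtype=float) - np.asarray(prev_frame, dtype=float))
    ys, xs = np.nonzero(diff > threshold)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())
```

The sketch makes the limitation explicit: for a still object the two frames are identical, the function returns `None`, and no region is available around which to place an initial contour.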
In Japanese Patent Laid-Open No. 6-282652, feature points with a strong edge are extracted from an image, and points with higher evaluation values are selected from the set of feature points on the basis of an evaluation function, thereby setting the initial contour. In this case, the background must be plain or must vary only gradually.
Furthermore, as an example of the technique for optimizing the camera operation and the operation mode, in a method described in Japanese Patent Laid-Open No. 6-253197, the stop is set to obtain an appropriate average luminance upon sensing the background image. Thereafter, the current image is sensed using the same setting value, and the object image is extracted on the basis of difference image data therebetween.
On the other hand, as the degree of freedom in processing and modifying video information has increased with the advance of digital signal processing, the internal processing of the image sensing means has changed greatly: from relatively simple processing such as luminance level or color tone conversion, white-balance processing, and quantization size conversion, to processing having an edge extraction function or an image extraction function using a color component sequential growth method (Television Society technical report, Vol. 18, pp. 13–18, 1994).
However, since the methods that use difference data from an image of only the background do not consider any image taking conditions (camera parameters and external conditions such as illumination) except for the technique described in Japanese Patent Laid-Open No. 6-253197, discrimination errors of the region to be extracted from difference data become very large unless the background image and the image including the object to be extracted are obtained under the same image taking conditions and at the same fixed position.
On the other hand, the method described in Japanese Patent Publication No. 7-16250 is not suitable for extraction of an image of an unknown object since it requires a color model of the object to be extracted.
The method associated with Japanese Patent Laid-Open No. 6-253197 merely discloses a technique in which the stop setting value used when sensing the background image is reused when sensing an image including a specific object, on the premise that the image sensing means is set at the same fixed position and the same image sensing conditions as those used for the background-only image are applied. Because this method gives priority to the image sensing conditions of the background image, the image quality of the image including the specific object to be extracted is not normally guaranteed.
Furthermore, the chromakey method cannot be used outdoors due to serious limitations on the background and also suffers a problem of color omission.
Also, in the videomat method, the contour designation operation must be manually and accurately performed in units of pixels, thus requiring much labor and skill.
The method of detecting regions segmented by a boundary line by detecting the boundary line by differential calculations can hardly be applied to an object having a complex texture pattern, and offers no stable and versatile boundary line detection processing scheme.
As a method of extracting information associated with an object by template matching, i.e., as a technique that can be used for searching for, tracking, or recognizing a specific object in an image sensed by a camera, a model-based technique is known in which feature vectors (constituent line segments, shape parameters) are extracted and then compared with feature vector models (Japanese Patent Publication No. 6-14361, Japanese Patent Laid-Open No. 6-4673, and the like).
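The comparison step of such a model-based technique can be reduced to a nearest-model search, sketched below under stated assumptions: each feature vector is a hypothetical pair (area in pixels, elongation), the features are left unnormalized, and Euclidean distance is used. The feature values and the name `nearest_model` are illustrative only and do not come from the cited publications.

```python
import math


def nearest_model(feature, models):
    """Return (model key, Euclidean distance) of the model vector closest
    to `feature`. `models` maps a key to a feature vector."""
    key = min(models, key=lambda k: math.dist(feature, models[k]))
    return key, math.dist(feature, models[key])


# Hypothetical models: the same cup stored at two apparent sizes.
MODELS = {
    ("cup", "near"): (1600.0, 1.2),
    ("cup", "far"): (400.0, 1.2),
    ("pen", "near"): (900.0, 8.0),
}
```

With all three models, a small view of the cup, (420.0, 1.3), matches `("cup", "far")`. If only the near-size models are stored, the same observation is closer to `("pen", "near")`, which is why models at several sizes, and the memory and matching time they cost, become necessary when the object size in the image varies.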
As a technique for detecting the motion of an object, a method disclosed in Japanese Patent Laid-Open No. 5-232908 cuts out the portion to be subjected to motion extraction on the basis of the luminance level of a projection component, and tracks the motion of each point in the region of interest across time-series images.
However, in the former example, since templates or models having different sizes must be prepared for a target image in correspondence with changes in size of a specific object in an image to perform matching in units of regions of the image, a very large memory capacity and a very long calculation time for feature vector extraction and matching with models are required.
Alternatively, the zooming parameter or the like of the image sensing system must be manually adjusted so that the size of the target image becomes nearly equal to that of the model.
In the latter example, it is generally difficult to stably cut out an action extraction portion on the basis of the luminance level of a projection component of an image. Also, after motions at the respective points are tracked, it is difficult to interpret the motions at the respective points as one action category by combining such motion information, except for a simple action.
Furthermore, in constructing an image sensing apparatus or system, since the image sensing means has neither a command communication means for externally controlling the image sensing mode upon extraction of an object nor an image sensing parameter control function required upon sensing an image for object extraction, image sensing conditions optimal for image extraction cannot be set.
Therefore, the image sensing conditions cannot be optimally set in correspondence with image taking situations such as a change in illumination condition, the presence/absence of object motions, the presence/absence of motions of the image sensing means itself, and the like for the purpose of image extraction.
Japanese Patent Laid-Open No. 6-253197 above discloses a technique in which a stop control unit is set to obtain an appropriate average luminance upon sensing the background image, the current image is sensed using the same setting value as that for the background image, and a specific object image is extracted based on difference data between the two images.
However, again, an image sensing system cannot set image sensing conditions optimal for image extraction since an image sensing unit has neither a command communication control unit for appropriately controlling the image sensing mode from an external device upon extracting a specific object image nor a control function of the image sensing parameters required for sensing an image used for extracting a specific object image. Therefore, the image sensing conditions cannot be optimally set in correspondence with image taking situations such as changes in illumination conditions, the presence/absence of object motion, the presence/absence of motions of the image sensing unit itself, and the like.
When a specific object image is to be extracted by remote-controlling a camera, the communication control means, communication system, control commands, and the like have not yet been established. In particular, optimal image sensing conditions such as the field angle, focusing, and illumination conditions (the presence/absence of flash emission) cannot be set automatically or interactively for a designated object.
For example, setting an optimal field angle is important for removing as much of the unwanted background region as possible and for performing image extraction processing efficiently. However, such a function cannot be realized since communication control and image sensing control systems for performing such a setting operation between the camera and the terminal device have not yet been established.