1. Field of the Invention
This invention relates to a method for extracting an object image, wherein a predetermined object image is extracted from an image during processing of image information. This invention also relates to a method for detecting a gradient of a contour line field, wherein a gradient of a field is detected from a contour line of an object image, which is embedded in an image. This invention further relates to a method for extracting a contour line of an object image, wherein a contour line of a predetermined object image is extracted from an image. This invention still further relates to a method for detecting a gradient of an object image color field wherein, during processing of image information, information concerning a gradient of an object image field is detected, which field occurs from a color, a size, and a shape of the object image located in the vicinity of the region of view. This invention also relates to a method for detecting a movement of an image wherein, during processing of image information, a movement of an image occurring from a travel of an image input device (or a viewpoint) is detected, in particular, a method for detecting a movement of an entire image due to a travel of a viewpoint, which travel accompanies a movement of a human being or his eyeballs, or due to a travel of an image input device.
2. Description of the Prior Art
A human being views an image and recognizes what the thing embedded in the image is. It is known that this action can be divided into two steps. A first step is to carry out "discovery and extraction" by moving the viewpoint, setting a target of recognition at the center point of the visual field, and at the same time finding the size of the object. A second step is to make a judgment from a memory and a knowledge of the human being as to what the object present at the viewpoint is. Ordinarily, human beings iterate the two steps and thereby acquire information about the outer world.
On the other hand, in conventional techniques for recognizing a pattern by carrying out image processing, typically in pattern matching techniques, importance is attached only to the second step. Therefore, various limitations are imposed on the first step for "discovery and extraction." For example, it is necessary for a human being to intervene in order to cut out a target and normalize the size of the target. Also, as in the case of automatic reading machines for postal code numbers, it is necessary for a target object to be placed at a predetermined position. As pattern recognizing techniques unaffected by a change in size and position of a target, various techniques have been proposed wherein a judgment is made from an invariable quantity. For example, a method utilizing a central moment, a method utilizing a Fourier description element, and a method utilizing a mean square error have been proposed. With such methods, for the purposes of recognition, it is necessary to carry out complicated integrating operations or coordinate transformations. Therefore, extremely large amounts of calculations are necessary in cases where it is unknown where a target object is located or in cases where a large image is processed. Also, with these methods, in cases where a plurality of objects are embedded in an image, there is the risk that their coexistence gives rise to noise and causes errors in recognizing the objects. Thus these methods are not satisfactory in practice.
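By way of illustration only, the following minimal Python sketch shows what is generally meant by a judgment made from an invariable quantity, taking the central moment as the example. It is the textbook definition of central moments, not the specific method of any reference cited above. The helper names (raw_moment, central_moment, place) and the small L-shaped test pattern are hypothetical and are introduced solely for this sketch.

```python
def raw_moment(img, p, q):
    """Raw moment m_pq: sum of x^p * y^q * f(x, y) over every pixel."""
    return sum((x ** p) * (y ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def central_moment(img, p, q):
    """Central moment mu_pq, taken about the centroid of the image."""
    m00 = raw_moment(img, 0, 0)
    cx = raw_moment(img, 1, 0) / m00
    cy = raw_moment(img, 0, 1) / m00
    return sum(((x - cx) ** p) * ((y - cy) ** q) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def place(shape, width, height, dx, dy):
    """Hypothetical helper: paste a small pattern onto an empty canvas."""
    canvas = [[0] * width for _ in range(height)]
    for y, row in enumerate(shape):
        for x, v in enumerate(row):
            canvas[y + dy][x + dx] = v
    return canvas


# Hypothetical test pattern: the same L shape at two different positions.
L_SHAPE = [[1, 0, 0],
           [1, 0, 0],
           [1, 1, 1]]

a = place(L_SHAPE, 8, 8, 0, 0)
b = place(L_SHAPE, 8, 8, 4, 3)

# The raw moments change with position, the central moments do not.
print(raw_moment(a, 1, 0), raw_moment(b, 1, 0))
print(central_moment(a, 2, 0), central_moment(b, 2, 0))
```

Every pixel of the image is visited for each moment, which gives a sense of why the amount of calculation grows quickly for large images. The sketch also assumes a single object on an empty background; a second object in the same image would shift the centroid and alter the moments.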
As a model for recognizing a target object, which model is unaffected by the size of a target object or by a shift in position of a target object, a model utilizing a neocognitron, which is one of the techniques for neural networks, has been proposed. The neocognitron is described by Fukushima in "Neocognitron: A Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position," Collected Papers of The Institute of Electronics and Communication Engineers of Japan, A, J62-A(10), pp. 658-665, Oct. 1979. The neocognitron is based on the principle that pattern matching is carried out on a small part of a target object, a shift in position is assimilated at several stages with a layered architecture, and the shift in position is thereby tolerated. However, with such a principle, a limitation is naturally imposed on achievement of both the accurate recognition and the assimilation of the shift in position. It has been reported, for example, by Nagano in "Neural Net for Extracting Size Invariant Features," Computrol, No. 29, pp. 26-31, that the neocognitron can ordinarily tolerate a fluctuation in size of only approximately a factor of four. As for the shift in position, the neocognitron can tolerate a shift of only approximately two or three times the size of a target object. The tolerance capacity remains the same also in a recently proposed neocognitron model which is provided with a selective attention mechanism.
How the visual function of a human being carries out the first step has not yet been clarified. On the other hand, how the viewpoint moves has been clarified to some extent as described, for example, by Okewatari in "Visual and Auditory Information Processing in Living Body System," Information Processing, Vol. 23, No. 5, pp. 451-459, 1982, or by Sotoyama in "Structure and Function of Visual System," Information Processing, Vol. 26, No. 2, pp. 108-116, 1985. It is known that eyeball movements include a saccadic movement, a follow-up movement, and an involuntary movement. Several models that simulate these eye movements have been proposed. For example, a model in which the viewpoint is moved to the side of a larger differential value of an image is proposed by Nakano in "Pattern Recognition Learning System," Image Information (I), 1987/1, pp. 31-37, and by Shiratori, et al. in "Simulation of Saccadic Movement by Pseudo-Retina Mask," Television Engineering Report, ITEJ Tec. Rep. Vol. 14, No. 36, pp. 25-30, ICS' 90-54, AIPS' 90-46, June 1990. Also, a model in which the viewpoint is moved to the side of a higher lightness is proposed by Hirahara, et al. in "Neural Net for Specifying a Viewpoint," Television Engineering Report, ITEJ Tec. Rep. Vol. 14, No. 33, pp. 25-30, VAI' 90-28, June 1990. Additionally, a model in which the viewpoint is moved to a point of a contour having a large curvature is proposed by Inui, et al. in Japanese Unexamined Patent Publication No. 2(1990)-138677. However, these proposed models are rather simple and do not simulate the human visual function well.
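The idea of moving the viewpoint to the side of a higher lightness can be sketched, purely for illustration, as a greedy search over neighboring picture elements. The sketch below is a simplified stand-in and is not the neural network of any cited reference. The function names (step_towards_lightness, travel) and the small test image IMG are assumptions made only for this example.

```python
def step_towards_lightness(img, x, y):
    """Return the brightest 8-neighbour, or (x, y) itself if none is brighter."""
    h, w = len(img), len(img[0])
    best = (img[y][x], x, y)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h and img[ny][nx] > best[0]:
                best = (img[ny][nx], nx, ny)
    return best[1], best[2]


def travel(img, x, y, max_steps=100):
    """Move the viewpoint step by step until no neighbour is brighter."""
    path = [(x, y)]
    for _ in range(max_steps):
        nx, ny = step_towards_lightness(img, x, y)
        if (nx, ny) == (x, y):
            break  # the viewpoint has stopped
        x, y = nx, ny
        path.append((x, y))
    return path


# Hypothetical image: a large bright object centred at (4, 3)
# and a small, dimmer spot at (1, 1).
IMG = [
    [0, 0, 0, 0, 0, 0],
    [0, 3, 0, 0, 0, 0],
    [0, 0, 0, 0, 5, 0],
    [0, 0, 0, 5, 9, 5],
    [0, 0, 0, 0, 5, 0],
]

print(travel(IMG, 2, 2))  # [(2, 2), (3, 3), (4, 3)]
print(travel(IMG, 0, 0))  # [(0, 0), (1, 1)]
```

Started at (2, 2), the viewpoint reaches the large object. Started at (0, 0), it stops at the small spot and never reaches the large object, which shows how simple such a rule is.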
Also, for the purposes of finding a target of recognition and extracting a region including the whole target, instead of adhering only to local features of the target object, it is necessary that the movement of the viewpoint becomes stable (stationary) at the center point of the whole target. However, with the aforesaid conventional models, such an operation for stabilizing the viewpoint cannot be carried out. For example, with the model proposed by Shiratori, et al. wherein the pseudo-retina mask is utilized, the viewpoint moves forward and backward around the contour line of an object and does not become stable. Also, with the model proposed by Inui, et al., the viewpoint can ultimately catch only a feature point at a certain limited part of an object. Additionally, most of the aforesaid conventional models require, as a tacit precondition, that the background of an object is simple. Thus most of the aforesaid conventional models cannot be applied to natural images, such as ordinary photographic images.
As described above, various techniques have been proposed which enable satisfactory recognition of a target in cases where a human being intervenes in order to assimilate a shift in position of the target or a change in the size of the target or in cases where the position and the size of the target are normalized in advance. However, no excellent technique has yet been proposed, with which the entire target object image can be extracted from an image for the purposes of recognizing the object image.
In the field of techniques for extracting a predetermined object image from an image in accordance with contour lines of the object image, which is embedded in the image, and making judgments from the extracted object image as to the state of the image, attempts have heretofore been made to analyze in detail the relationship among many contour lines contained in the image, to compare the results of the analysis with knowledge given in advance, and to determine or discriminate, based on many combinations of contour lines, what contour lines of what object are contained in the image. In this manner, it becomes possible to know what thing is represented by a portion of the image.
Recently, there has been proposed the concept that, when an image is considered from points of view of various features, such as contour lines, luminance distributions, colors, and shapes, the so-called "field" of the image based on the features exists. Such concept is described in, for example, Japanese Patent Application No. 3(1991)-323344 for the invention, which is made by Ono and concerns extraction of a candidate for an object image with a map of a potential field.
From the point of view of contour lines of an object image, it may be considered that the so-called "contour line field" exists conceptually. As one example of the "contour line field," a conical field may be considered in which the field sinks towards the center point of an object surrounded by contour lines. As another example of the "contour line field," a conical field may be considered in which the field sinks towards the positions of contour lines themselves.
If information concerning a gradient of a contour line field is obtained, even if the total shape of the contour lines of the image is unknown, the information can be utilized in various fields of image processing. For example, the information concerning the gradient of the contour line field can be utilized in order to predict the direction towards the center point of an object, which is surrounded by a contour line, or to predict the direction along a contour line of an object. Also, the magnitude of a gradient value corresponds to the amount of image information at a corresponding position in the image. Therefore, the information concerning the gradient of the contour line field can be utilized during compression of the image information, or the like. Thus the information concerning the gradient of the contour line field is the image information capable of being utilized for a wide variety of purposes.
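As a purely illustrative sketch of a "contour line field" that sinks towards the positions of the contour lines themselves, the field value at each picture element may be taken as the distance to the nearest contour point, and the gradient may be estimated by finite differences. This is one assumed way of realizing such a field and is not taken from the cited application. The function names (contour_field, gradient) and the straight-line contour are hypothetical.

```python
import math


def contour_field(contour, width, height):
    """Field value at each pixel: Euclidean distance to the nearest contour point."""
    return [[min(math.hypot(x - cx, y - cy) for cx, cy in contour)
             for x in range(width)]
            for y in range(height)]


def gradient(field, x, y):
    """Finite-difference gradient (d/dx, d/dy); one-sided at the image border.

    Assumes the field is at least 2 pixels wide and 2 pixels high.
    """
    h, w = len(field), len(field[0])
    x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
    y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
    gx = (field[y][x1] - field[y][x0]) / (x1 - x0)
    gy = (field[y1][x] - field[y0][x]) / (y1 - y0)
    return gx, gy


# Hypothetical contour: a vertical line segment at x = 5.
contour = [(5, y) for y in range(9)]
field = contour_field(contour, 11, 9)

gx, gy = gradient(field, 2, 4)
print(gx, gy)  # -1.0 0.0
```

At the point (2, 4) the field decreases in the positive x direction, so the downhill direction (-gx, -gy) points towards the contour line without any knowledge of the total shape of the contour. The brute-force distance computation is chosen for clarity and would be slow for large images.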
Only when many lines, which form an outer shape of an object image embedded in an image, are located with a significant positional relationship can the lines be regarded as constituting contour lines of a single object or a plurality of objects. In general, an image has contour lines of an object image and very many other lines. Lines other than the contour lines of an object image also occur due to a shadow of an object, which shadow is formed due to a slight influence of light, creases on the surface of an object, a pattern on the surface of an object, or the like. Selecting only the lines, which constitute contour lines, from the lines embedded in an image and eliminating the other lines are very important as techniques for preprocessing in various image processing steps. With one of the typical methods for selecting the lines, which constitute contour lines, the relationship among many lines contained in an image is analyzed in detail. The results of the analysis are then compared with knowledge given in advance. Based on many combinations of lines, it is determined or discriminated what contour lines of what object are contained in the image.
Also, a method has been proposed wherein end points of contour lines are detected from a given image, and it is predicted that a contour of a target will be present in directions in which the end points and the contour lines intersect perpendicularly with each other. Such a method is proposed by, for example, Finkel L. H., et al. in "Integration of Distributed Cortical Systems by Reentry: A Computer Simulation of Interactive Functionally Segregated Visual Areas," JONS (1989), Vol. 9, No. 9, pp. 3188-3208. With the proposed method, even if an object recorded in a given image merges into the background, or even if the contrast of the image is low and contour lines of an object cannot be recognized, contour lines are formed from end points of contour lines embedded in the image, and the target is thereby extracted from the image. For example, as illustrated in FIG. 92A, in cases where objects 210A, 210B, 210C, and 210D are embedded in an image, the end points of these objects are extended. In this manner, as illustrated in FIG. 92B, contour lines of an object 211 are formed.
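The first stage of such a method, the detection of end points of contour lines, can be illustrated with a common neighbor-counting heuristic: a line picture element having exactly one line neighbor among its eight neighbors is treated as an end point. This heuristic is an assumption made for illustration and is not the reentrant model of Finkel, et al.; it also presupposes lines that are one picture element wide. The function name find_end_points and the test image LINES are hypothetical.

```python
def find_end_points(img):
    """Return (x, y) of every line pixel that has exactly one 8-connected line neighbour."""
    h, w = len(img), len(img[0])
    ends = []
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            neighbours = sum(
                1
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
                if (dx or dy)
                and 0 <= x + dx < w
                and 0 <= y + dy < h
                and img[y + dy][x + dx]
            )
            if neighbours == 1:
                ends.append((x, y))
    return ends


# Hypothetical binary image: one horizontal line broken by a gap.
LINES = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

print(find_end_points(LINES))  # [(1, 1), (4, 1), (7, 1), (9, 1)]
```

The end points at (4, 1) and (7, 1) face each other across the gap; a method of the kind described would go on to predict a contour from such end points, a step this sketch does not attempt.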
As described above, human beings extract an appropriate size of a target of recognition from an image of the outer world and thereafter efficiently carry out recognition processing. On the other hand, with the conventional methods, an attempt is made to recognize a target in an image of the outer world only with a single processing system. Therefore, problems occur in that very complicated procedures and a very long time are required. Also, problems occur in that it is necessary for a human being to intervene in the extraction of the target, or it is necessary for the background of the image to be simple. Accordingly, the conventional methods are not satisfactory in practice. These problems occur because no efficient method has heretofore been available for extracting a target object, which is to be recognized, from an image of the outer world, and the structure of the recognition system is such that a heavy burden is imposed on a judgment means of the recognition system.
Also, with the conventional methods described above, in cases where a plurality of target object images are embedded in an image, it often occurs that an object image, which has already been extracted, is again extracted. Such re-extraction of the object image, which has already been extracted, adversely affects the extraction of a target object image, which has not yet been extracted. Therefore, the efficiency, with which the extraction is carried out, cannot be kept high.
Additionally, with the conventional methods which simulate travel of the viewpoint, it often occurs that the viewpoint stops not only at a position, at which the viewpoint finds a target object, but also at a position, at which no target object is located. In such cases, it is necessary for a person to intervene such that the viewpoint may get away from the stop state at the position, at which no target object is located. Such problems also render the conventional methods unsatisfactory in practice.
Further, with the conventional methods described above, in cases where a small object different from a target object is encountered when the viewpoint travels over a given image towards the target object, it often occurs that the different object is recognized as the target object by mistake. In such cases, the viewpoint stops at the position of the different object and cannot travel towards the target object any more.
The aforesaid methods, wherein an object image is extracted from an image for the purposes of obtaining information concerning a gradient of a contour line field, have the drawbacks in that very large amounts of calculations are required. Also, if contour lines have missing parts, or if the shapes of the contour lines are incomplete, comparison with knowledge given in advance cannot be carried out appropriately. Consequently, the determination or discrimination about what contour lines of what object are contained in the image cannot be effected. Additionally, if a failure in discrimination occurs, the problems occur in that even information concerning parts of contour lines cannot be obtained.
Also, with the aforesaid models which simulate travel of the human viewpoint, importance is merely attached to portions of an image, at which differential values of the image are large or curvatures of contour lines are large. Such processes are too simple, and it is difficult to detect a contour line field with such processes.
The aforesaid methods, wherein only the lines, which constitute contour lines, are selected from lines embedded in an image, and the other lines are eliminated, have the drawbacks in that, as the number of the lines embedded in the image becomes large, enormous amounts of calculations are required for combinations of the lines. Also, if the lines have missing parts, or if the relationship between the lines is incomplete, an inconsistency will occur between the lines and the knowledge given in advance, and therefore comparison with the knowledge given in advance cannot be carried out appropriately. Consequently, the determination or discrimination of contour lines cannot be effected.
Additionally, it often occurs that a plurality of object images are embedded in a given image. For example, as in the cases of an image of a human face with a mask and a human face image recorded on the foreground side of a signpost image, a small object image may be located on the foreground side of a large object image and may overlap upon the large object image, or portions of object images may overlap one upon the other. In such cases, with the conventional methods described above, it is difficult to make a judgment as to which object image is to be taken as the target of extraction of contour lines. It is also difficult to extract the contour lines of both target object images independently of one another.
With the aforesaid method proposed by Finkel, et al., wherein end points of contour lines are detected, as indicated by the arrows in FIG. 92A, detecting operations are carried out on end points of contour lines, which end points may be located at all positions in all directions in an image from one end point of each of the objects 210A, 210B, 210C, and 210D. Also, the proposed method aims at predicting a contour line in every direction. Therefore, with the proposed method, in cases where a complicated image is given, prediction must be carried out on a wide variety of contour lines, and the contour lines of a target object cannot be accurately predicted and extracted.