1. Field of the Invention
The present invention relates to an image recognition apparatus that recognizes a target in an input image, and an image recognition method.
2. Description of the Related Art
Conventionally, there are image recognition apparatuses that recognize a plurality of targets in an input image and associate those targets with one another. Consider, for example, the case where a bag carried by a person A is recognized from the input image shown in FIG. 10. In this example, the plurality of targets are the person A and his or her bag. To establish an association between the person A and his or her bag, the face of the person A and the bag are first recognized from the input image. Once the face of the person A and the bag have been recognized, the relationship between those two objects is determined using a certain method so as to recognize the bag carried by the person A.
Examples of such a method for associating two detected targets are disclosed in Japanese Patent Laid-Open No. 2006-202049 (hereinafter referred to as “Patent Document 1”) and Japanese Patent Laid-Open No. 2005-339522 (hereinafter referred to as “Patent Document 2”). With Patent Document 1, a plurality of targets recognized on the same screen are considered as being related to one another and are associated with one another. In an exemplary embodiment of Patent Document 1, a face and a name tag are recognized; if a face and a name tag are recognized on the same screen, they are considered as being related to each other and are associated with each other. Applying this method to the example of recognizing the bag carried by the person A, if the person A and a bag are recognized on the same screen, the recognized bag is associated as the person A's bag. With Patent Document 2, a plurality of recognized targets are associated with one another according to their relative positions. In an exemplary embodiment of Patent Document 2, a face is recognized and an object located above the recognized face is recognized as hair. Applying this method to the case of recognizing the bag carried by the person A, if the person A and a bag have been recognized, a bag located below the face of the person A is associated as the person A's bag.
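The two association rules described above can be illustrated with a minimal Python sketch. It is not taken from either Patent Document; the Detection type, the function names, the bounding-box coordinates, and the definition of "below" (the candidate's top edge lies at or below the anchor's bottom edge, with y increasing downward) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """Hypothetical recognition result: a labeled bounding box."""
    label: str   # kind of target, e.g. "face" or "bag"
    name: str    # illustrative identifier for this instance
    x: float     # left edge in pixels
    y: float     # top edge in pixels (y grows downward)
    w: float     # width in pixels
    h: float     # height in pixels


def associate_same_screen(anchor: Detection, detections: List[Detection],
                          label: str) -> List[Detection]:
    """Rule in the style of Patent Document 1: every target of the given
    label found in the same frame is treated as related to the anchor."""
    return [d for d in detections if d.label == label and d is not anchor]


def associate_below(anchor: Detection, detections: List[Detection],
                    label: str) -> List[Detection]:
    """Rule in the style of Patent Document 2: keep only targets whose top
    edge is at or below the anchor's bottom edge (assumed meaning of 'below')."""
    anchor_bottom = anchor.y + anchor.h
    return [d for d in detections if d.label == label and d.y >= anchor_bottom]


# Illustrative scene: one face and two bags (coordinates are made up).
face_a = Detection("face", "person A", 100, 50, 40, 40)
bag_carried = Detection("bag", "bag 1", 90, 200, 60, 50)
bag_on_shelf = Detection("bag", "bag 2", 300, 10, 60, 30)
scene = [face_a, bag_carried, bag_on_shelf]

same_screen = [d.name for d in associate_same_screen(face_a, scene, "bag")]
below = [d.name for d in associate_below(face_a, scene, "bag")]
print(same_screen)  # ['bag 1', 'bag 2']
print(below)        # ['bag 1']
```

In this sketch the same-screen rule relies only on co-occurrence in the frame, while the relative-position rule relies only on coarse geometry; neither uses any other cue about which person a bag belongs to.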
The above-described concept presupposes an image recognition apparatus that recognizes a target in an input image; such an image recognition apparatus generally has the configuration described below. FIG. 22 illustrates the configuration of such an image recognition apparatus, in which an image input unit 1, a to-be-recognized-target designation unit 2, and a display unit 6 are connected to an image recognition apparatus 10. The image recognition apparatus 10 includes a target recognition unit 3, a parameter selection unit 4, and a recognition-parameter storage unit 5.
The operation of a general image recognition apparatus with such a configuration when recognizing a target will now be described.
The recognition-parameter storage unit 5 stores information that is used in processing performed by the target recognition unit 3. This information varies depending on the algorithm used in the target recognition unit 3; for example, if the target recognition unit 3 uses an algorithm based on a neural network, the recognition parameters are the synaptic weight values of the neural network. The parameter selection unit 4 selects the recognition parameter required for the target to be recognized, which differs from target to target, and transfers that parameter to the target recognition unit 3. The target recognition unit 3 recognizes the target from an image input from the image input unit 1, using the parameter received from the parameter selection unit 4. The display unit 6 displays the result of processing performed by the target recognition unit 3, specifically, the region of each recognized target in the image, the number of targets, and so on.
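The flow of data among the units described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class and function names are hypothetical, an input image is represented as a list of per-region feature vectors, and a simple weighted sum with a threshold stands in for whatever algorithm (such as a neural network) the target recognition unit actually uses.

```python
from typing import Dict, List, Sequence


class RecognitionParameterStorage:
    """Stand-in for the recognition-parameter storage unit 5."""

    def __init__(self, params: Dict[str, List[float]]) -> None:
        self._params = params

    def load(self, target: str) -> List[float]:
        # Raises KeyError when no parameter was prepared for this target.
        return self._params[target]


class ParameterSelector:
    """Stand-in for the parameter selection unit 4."""

    def __init__(self, storage: RecognitionParameterStorage) -> None:
        self._storage = storage

    def select(self, target: str) -> List[float]:
        return self._storage.load(target)


class TargetRecognizer:
    """Stand-in for the target recognition unit 3 (toy linear scorer)."""

    def recognize(self, regions: Sequence[Sequence[float]],
                  weights: Sequence[float], threshold: float = 0.5) -> List[int]:
        hits = []
        for index, features in enumerate(regions):
            score = sum(w * f for w, f in zip(weights, features))
            if score > threshold:
                hits.append(index)  # region index counts as a recognized target
        return hits


def display(target: str, hits: List[int]) -> str:
    """Stand-in for the display unit 6: report the count and the regions."""
    return f"{target}: {len(hits)} region(s) at indices {hits}"


# Made-up weights and features: [face-likeness, bag-likeness] per region.
storage = RecognitionParameterStorage({"face": [1.0, 0.0], "bag": [0.0, 1.0]})
selector = ParameterSelector(storage)
recognizer = TargetRecognizer()
image_regions = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]

face_hits = recognizer.recognize(image_regions, selector.select("face"))
bag_hits = recognizer.recognize(image_regions, selector.select("bag"))
print(display("face", face_hits))
print(display("bag", bag_hits))
```

The designated target name plays the role of the output of the to-be-recognized-target designation unit 2: it determines which stored parameter the selector hands to the recognizer.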
With the method described in Patent Document 1, in which a plurality of targets recognized on the same screen are considered as being related to one another, establishing proper associations is difficult in the case of an input image as shown in FIG. 10, because the bag carried by a person B is also associated as the person A's bag.
With the method described in Patent Document 2, in which a plurality of targets are associated according to their relative positions, in the case of an input image as shown in FIG. 10, a bag located below a hand is associated with the hand according to the relative positions of the hand and the bag. Thus, again, the bag carried by the person B is associated as the person A's bag, and proper associations cannot be established.
The following two factors are considered to be reasons why conventional image recognition apparatuses fail in the target recognition processing performed for the association of targets.
The first factor is the case where a recognition target is unknown to the image recognition apparatus. Recognition parameters are generated using information regarding only known recognition targets, and a target recognition unit performs recognition processing using a recognition parameter generated with respect to a known recognition target. This implies that the target recognition unit cannot obtain information on an unknown recognition target and thus cannot recognize it. To solve this problem, it has been suggested that recognition parameters be generated in advance from all sorts of recognition targets; in reality, however, preparing all sorts of recognition targets is difficult. Such cases include a case where there are an infinite number of variations of recognition targets, and a case where new sorts of recognition targets appear at frequent intervals. Bags are one example. Bags come in all colors, shapes, and sizes, and hardly a day goes by without new bags being put on the market, so that the variety of bags is increasing day by day.
The second factor is the case where the input image is in unfavorable conditions. Examples include the case where a recognition target is inclined more than a permissible level and the case where a part of a recognition target is hidden.