In recent years, with the rapid spread of digital imaging devices such as digital cameras, expectations have been rising for generic object recognition, which recognizes what objects are contained in a captured still image or video. Generic object recognition can be used in various applications, such as properly classifying unclassified image data stored in a database, searching for necessary image data, extracting a desired scene from a moving image, and cutting out only the desired scenes and re-editing them.
Various object recognition technologies, such as face recognition and fingerprint recognition, have been developed, but all of them are directed to specific applications. If a recognition technology specialized for a single application is applied to another application, problems arise, such as a sharp decrease in the recognition rate. Therefore, the development of technologies for recognizing generic objects is expected.
In order to recognize a generic object, it is necessary to extract feature amounts from the image subject to recognition. As methods for extracting feature amounts, methods that use geometrical features contained in an image, such as those described in Patent Literature 1 and Patent Literature 2, are widely known. However, most of those feature amounts cannot be calculated unless parameters such as thresholds are set in advance based on statistical learning or the experience of a user. A method that requires statistical learning or the experience of a user cannot calculate a feature amount for an image that has not been learned, and therefore has the problem of providing an erroneous recognition result.
As a method for calculating a feature amount without requiring statistical learning or the experience of a user, the method described in Non Patent Literature 1 and called Scale Invariant Feature Transform (SIFT), which uses a histogram accumulating local intensity gradients of an image, is widely recognized. By using this technology, the same images can be recognized as being the same even when they include geometric transformations or occlusions. However, this technology is intended to determine whether or not two images are the same, and cannot provide information on the degree to which two similar images are similar.
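For illustration only, the idea of a histogram accumulating local intensity gradients can be sketched as follows. This is a minimal sketch and not the full SIFT method of Non Patent Literature 1: it omits scale-space keypoint detection, orientation normalization, and spatial sub-cells. The function name `gradient_orientation_histogram`, the bin count of 8, and the use of central differences are assumptions introduced for this example.

```python
import math


def gradient_orientation_histogram(patch, bins=8):
    """Magnitude-weighted histogram of gradient orientations for a 2-D patch.

    `patch` is a list of rows of grayscale intensities. Only interior pixels
    are visited so that central differences are always defined.
    (Hypothetical helper, not part of any cited method.)
    """
    hist = [0.0] * bins
    rows, cols = len(patch), len(patch[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central-difference estimate of the intensity gradient.
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0.0:
                continue  # flat region: no orientation to vote for
            # Map the orientation to [0, 2*pi) and then to a bin index.
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            index = int(angle / (2.0 * math.pi) * bins) % bins
            hist[index] += magnitude
    return hist


# Usage: a horizontal intensity ramp has gradients pointing along +x only.
ramp = [[float(x) for x in range(5)] for _ in range(5)]
ramp_hist = gradient_orientation_histogram(ramp)
```

In this sketch, every gradient of the ramp falls into the first orientation bin. Such a histogram characterizes the local appearance around a point, which suits matching of identical images but does not by itself grade how similar two different shapes are.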
Moreover, recognition using a representation method described in Non Patent Literature 2 and called Curvature Scale Space (CSS), which involves smoothing an image contour stepwise and representing the image by using the positions of the inflection points of the contour at each step, is also well known. It is known that the position information on the inflection points used in this technology has a very similar appearance pattern for the same images or similar images. Thus, by using this technology, images that are the same or similar in contour can be recognized, and images obtained by applying a geometric transformation to those images can be recognized as the same or similar images. However, this technology does not use information on points other than the inflection points at all, and thus uses only a very limited part of the information on the contours. Therefore, even for images similar in contour to each other, if the pieces of position information on the inflection points are different from each other, it may be determined that the images are not “similar” to each other. Conversely, even for images dissimilar in contour to each other, if the pieces of position information on the inflection points are relatively similar to each other, it may be determined that the images are “similar” to each other. In other words, this technology cannot calculate a degree of similarity based on features of the contour.
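The stepwise smoothing and inflection-point extraction described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Non Patent Literature 2: the contour is assumed to be closed and uniformly sampled, the smoothing is a circular Gaussian convolution truncated at three standard deviations, and inflection points are taken as sign changes of a finite-difference curvature numerator. All function names and the test shape are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian weights truncated at three standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_closed(values, sigma):
    """Circularly convolve one coordinate sequence of a closed contour."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(values)
    return [sum(kernel[j + radius] * values[(i + j) % n]
                for j in range(-radius, radius + 1))
            for i in range(n)]


def curvature_numerator(xs, ys):
    """Finite-difference estimate of x'y'' - y'x'' (its sign is the curvature sign)."""
    n = len(xs)
    result = []
    for i in range(n):
        p, q = i - 1, (i + 1) % n  # index -1 wraps to the last sample
        dx, dy = (xs[q] - xs[p]) / 2.0, (ys[q] - ys[p]) / 2.0
        ddx = xs[q] - 2.0 * xs[i] + xs[p]
        ddy = ys[q] - 2.0 * ys[i] + ys[p]
        result.append(dx * ddy - dy * ddx)
    return result


def inflection_indices(xs, ys):
    """Sample indices after which the curvature sign changes."""
    k = curvature_numerator(xs, ys)
    n = len(k)
    return [i for i in range(n) if k[i] * k[(i + 1) % n] < 0.0]


def css_inflections(xs, ys, sigmas):
    """Inflection-point positions of the contour at each smoothing step."""
    return {s: inflection_indices(smooth_closed(xs, s), smooth_closed(ys, s))
            for s in sigmas}


# Usage: a four-lobed closed contour, smoothed lightly and heavily.
n = 256
ts = [2.0 * math.pi * i / n for i in range(n)]
radii = [1.0 + 0.3 * math.cos(4.0 * t) for t in ts]
xs = [r * math.cos(t) for r, t in zip(radii, ts)]
ys = [r * math.sin(t) for r, t in zip(radii, ts)]
inflections = css_inflections(xs, ys, sigmas=[1, 40])
```

Under light smoothing the lobed contour keeps its inflection points, and under heavy smoothing they vanish as the contour becomes convex. Only the indices returned by `inflection_indices` enter the representation, and all other contour points are discarded, which is the limitation noted above.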
On the other hand, a method of recognition based on curvature information on the respective points on a contour has also been proposed (Patent Literature 3). This technology uses the curvature information on all points on the contour, and can calculate a degree of similarity for contours that differ slightly in shape. However, this technology assumes that contours are compared over the entire periphery of the outline of a shape. As a result, this technology cannot be used if a contour is disconnected partway or if a part of an object shape overlaps another object shape in an image.
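As a rough illustration of comparing contours by the curvature at every point, the following minimal sketch computes a signed turning angle at each vertex of a closed contour and compares two such profiles over all circular shifts. It is not the method of Patent Literature 3, whose details are not given here. The turning angle as a curvature proxy, the mean absolute difference as the dissimilarity measure, and all function names are assumptions introduced for this example.

```python
import math


def turning_angles(points):
    """Signed turning angle at every vertex of a closed polygonal contour.

    `points` is a list of (x, y) tuples; the last point is assumed to connect
    back to the first, i.e. the entire periphery must be available.
    """
    n = len(points)
    angles = []
    for i in range(n):
        x0, y0 = points[i - 1]
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        ax, ay = x1 - x0, y1 - y0  # incoming edge
        bx, by = x2 - x1, y2 - y1  # outgoing edge
        # Angle between the edges from their cross and dot products.
        angles.append(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return angles


def contour_distance(profile_a, profile_b):
    """Smallest mean absolute difference over all circular shifts.

    Smaller values mean more similar contours; 0 means identical profiles.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of samples")
    n = len(profile_a)
    return min(
        sum(abs(profile_a[i] - profile_b[(i + s) % n]) for i in range(n)) / n
        for s in range(n)
    )


def ellipse(a, b, n=64):
    """Hypothetical test shape: n samples of an ellipse with semi-axes a and b."""
    return [(a * math.cos(2.0 * math.pi * i / n),
             b * math.sin(2.0 * math.pi * i / n)) for i in range(n)]


# Usage: a circle compared with a slightly and a strongly elongated ellipse.
circle = turning_angles(ellipse(1.0, 1.0))
d_near = contour_distance(circle, turning_angles(ellipse(1.1, 1.0)))
d_far = contour_distance(circle, turning_angles(ellipse(2.0, 1.0)))
```

Because every point contributes, the measure in this sketch varies gradually with the shape. The wrap-around indexing, however, presupposes a complete closed outline, which reflects why such an approach breaks down when the contour is interrupted or partly hidden by another object.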