In recent years, cameras for capturing images of subjects, such as digital still cameras (DSCs), camera-equipped mobile telephones, and digital movie cameras, have become widespread. Furthermore, the capacity of recording media for saving image data has been increasing. This enables individual users to keep a large amount of audiovisual (AV) content, such as still images and moving images. However, users are forced to spend a significant amount of time and effort to find a desired still image or moving image among such a large number of images.
One conventional technique for helping users efficiently find a desired image is image indexing, which automatically tags images in order to organize them.
Various methods of automatically tagging images have been provided as image indexing techniques. For example, tagging is performed by: estimating an event based on time information and place information; detecting a specific object with use of a face detection technique; or detecting similar images based on similarity in color information or texture information. The tags assigned to the images are then used when searching the images. However, images captured in various places include a wide variety of objects and scenes. Accordingly, an image indexing technique for recognizing or categorizing general objects has been proposed.
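As an illustration of event estimation based on time information, the following is a minimal sketch that assumes an event boundary wherever the gap between consecutive capture times exceeds a threshold. The function name `group_into_events`, the two-hour threshold, and the sample capture times are hypothetical and are not taken from any of the cited techniques; place information is omitted for brevity.

```python
from datetime import datetime, timedelta


def group_into_events(timestamps, max_gap=timedelta(hours=2)):
    """Split capture times into events wherever the gap exceeds max_gap.

    Assumption for illustration only: a long pause between two shots
    marks the boundary between two events.
    """
    events = []
    for ts in sorted(timestamps):
        # Extend the current event if this shot follows the previous one closely.
        if events and ts - events[-1][-1] <= max_gap:
            events[-1].append(ts)
        else:
            # Otherwise start a new event with this shot.
            events.append([ts])
    return events


# Hypothetical capture times for four images.
shots = [
    datetime(2011, 5, 1, 10, 0),
    datetime(2011, 5, 1, 10, 30),
    datetime(2011, 5, 1, 18, 0),
    datetime(2011, 5, 2, 9, 0),
]
events = group_into_events(shots)
```

Each resulting group could then be given a common event tag, so that a search on that tag returns all images in the group.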
According to a conventional technique for recognizing general objects, a model of an object in an image is created based on (i) a basic feature amount of the image, such as a brightness value, and (ii) a group of local feature amounts. Feature amounts detected from an image are then compared with the feature amounts of the model to determine whether they match. This technique for recognizing general objects is used in many computer vision applications. Another known technique provides a device that generates feature vectors, each representing an input image. The device processes the feature vectors with use of different classifiers, and automatically categorizes the input images based on a combination of the resultant data pieces output from the classifiers. In this way, a large number of images are recognized more accurately and at higher speed than with earlier technologies (see Patent Literature 1, for example). This method enables a feature of an object to be calculated at high speed from various perspectives.
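The idea of categorizing an image from a combination of classifier outputs can be sketched as follows. This is a simplified illustration and not the method of Patent Literature 1: the two rule-based classifiers, the feature names `brightness` and `edge_density`, the labels, and the score averaging are all assumptions chosen to keep the example small.

```python
def classify_by_brightness(feature_vector):
    # Hypothetical rule: brighter images lean toward the "outdoor" label.
    p = feature_vector["brightness"]
    return {"outdoor": p, "indoor": 1.0 - p}


def classify_by_texture(feature_vector):
    # Hypothetical rule: images with denser edges lean toward "outdoor".
    p = feature_vector["edge_density"]
    return {"outdoor": p, "indoor": 1.0 - p}


def categorize(feature_vector, classifiers):
    """Average the per-label scores of all classifiers and pick the best label."""
    totals = {}
    for classifier in classifiers:
        for label, score in classifier(feature_vector).items():
            totals[label] = totals.get(label, 0.0) + score
    averaged = {label: total / len(classifiers) for label, total in totals.items()}
    best_label = max(averaged, key=averaged.get)
    return best_label, averaged


# Hypothetical feature vector with both values normalized to [0, 1].
label, scores = categorize(
    {"brightness": 0.8, "edge_density": 0.6},
    [classify_by_brightness, classify_by_texture],
)
```

Averaging is only one possible combination rule; weighted sums or voting would fit the same structure.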
Yet another known technique searches for an object by automatically learning, with use of an arbitrary method, a hierarchical object recognition model of the object, focusing on the fact that the object moves and changes in various ways. The hierarchical object recognition model is constituted by a plurality of parts of the object that are movable relative to one another (see Patent Literature 2).
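A hierarchical model made up of mutually movable parts can be pictured as a tree in which each part stores its position relative to its parent. The sketch below is only a hypothetical data structure for illustration, not the model of Patent Literature 2, and it does not cover learning; the names `Part` and `absolute_positions` and the sample offsets are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Part:
    name: str
    offset: tuple  # (dx, dy) relative to the parent part
    children: list = field(default_factory=list)


def absolute_positions(part, origin=(0, 0)):
    """Resolve every part's position by accumulating offsets down the tree."""
    x = origin[0] + part.offset[0]
    y = origin[1] + part.offset[1]
    positions = {part.name: (x, y)}
    for child in part.children:
        # A child moves together with its parent but keeps its own offset.
        positions.update(absolute_positions(child, (x, y)))
    return positions


# Hypothetical object: a body with a head and an arm that carries a hand.
body = Part(
    "body",
    (10, 20),
    [Part("head", (0, -8)), Part("arm", (5, 0), [Part("hand", (4, 1))])],
)
positions = absolute_positions(body)
```

Changing the offset of one part, such as the arm, moves its child parts with it while leaving sibling parts in place, which is the sense in which the parts are movable relative to one another.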