Field of the Invention
The present invention relates to a technique of retrieving a specific object from an image.
Description of the Related Art
In recent years, an enormous number of monitoring cameras have been introduced for the purpose of monitoring persons, and many systems have been proposed to support the operation of these cameras. In particular, retrieving a specific person from the videos of many monitoring cameras is one of the important applications.
To retrieve a specific person from the videos of a large-scale monitoring camera system, the following scenario is assumed. Based on information about where and when the retrieval target person was present, the cameras and times are narrowed down, and videos of the person are retrieved from past videos. In addition, the current location of the retrieval target person is searched for in the videos of many cameras. In practice, however, it is difficult to quickly retrieve videos of the person from many camera videos, and if the retrieval takes a long time, the retrieval target person moves elsewhere in the meantime. Hence, an application that automatically retrieves similar persons, using a human video retrieved from past videos as a query, is important.
For example, assume that the human video obtained as a query shows a person in red. In that case, a conceivable method detects a human region in each frame of a monitoring video, acquires a color feature from the clothing portion of the detected human region, and compares it with the query to obtain retrieval candidates. A method of detecting a human region in a video is disclosed in, for example, Q. Zhu, S. Avidan, M. C. Yeh, and K. T. Cheng, “Fast Human Detection Using a Cascade of Histograms of Oriented Gradients”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006. According to this method, many detection windows extracted from an input image are collated with dictionary data learned in advance from an enormous number of human images, which yields accurate human region detection. In addition, the Histogram of Oriented Gradients (hereinafter referred to as HOG) feature, which is effective for detecting persons, is computed using an integral image, and a cascade discriminator obtained by AdaBoost learning is applied, which speeds up detection. A cascade discriminator connects a plurality of discriminators in series so that the detection targets are narrowed down efficiently: each stage rejects windows that are clearly not the target, and only the surviving windows are passed on to the next stage.
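The retrieval pipeline described above (cascade detection on integral-image HOG features, followed by comparison of a clothing color feature with the query) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the method of Zhu et al. or of this document: the function names are hypothetical, the 9 orientation bins, the block-based weak learners with linear weight vectors, the torso band (20% to 60% of the window height) used as the "clothing portion", and the 4-level joint RGB histogram compared by histogram intersection are all illustrative choices. Training the cascade stages by AdaBoost is not shown; the stages are passed in as data.

```python
import numpy as np

N_BINS = 9  # assumption: 9 unsigned orientation bins, a common HOG setting

def orientation_integral_images(gray):
    """Integral image of gradient magnitude for each orientation bin.
    The orientation histogram of any rectangle then costs four lookups per bin."""
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * N_BINS).astype(int), N_BINS - 1)
    h, w = gray.shape
    ii = np.zeros((N_BINS, h + 1, w + 1))  # padded with a leading zero row/column
    for b in range(N_BINS):
        ii[b, 1:, 1:] = np.where(bins == b, mag, 0.0).cumsum(axis=0).cumsum(axis=1)
    return ii

def block_hog(ii, top, left, bh, bw):
    """L2-normalised orientation histogram of the block [top:top+bh, left:left+bw]."""
    hist = (ii[:, top + bh, left + bw] - ii[:, top, left + bw]
            - ii[:, top + bh, left] + ii[:, top, left])
    return hist / (np.linalg.norm(hist) + 1e-6)

def cascade_accepts(ii, top, left, stages):
    """Evaluate the stages in series. Each stage is (weak_learners, threshold),
    where a weak learner is ((dy, dx, bh, bw), weight_vector) (hypothetical format).
    A window is rejected at the first stage whose score is below its threshold."""
    for weak_learners, threshold in stages:
        score = sum(float(w @ block_hog(ii, top + dy, left + dx, bh, bw))
                    for (dy, dx, bh, bw), w in weak_learners)
        if score < threshold:
            return False  # early rejection: later stages are never evaluated
    return True

def color_histogram(rgb, levels=4):
    """Joint RGB histogram with `levels` bins per channel, normalised to sum 1."""
    q = rgb.astype(int) * levels // 256
    idx = (q[..., 0] * levels + q[..., 1]) * levels + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=levels ** 3).astype(float)
    return hist / hist.sum()

def retrieve(rgb, windows, stages, query_hist, win_h, win_w):
    """Rank windows accepted by the cascade by clothing-color similarity to the query.
    windows: (top, left) corners of fixed-size win_h x win_w detection windows."""
    gray = rgb.astype(np.float64).mean(axis=2)
    ii = orientation_integral_images(gray)
    results = []
    for top, left in windows:
        if not cascade_accepts(ii, top, left, stages):
            continue  # judged non-human, so no color feature is computed
        c0, c1 = top + int(0.2 * win_h), top + int(0.6 * win_h)  # assumed clothing band
        clothing = rgb[c0:c1, left:left + win_w]
        sim = float(np.minimum(color_histogram(clothing), query_hist).sum())  # intersection
        results.append(((top, left), sim))
    return sorted(results, key=lambda r: r[1], reverse=True)

# Usage: a synthetic frame whose left half is red and right half is blue.
# An empty stage list stands in for an untrained cascade and accepts every window.
frame = np.zeros((128, 128, 3), dtype=np.uint8)
frame[:, :64] = (200, 30, 30)
frame[:, 64:] = (30, 30, 200)
query = color_histogram(np.full((10, 10, 3), (210, 20, 20), dtype=np.uint8))
ranked = retrieve(frame, [(0, 0), (0, 64)], [], query, win_h=128, win_w=64)
print(ranked[0][0])  # (0, 0): the red window ranks first
```

The integral images are built once per frame, so each stage's block histograms cost a constant number of lookups regardless of block size; this is what makes evaluating many windows through the cascade cheap. The color comparison runs only on windows that survive every stage.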
However, when retrieving a person in red, it is inefficient for human region detection to detect unnecessary persons (here, persons who are not in red). The appearance of a person changes depending on clothing and orientation, as well as on the shooting situation and scene. Narrowing down such varied human images using a cascade discriminator makes the arrangement of the cascade discriminator more complex than necessary. Additionally, the human region detection method of Q. Zhu, S. Avidan, M. C. Yeh, and K. T. Cheng, “Fast Human Detection Using a Cascade of Histograms of Oriented Gradients”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006, produces detection errors in background portions outside the human regions.