Field of the Invention
Aspects of the present disclosure generally relate to image processing and, more particularly, to an image processing apparatus, method, and medium for extracting a feature amount of an image.
Description of the Related Art
Heretofore, a technique has been developed to search a recorded moving image for an image of a predetermined object and to play back the recording period during which the object was captured. To implement this searching and playback, the technique detects an object region at the time of image capturing with a camera, extracts a feature amount of an image from the object region, and stores the feature amount of the object region in association with the moving image. When searching the moving image for an image of a predetermined object, the technique searches the moving image for an image showing the object associated with the feature amount. To enable more accurate searching, it is effective to acquire a high-resolution object image from which to extract the feature amount. In particular, in the field of monitoring cameras, a technique is known to provide, in addition to a camera used to monitor a wide area (wide angle), a camera capable of panning, tilting, and zooming (hereinafter referred to as “PTZ”) and to acquire a high-resolution image of every person serving as an object. For example, U.S. Patent Application Publication No. 2005/0104958 discusses a technique to perform zooming when a person is detected, track the person with a camera until an image having good image quality, as determined by the degree of focusing or the amount of noise, is obtained, and acquire a high-resolution image of the person region. Moreover, Japanese Patent Application Laid-Open No. 2003-219225 discusses a technique to detect a person as a moving object based on a background differencing method, detect a skin color region from the detected person region, perform zooming on the skin color region, and acquire a high-resolution image of the person's face.
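The storing and searching flow described above can be illustrated by the following minimal sketch. It is not taken from the cited publications. The record layout (`FeatureRecord`), the use of cosine similarity, and the threshold value are all assumptions introduced here for illustration only, and the feature amount is assumed to be a plain list of numbers already extracted from an object region.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureRecord:
    """Hypothetical record: a feature amount stored in association with
    the period of the moving image in which the object was captured."""
    video_id: str
    start_sec: float
    end_sec: float
    feature: List[float]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Similarity between two feature amounts (an assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: List[FeatureRecord], query: List[float],
           threshold: float = 0.9) -> List[FeatureRecord]:
    """Return records whose feature amount is similar to the query,
    most similar first, so the matching periods can be played back."""
    scored = [(cosine_similarity(r.feature, query), r) for r in index]
    scored = [s for s in scored if s[0] >= threshold]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [r for _, r in scored]


# Illustrative data only: two stored objects with toy 3-element features.
index = [
    FeatureRecord("cam1", 0.0, 4.0, [1.0, 0.0, 0.0]),
    FeatureRecord("cam1", 10.0, 15.0, [0.0, 1.0, 0.0]),
]
hits = search(index, [0.9, 0.1, 0.0])
for hit in hits:
    print(hit.video_id, hit.start_sec, hit.end_sec)
```

In this sketch, each hit carries the start and end times of its recording period, which corresponds to the playback step. Because the search compares the query against every stored record, its cost grows with the number of stored feature amounts and with their length.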
However, in an environment in which a large number of persons come and go, such as a convenience store, a shopping mall, or an airport, the persons to be targeted are too numerous to discriminate individually. In such an environment, performing tracking until high-quality person images are obtained with respect to all of the persons, as in the technique discussed in U.S. Patent Application Publication No. 2005/0104958, or performing zooming on the faces of all of the persons, as in the technique discussed in Japanese Patent Application Laid-Open No. 2003-219225, causes the processing load to become very high. Furthermore, since a higher-resolution image usually yields a more detailed feature amount (one carrying a larger amount of information), the amount of data of the extracted feature amount becomes larger and the amount of memory consumed also increases. In addition, in an environment such as an airport, since a single camera is not sufficient to cover the entire monitoring target area, a great number of cameras need to be installed. If every camera is used to acquire a high-resolution image of every individual person to extract a feature amount, the processing load and the overall amount of memory used by the entire system would become huge.
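The growth in memory described above can be made concrete with a rough estimate. The following sketch simply multiplies the number of cameras, the number of persons per camera, and the size of one feature amount. Every figure in it (camera count, person count, feature lengths, and 4 bytes per value) is a hypothetical value chosen for illustration, not a measurement or a value from the cited publications.

```python
def total_feature_bytes(num_cameras: int, persons_per_camera: int,
                        feature_length: int, bytes_per_value: int = 4) -> int:
    """Memory needed to keep one feature amount per person per camera."""
    return num_cameras * persons_per_camera * feature_length * bytes_per_value


# Hypothetical installation: 100 cameras, 10,000 persons seen by each camera.
# Assumed feature lengths: 128 values from a low-resolution image,
# 2,048 values from a high-resolution image.
low_res = total_feature_bytes(100, 10_000, 128)
high_res = total_feature_bytes(100, 10_000, 2_048)

print(low_res)              # 512000000
print(high_res)             # 8192000000
print(high_res // low_res)  # 16
```

Under these assumed figures, storing the detailed feature amount for every person requires 16 times the memory of the coarse one, and the total scales linearly with both the number of cameras and the number of persons.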