Field of the Invention
The present invention relates to an image processing apparatus, an image processing method, and a storage medium.
Description of the Related Art
In recent years, there has been an increasing need to recognize images recorded by a network camera and to utilize additional information acquired from the recognition result for system services. Examples of use cases include retrieval of a specific person, estimation of attributes (gender, age, and the like) of a subject, and display of the movement locus of a human body. Known techniques for the image processing and the feature detection or recognition necessary for achieving such a system are described comprehensively in “Computer Vision: Algorithms and Applications” by Richard Szeliski, published by Kyoritsu Shuppan Co., Ltd., March, 2013. Japanese Patent Application Laid-Open No. 2002-373332 discloses a method of detecting an object from a motion vector, estimating a search position in the next image based on the detection result, and tracking the object by template matching.
In some cases, image recognition needs to be carried out not only for an image acquired in real time but also, as re-processing, for a past recorded image. Japanese Patent No. 5193944 discloses a technique for retrieving a newly registered person from past recorded images by batch processing. The number of past recorded images increases greatly depending on the frame rate and size of the images to be captured and on the period of time to be processed. Thus, in the technique disclosed in Japanese Patent No. 5193944, a change amount between images is calculated in advance during real-time processing, and in re-processing, recognition processing is carried out only for a region where a change of a predetermined amount or more is found.
Re-processing of recorded images is often carried out not only in a system for retrieving a human face but also in other image recognition systems. For example, in a system for retrieving or identifying a person by using attribute information such as body height, gender, or clothes, the attribute information to be extracted from one photographing target is diverse, and image recognition processing is carried out on one region for each of the several attributes to be extracted. As a result, it may not be possible to extract all pieces of attribute information for all subjects in real time. In such a case, a necessary attribute needs to be detected from the past recorded images as required.
When an image recognition system is built, a plurality of network cameras are generally arranged so as to leave no blind spots. If image recognition processing is carried out by using a personal computer (PC), it is desirable to handle more network cameras with fewer apparatuses.
No matter how high the image recognition processing speed is, when re-processing is carried out for past recorded images, it is not easy to prevent the processing amount and the processing time from becoming very large. For example, when 10 images are recorded per second, 600 images are acquired per minute and 36,000 images are acquired per hour. When the number of network cameras is 10, 360,000 images need to be processed even in the case of re-processing only the images of the past hour.
When there are a plurality of image recognition results to be acquired, as in the case of attributes such as body height, gender, and clothes, the processing time becomes even longer. For example, when 20 attributes are detected for one person and 50 milliseconds are needed for detecting one attribute, 1 second is necessary for completing the recognition processing of one person.
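The per-camera image counts and the per-person recognition time given above can be reproduced with the following minimal sketch. The function names are hypothetical and introduced only for illustration, and the final estimate additionally assumes that exactly one person appears in each recorded image and that attributes are detected sequentially, neither of which is stated in the description.

```python
def recorded_images(frames_per_second: int, seconds: int, cameras: int = 1) -> int:
    """Number of images recorded by `cameras` cameras over `seconds` seconds."""
    return frames_per_second * seconds * cameras


def recognition_time_ms(attributes: int, ms_per_attribute: int, persons: int = 1) -> int:
    """Time in milliseconds to detect every attribute for `persons` persons,
    assuming the detections run one after another."""
    return attributes * ms_per_attribute * persons


# Figures from the description: 10 images per second from a single camera.
per_minute = recorded_images(10, 60)
per_hour = recorded_images(10, 60 * 60)

# Figures from the description: 20 attributes at 50 milliseconds each.
per_person_ms = recognition_time_ms(20, 50)

# Assumption (not in the description): one person appears in every image.
hour_reprocess_ms = recognition_time_ms(20, 50, persons=per_hour)
hour_reprocess_hours = hour_reprocess_ms / 3_600_000

print(per_minute, per_hour, per_person_ms, hour_reprocess_hours)
```

Under these assumptions, re-processing one hour of images from a single camera would occupy about 10 hours of sequential recognition time, which illustrates why it is difficult to keep the processing amount of re-processing small.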