1. Field of the Invention
The present invention relates to an object recognition apparatus and an object recognition method, and more particularly, to a technique suitable for recognizing an object from a frame image captured by a camera or the like.
2. Description of the Related Art
Conventionally, techniques have been discussed for counting persons by capturing, with a camera, persons passing through an entrance of a shop or a corridor, and detecting positions of faces of human objects from a captured image. A technique for counting pedestrians in such a predetermined region from a camera video image is discussed in, for example, Japanese Patent Application Laid-Open No. 4-199487. According to this technique, a camera is mounted at the top of a corridor facing directly downward, and counting is performed by extracting circular objects from the camera images as human objects, based on the fact that the heads of human objects viewed from above by the camera appear as circles.
On the other hand, in recent years, practical utilization of techniques for detecting faces from images has progressed, using such techniques as discussed in Rowley et al., “Neural network-based face detection”, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 20, No. 1, JANUARY 1998 (hereinafter, referred to as NON-Patent Document 1) and Viola and Jones, “Rapid Object Detection using Boosted Cascade of Simple Features”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '01) (hereinafter, referred to as NON-Patent Document 2). With the utilization of such techniques, it is also possible to count human objects by mounting a camera facing human objects in a corridor, for example, and detecting faces from video images captured with the camera.
FIG. 8 illustrates a scene where a human object is captured with a camera, which is mounted facing the human object in a corridor.
In FIG. 8, a human object 103 is passing through the corridor with a ceiling 101 and a floor 102. A camera 104 is mounted on the ceiling 101, so that the human object 103 can be captured from obliquely above. A local area network (LAN) cable 105 transmits a video image captured by the camera 104. A personal computer (PC) 106 is an apparatus that analyzes the video image, and performs counting of human objects.
As illustrated in FIG. 8, it is also possible to count human objects by detecting a face of the human object 103 from the video image captured by the camera 104 mounted on the ceiling 101 using the techniques discussed in the above-described NON-Patent Documents 1 and 2. However, in order to recognize a human object with a high precision, a video image with a high resolution as well as a high frame rate is required. As a result, the load on the network when receiving video data from the camera becomes larger. Hereinbelow, descriptions will be given referring to examples illustrated in FIG. 5 and FIG. 6.
As illustrated in FIG. 5, when a human object 504 standing on a floor 502 of a corridor is at a position far from a camera 503 mounted on a ceiling 501 of the corridor, a human object 602 appears small at an upper part within a frame image 601, as illustrated in FIG. 6. Therefore, a video image with a high resolution is required for recognizing a human object that appears small.
At the same time, when the human object is at a position far from the camera, the angle of the camera with respect to the human object is small. Consequently, it takes a long time for the human object to change its position within the frame image. In other words, the moving speed of the human object within the frame is relatively slow. Therefore, even if the frame rate is low, the recognition result does not vary significantly.
On the other hand, as illustrated in FIG. 5, when the human object 505 standing on the floor 502 of the corridor is at a position near the camera 503, the human object 603 appears large at a lower part within the frame image 601, as illustrated in FIG. 6. Since the human object appears large, it can be recognized even from a video image with a relatively low resolution. However, when the human object is near the camera, the angle of the camera with respect to the human object is large, and the position of the human object within the frame image changes significantly in a short time. Therefore, a video image with a high frame rate is required.
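The relationship described above between distance, apparent size, and in-frame moving speed can be illustrated numerically. The following is a minimal sketch assuming a simple pinhole camera model with the subject near the optical axis; the function names, the focal length in pixels, the head size, the walking speed, and the camera height above the head are hypothetical values chosen only for illustration and do not come from the cited documents.

```python
import math


def apparent_size_px(focal_px, object_size_m, cam_height_m, distance_m):
    """Approximate on-image size (pixels) of an object under a pinhole model.

    cam_height_m is the (assumed) height of the camera above the object,
    distance_m is the horizontal distance from the camera to the object.
    """
    line_of_sight_m = math.hypot(cam_height_m, distance_m)
    return focal_px * object_size_m / line_of_sight_m


def image_speed_px_per_s(focal_px, walk_speed_m_s, cam_height_m, distance_m):
    """Approximate in-frame speed (pixels/s) of an object walking toward the camera.

    The depression angle is atan(cam_height / distance); its time derivative is
    v * h / (h^2 + d^2). A small-angle approximation converts it to pixels.
    """
    angular_rate = walk_speed_m_s * cam_height_m / (cam_height_m ** 2 + distance_m ** 2)
    return focal_px * angular_rate


if __name__ == "__main__":
    # Hypothetical values: 800 px focal length, 0.2 m head, 1.2 m/s walk,
    # camera 1.5 m above head height.
    for label, d in (("far (10 m)", 10.0), ("near (2 m)", 2.0)):
        size = apparent_size_px(800, 0.2, 1.5, d)
        speed = image_speed_px_per_s(800, 1.2, 1.5, d)
        print(f"{label}: size ~{size:.1f} px, speed ~{speed:.1f} px/s")
```

Under these assumptions, the far subject is both smaller and slower in the frame than the near subject, which is consistent with the far region needing resolution and the near region needing frame rate.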
As described above, when a recognition target region 604 is set for the human object 602, a video image with a high resolution but a low frame rate can be used for recognition. Conversely, when a recognition target region 605 is set for the human object 603, a video image with a high frame rate but a low resolution can be used for recognition. However, to recognize the human object with a high precision regardless of the position of the human object within the frame image, it is necessary to satisfy both conditions. Therefore, a video image with both a high resolution and a high frame rate is eventually required.
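The effect of this requirement on the amount of video data can be shown with simple arithmetic. The following is a rough sketch that compares uncompressed data rates for three settings; the resolutions, frame rates, the 24 bits per pixel, and the function name are hypothetical, and video compression, which a real network camera would normally apply, is deliberately ignored.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel=24):
    """Uncompressed video data rate in bits per second (compression ignored)."""
    return width * height * bits_per_pixel * fps


# Hypothetical settings, chosen only to illustrate the trade-off.
settings = {
    "high resolution, low frame rate (far region)": (1280, 960, 5),
    "low resolution, high frame rate (near region)": (320, 240, 30),
    "high resolution, high frame rate (whole frame)": (1280, 960, 30),
}

if __name__ == "__main__":
    for name, (w, h, fps) in settings.items():
        mbps = raw_bitrate_bps(w, h, fps) / 1e6
        print(f"{name}: {mbps:.1f} Mbit/s")
```

With these illustrative numbers, satisfying both conditions over the whole frame costs several times more data than either region-specific setting alone, which is the network load issue noted above.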
To solve these issues, a method for readily optimizing resolutions and frame rates depending on recognition target regions is needed. One such method, in which a user designates a minimum detection size of the recognition target in a portable camera and a resolution of an input image is determined accordingly, is discussed in, for example, Japanese Patent Application Laid-Open No. 2007-72606. The technique is effective in a case where the distance between the camera and the subject does not vary.
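The general idea of deriving an input resolution from a designated minimum detection size can be sketched as follows. This is only an illustrative assumption about how such a calculation might look, not the method of the cited application: the function names, the 24-pixel detector window, and the rule of scaling so that the smallest target just fills the detector window are all hypothetical.

```python
import math


def required_scale(min_target_px, detector_window_px):
    """Scale factor for the full-resolution image (assumed rule).

    The image is shrunk so that the smallest target the user wants to detect
    still spans the detector window; the image is never enlarged.
    """
    if min_target_px <= 0 or detector_window_px <= 0:
        raise ValueError("sizes must be positive")
    return min(1.0, detector_window_px / min_target_px)


def required_resolution(full_width, full_height, min_target_px, detector_window_px=24):
    """Input resolution sufficient for the designated minimum detection size."""
    scale = required_scale(min_target_px, detector_window_px)
    return math.ceil(full_width * scale), math.ceil(full_height * scale)


if __name__ == "__main__":
    # Hypothetical case: smallest face of interest is 96 px wide at 1280x960.
    print(required_resolution(1280, 960, min_target_px=96))
```

Because the result depends on a single designated size, one setting is adequate only while the apparent size of the subject stays roughly constant.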
However, in a case of a camera mounted in a corridor or the like and intended for monitoring or the like, the distance between the camera and the subject varies at all times. As described above, the necessary resolution therefore varies depending on the position of the subject within the frame image. In this case, the user needs to perform settings many times. Alternatively, the entire frame image may be taken as a recognition target region, and the angle and the capturing magnification of the camera may be changed so that the recognition target region coincides with the frame image region. Even in such a case, the need to reset the frame rate and the resolution arises each time the angle of the camera or the capturing magnification is changed.