Field of the Invention
Aspects of the present invention relate to an image processing apparatus and an image processing method, and particularly relate to an image processing apparatus and an image processing method for estimating a three-dimensional position of an object in an image captured by an image capturing apparatus.
Description of the Related Art
Recently, monitoring cameras installed in stores for security have spread rapidly. It has been proposed to use such a monitoring camera not only for acquiring a video image, but also for marketing research of a store, by detecting a person in the video image to measure a level of congestion or to analyze a flow of people. In addition, for marketing purposes, there is a demand for analyzing the motion of a person captured by a monitoring camera in order to identify, from the analyzed motion, a behavior indicating interest, such as picking up a commodity in a store by hand.
In a case where the motion analysis is performed, useful information cannot be obtained unless regions of a person are detected with sufficient accuracy. Japanese Patent Laid-Open No. 2012-155391 proposes a method in which a plurality of regions of a person in an image are detected to estimate an orientation of the person. However, the following problem remains: although a distinctive region such as a face is relatively easy to detect, regions having a simple shape, such as a torso and limbs, cannot be detected accurately because they are difficult to distinguish from other objects appearing in the background.
To address this problem, Japanese Patent Laid-Open No. 2008-84141 proposes a method in which the head and a hand of a person are detected in a three-dimensional image captured by a 3D camera to recognize the motion thereof.
In addition, Japanese Patent Laid-Open No. 2010-67079 proposes a method in which a person area included in two-dimensional distance data is recognized to analyze a behavior of a person. The conventional examples described above perform three-dimensional recognition in order to improve the accuracy. However, an enormous amount of processing may be required to acquire a range image.
Furthermore, Japanese Patent Laid-Open No. 2009-143722 proposes a method in which a three-dimensional position of an individual person is estimated by stereoscopic vision of the detected person. In this example, the three-dimensional position is estimated by integrating detection results of a head detected in images captured by a plurality of cameras, and a movement locus of the person is thereby analyzed. However, the motion of a person cannot be analyzed from the position of the head alone.
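For illustration only, the stereoscopic estimation described above may be sketched as follows. This sketch is not taken from the cited publication. It assumes that the head detected in each of two camera images has already been converted, by camera calibration, into a viewing ray defined by a camera center and a direction, and it estimates the three-dimensional position as the midpoint of the shortest segment connecting the two rays. The function name triangulate_midpoint and all of its parameters are hypothetical.

```python
def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Estimate a 3D point from two viewing rays (illustrative sketch).

    c1, c2: camera centers as (x, y, z) tuples (assumed known from calibration).
    d1, d2: ray directions toward the detected head (need not be unit length).
    Returns the midpoint of the shortest segment between the rays,
    or None when the rays are (nearly) parallel.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Vector from the second camera center to the first.
    w = tuple(a - b for a, b in zip(c1, c2))

    a = dot(d1, d1)
    b = dot(d1, d2)
    c = dot(d2, d2)
    d = dot(d1, w)
    e = dot(d2, w)

    denom = a * c - b * b
    if abs(denom) < eps:
        # Parallel rays give no unique closest point.
        return None

    # Ray parameters of the mutually closest points.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom

    p1 = tuple(ci + s * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t * di for ci, di in zip(c2, d2))

    # Midpoint of the shortest segment connecting the two rays.
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Hypothetical usage: two cameras observing a head located at (1, 2, 5).
head = triangulate_midpoint((0, 0, 0), (1, 2, 5), (4, 0, 0), (-3, 2, 5))
```

The midpoint form is chosen here only because it requires no matrix library. A calibrated multi-camera system would more commonly solve a linear or least-squares problem over the projection matrices of all cameras.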
By extending the above-described method, it is possible to integrate detection results of a plurality of regions detected in images captured by a plurality of cameras, to estimate the three-dimensional positions thereof, and to perform a motion analysis. However, this requires that each region used for the motion analysis be captured by the plurality of cameras.
For example, when such an analysis process is performed by using a plurality of monitoring cameras installed in a store, a region such as a head is easily captured by the plurality of cameras. However, regions such as limbs are likely to be hidden, and it has therefore been difficult to estimate the three-dimensional positions thereof.