Human posture estimation based on image data from a captured video sequence has been an active area of research in recent years. If human behavior could be determined from video through computer analysis, behavior analysis, which is performed in various fields, could be carried out without relying on human effort. Examples of behavior analysis include abnormal behavior detection on the streets, purchasing behavior analysis in stores, factory streamlining support, and form coaching in sports.
In this respect, PL 1 and NPL 1, for example, disclose a technique for estimating the posture of a person based on image data captured with a monocular camera.
In the technique disclosed in PL 1 and NPL 1 (hereinafter referred to as “the related art technique”), a silhouette of a model image (a model silhouette) is prepared for each posture. The related art technique then estimates that the posture of the model silhouette most similar to the silhouette extracted from the captured image (the observed silhouette) is the posture of the subject included in the captured image. Specifically, the related art technique computes a silhouette distance based on the per-pixel exclusive OR (XOR) of the model silhouette and the observed silhouette, and determines the degree of similarity to be high if the silhouette distance is small.
However, even for the same posture, the outline portion of a silhouette may vary significantly in terms of position and angle. For this reason, in computing silhouette distances, the related art technique assigns greater weights to the per-pixel exclusive ORs the closer the pixels are to the center of the observed silhouette. Thus, the related art technique enables posture estimation that is robust against noise (variability) in the outline portion.
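The silhouette distance described above can be illustrated with a minimal sketch. It is not the implementation disclosed in PL 1 or NPL 1. It assumes that silhouettes are binary masks of equal size, that the center of the observed silhouette is the centroid of its foreground pixels, and that the weight of a pixel is 1 / (1 + d), where d is the distance of the pixel from that center. The function names and the weight function are hypothetical and chosen only for illustration.

```python
from math import hypot


def centroid(mask):
    """Return the (row, col) centroid of the foreground pixels of a binary mask."""
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not points:
        raise ValueError("silhouette has no foreground pixels")
    n = len(points)
    return sum(r for r, _ in points) / n, sum(c for _, c in points) / n


def silhouette_distance(model, observed, weighted=True):
    """Sum the per-pixel XOR of two same-sized binary masks.

    With weighted=True, each mismatching pixel contributes 1 / (1 + d), where
    d is its Euclidean distance from the centroid of the observed silhouette.
    This weight function is an assumption made for illustration only.
    """
    if len(model) != len(observed) or any(
        len(m_row) != len(o_row) for m_row, o_row in zip(model, observed)
    ):
        raise ValueError("silhouettes must have the same dimensions")
    center_r, center_c = centroid(observed)
    total = 0.0
    for r, (m_row, o_row) in enumerate(zip(model, observed)):
        for c, (m, o) in enumerate(zip(m_row, o_row)):
            if bool(m) != bool(o):  # per-pixel exclusive OR
                if weighted:
                    total += 1.0 / (1.0 + hypot(r - center_r, c - center_c))
                else:
                    total += 1.0
    return total


def estimate_posture(models, observed):
    """Return the label of the model silhouette with the smallest distance."""
    return min(models, key=lambda label: silhouette_distance(models[label], observed))
```

Under these assumptions, a mismatch at the center of the observed silhouette contributes a weight of 1, whereas a mismatch one pixel away contributes 0.5, so differences confined to the outline portion have less influence on which posture is selected.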