Human posture analysis is one of the most important steps toward successful analysis of the human behavior captured in a video file. The difficulty of human posture analysis is twofold. First, the movement of a human body appears in a digitized video file as articulated motion, so defining a “key posture” in a digital image is a problem of high dimensionality and complexity. Second, characterizing human behavior means dealing with a sequence of video frames that contains both spatial and temporal information. The most challenging issue is how to characterize this spatio-temporal information properly so as to facilitate subsequent comparison and retrieval tasks.
Posture analysis systems in the conventional art can be categorized into two classes: 2-dimensional and 3-dimensional approaches. Among the 2-dimensional approaches, Haritaoglu et al. proposed a W4 (what, where, when and who) system that computed the vertical and horizontal projections of a silhouette to determine the global posture of a person, such as standing, sitting, bending or lying. See I. Haritaoglu, D. Harwood, and L. Davis, “Ghost: A Human Body Part Labeling System Using Silhouettes,” in Proc. Int. Conf. Pattern Recognition, Vol. 1, pp. 77-82, 1998.
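The projection step described above can be illustrated with a minimal sketch. It assumes the silhouette is already available as a binary mask (1 for foreground, 0 for background); the function name `silhouette_projections` and the sample mask are hypothetical and are not taken from the cited system, which applies further processing to the projections before classifying a posture.

```python
def silhouette_projections(mask):
    """Return (horizontal, vertical) projections of a binary silhouette.

    mask is a list of rows, each a list of 0/1 values (1 = foreground).
    horizontal[r] counts the foreground pixels in row r;
    vertical[c] counts the foreground pixels in column c.
    """
    if not mask:
        return [], []
    horizontal = [sum(row) for row in mask]
    vertical = [sum(column) for column in zip(*mask)]
    return horizontal, vertical


# Hypothetical 5x4 silhouette of an upright figure with arms extended.
standing = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

horizontal, vertical = silhouette_projections(standing)
print(horizontal)  # [2, 2, 4, 2, 2]
print(vertical)    # [1, 5, 5, 1]
```

An upright figure yields a long horizontal profile (one entry per occupied row) and a short vertical profile concentrated in a few columns, whereas a lying figure would show the opposite pattern. That difference is the kind of cue a projection-based classifier can exploit.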
Bobick and Davis proposed a temporal template built by stacking a set of consecutive frames. The temporal template characterized human motion by using motion energy images (MEI) and motion history images (MHI). Moment-based features were extracted from the MEI and MHI and then used to conduct template matching. See A. F. Bobick and J. W. Davis, “The Recognition of Human Movement Using Temporal Templates,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 23, No. 3, March 2001.
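As a rough illustration of how such a temporal template can be accumulated, the sketch below uses the commonly stated recurrence in which an MHI pixel is set to a duration tau wherever motion is detected and otherwise decays by one per frame, and the MEI is the set of pixels whose MHI value is still positive. The binary motion masks are assumed to be given (for example, from frame differencing); the helper names `update_mhi` and `mei_from_mhi` and the sample data are hypothetical.

```python
def update_mhi(mhi, motion, tau):
    """Apply one MHI update: tau where motion occurred, else decay by 1 (floor 0)."""
    return [
        [tau if moved else max(0, value - 1) for value, moved in zip(mhi_row, motion_row)]
        for mhi_row, motion_row in zip(mhi, motion)
    ]


def mei_from_mhi(mhi):
    """Derive the binary MEI: 1 wherever motion occurred within the last tau frames."""
    return [[1 if value > 0 else 0 for value in row] for row in mhi]


# Hypothetical motion masks: one pixel moving left to right across a 1x4 image.
masks = [
    [[1, 0, 0, 0]],
    [[0, 1, 0, 0]],
    [[0, 0, 1, 0]],
]

tau = 3
mhi = [[0, 0, 0, 0]]
for motion in masks:
    mhi = update_mhi(mhi, motion, tau)
mei = mei_from_mhi(mhi)

print(mhi)  # [[1, 2, 3, 0]]
print(mei)  # [[1, 1, 1, 0]]
```

The MHI values increase toward the most recent position of the moving pixel, so the template encodes the direction of motion, while the MEI records only where motion occurred.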
Among the 3-dimensional approaches, Boulay et al. first computed projections of moving pixels on a reference axis and learned 2-D posture appearances through PCA (principal component analysis). Then, they employed a 3-D model of posture to make the projection-based method independent of the camera position. See B. Boulay, F. Bremond, and M. Thonnat, “Human Posture Recognition in Video Sequence,” in Proc. IEEE Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 23-29, 2003.
Zhao et al. used a 3-D human model to verify whether a detected moving region represents a person. Verification was performed by walking recognition using an articulated human walking model. See T. Zhao, R. Nevatia and F. Lu, “Segmentation and Tracking of Multiple Humans in Complex Situations,” in Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition, Vol. 2, pp. 08-12, 2001. However, due to the computational complexity and high cost of the 3-D approach, no 3-D key posture analysis system is yet commercially available.
In order to provide an automatic and effective key posture analysis system for digitized images, it is necessary to identify the significant postures of a human behavior recorded in a video sequence systematically and automatically. However, no such automatic key posture analysis and selection method has been disclosed in previous research.