Different people performing similar behaviors induce completely different space-time intensity patterns in a recorded video sequence, because they wear different clothes and appear against different backgrounds. What is common across sequences of the same behavior is the underlying induced motion field. Efros et al. (A. A. Efros, A. C. Berg, G. Mori and J. Malik. Recognizing action at a distance. ICCV, October 2003) exploited this observation by using low-pass filtered optical-flow fields (computed between pairs of frames) for action recognition.
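As a rough illustration of the idea attributed to Efros et al. above, the sketch below assumes that a dense flow field (u, v) between two frames has already been computed by some optical-flow method (not shown). It low-pass filters the flow with a separable Gaussian and compares two sequences by the cosine similarity of the filtered fields. Splitting the flow into four half-wave rectified channels before blurring follows our reading of Efros et al.'s motion descriptor; the function names, `sigma`, and the cosine-similarity comparison are hypothetical choices for this sketch, not their exact procedure.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """1-D normalized Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def low_pass(channel: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur (zero padding at the borders)."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, channel)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)


def flow_descriptor(u: np.ndarray, v: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Hypothetical descriptor: half-wave rectify (u, v) into 4 channels, then blur each."""
    channels = [np.maximum(u, 0), np.maximum(-u, 0), np.maximum(v, 0), np.maximum(-v, 0)]
    return np.stack([low_pass(c, sigma) for c in channels])


def similarity(d1: np.ndarray, d2: np.ndarray) -> float:
    """Cosine similarity between two flattened descriptors (0 if either is all zero)."""
    a, b = d1.ravel(), d2.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


# Usage: uniform rightward motion vs. uniform leftward motion on a 32x32 grid.
shape = (32, 32)
right = flow_descriptor(np.ones(shape), np.zeros(shape))
left = flow_descriptor(-np.ones(shape), np.zeros(shape))
print(similarity(right, right))  # approximately 1.0
print(similarity(right, left))   # 0.0: the rectified channels do not overlap
```

Blurring discards fine spatial detail of the flow (where most estimation noise lives) while keeping its coarse layout, which is what makes the comparison tolerant to appearance differences.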
However, dense, unconstrained, non-rigid motion estimation is highly noisy and unreliable. Clothes worn by different people performing the same action often have very different spatial properties (color, texture, etc.). Uniform-colored clothes induce local aperture effects, because inside a uniformly colored region the image gradients vanish or share a single direction, so the local motion is under-determined. This is especially severe when the observed person is large (which is why Efros et al. analyzed small people, “at a glance”). Dense flow estimation is even more unreliable when the dynamic event contains unstructured objects, such as running water or flickering fire.
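The aperture effect mentioned above can be made concrete with the local gradient structure tensor used by Lucas-Kanade-style flow estimators. This is a standard illustration, not the method of any work cited here: the flow at a patch is fully determined only if the 2x2 matrix of summed gradient products has two significant eigenvalues. A uniform patch (like a plain-colored shirt) yields two zero eigenvalues, a straight edge yields one, and texture yields two. The function name `classify_patch` and the thresholds `eps` and `ratio` are hypothetical choices for this sketch.

```python
import numpy as np


def classify_patch(patch: np.ndarray, eps: float = 1e-9, ratio: float = 1e-3) -> str:
    """Label a patch by how well it constrains local motion.

    Builds the 2x2 structure tensor M = sum([[Ix*Ix, Ix*Iy], [Ix*Iy, Iy*Iy]])
    over the patch; its eigenvalues tell how many motion directions are observable.
    """
    Iy, Ix = np.gradient(patch.astype(float))  # axis 0 = rows (y), axis 1 = cols (x)
    M = np.array([
        [np.sum(Ix * Ix), np.sum(Ix * Iy)],
        [np.sum(Ix * Iy), np.sum(Iy * Iy)],
    ])
    lam_small, lam_large = np.linalg.eigvalsh(M)  # eigenvalues in ascending order
    if lam_large < eps:
        return "uniform"   # no gradients: motion is completely undetermined
    if lam_small < ratio * lam_large:
        return "aperture"  # one gradient direction: only normal flow is recoverable
    return "textured"      # two gradient directions: full 2-D flow is constrained


n = 15
ys, xs = np.mgrid[0:n, 0:n]
print(classify_patch(np.full((n, n), 0.5)))                     # uniform
print(classify_patch(xs.astype(float)))                         # aperture
print(classify_patch(np.random.default_rng(0).random((n, n))))  # textured
```

The horizontal ramp mimics a straight clothing edge: only motion across the edge changes the image, so motion along it is invisible locally. The larger the person relative to the analysis window, the more windows fall entirely inside such uniform or edge-only regions.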
Prior-art methods for action recognition in video sequences are limited in a variety of ways. The methods proposed by Bobick et al. (A. Bobick and J. Davis. The recognition of human movement using temporal templates. PAMI, 23(3):257-267, 2001) and Sullivan et al. (J. Sullivan and S. Carlsson. Recognizing and tracking human action. ECCV, 2002) require prior foreground/background segmentation. The methods proposed by Yacoob et al. (Y. Yacoob and M. J. Black. Parameterized modeling and recognition of activities. CVIU, 73(2):232-247, 1999), Black (M. J. Black. Explaining optical flow events with parameterized spatio-temporal models. CVPR, 1999), Bregler (C. Bregler. Learning and recognizing human dynamics in video sequences. CVPR, June 1997), Chomat et al. (O. Chomat and J. L. Crowley. Probabilistic sensor for the perception of activities. ECCV, 2000), and Bobick et al. require prior modeling or learning of activities, and are therefore restricted to a small set of predefined activities. The methods proposed by Efros et al., Yacoob et al., and Black require explicit motion estimation or tracking, and therefore inherit the fundamental hurdles of optical-flow estimation (aperture problems, singularities, etc.).