Recent years have seen a surge of interest in human action recognition, because it is fundamental to many computer vision applications such as video surveillance, human-computer interfaces, and content-based video retrieval. While the human brain can recognize an action seemingly without effort, computer-based recognition has, in many cases, proved immensely difficult.
One challenge is the choice of an optimal representation for human actions. Ideally, the representation should be robust against inter- and intra-class variations, noise, and temporal variations, and should be sufficiently rich to differentiate a large number of possible actions. In practice, no such representation exists.
It is well documented that human actions can be encoded as spatial information about body poses and dynamic information about body motions. However, some actions cannot be distinguished using shape features alone or motion features alone. For example, a skip action may look very similar to a run action if only the pose of the body is observed.
The classification task would be simplified if the motion flow of the entire body were considered simultaneously. Under this approach, one would expect the skip action to generate more vertical flow (upward and downward) than the run action. Similarly, actions such as jogging, walking, and running are easily confused if only pose information is used, because the postures in these action sequences are similar.
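The expectation that a skip produces more vertical flow than a run can be illustrated with a minimal sketch. It assumes a dense flow field is already available as an array of per-pixel (dx, dy) displacements. The function name `vertical_flow_ratio` and the two hand-made flow fields are hypothetical and serve only as illustration; they are not measurements and not part of any described system.

```python
import numpy as np

def vertical_flow_ratio(flow):
    """Fraction of the total absolute flow that is vertical.

    flow: array of shape (H, W, 2) holding per-pixel (dx, dy) displacements.
    Returns a value in [0, 1]; returns 0.0 if the field contains no motion.
    """
    flow = np.asarray(flow, dtype=float)
    horizontal = np.abs(flow[..., 0]).sum()  # left/right motion
    vertical = np.abs(flow[..., 1]).sum()    # upward/downward motion
    total = horizontal + vertical
    return vertical / total if total > 0 else 0.0

# Synthetic fields (assumed values, chosen by hand for illustration only).
run_like = np.zeros((4, 4, 2))
run_like[..., 0] = 1.0   # mostly horizontal displacement
run_like[..., 1] = 0.2

skip_like = np.zeros((4, 4, 2))
skip_like[..., 0] = 1.0
skip_like[..., 1] = 0.8  # noticeably more vertical displacement

run_ratio = vertical_flow_ratio(run_like)
skip_ratio = vertical_flow_ratio(skip_like)
```

With these assumed fields, `skip_ratio` exceeds `run_ratio`, which is the kind of separation a whole-body flow measure is expected to provide when the poses alone look alike.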
Likewise, some actions cannot be fully described by motion features alone. Combining motion and shape cues potentially provides complementary information about an action. Conventionally, therefore, the motion and shape feature vectors are concatenated to form a super vector. However, a super vector obtained through such concatenation may not explicitly convey the underlying action. Moreover, the super vector is unnecessarily long and requires complex feature dimension reduction techniques.
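The conventional concatenation can be sketched as follows to show why the super vector grows long. The descriptor lengths (512 and 256), the sample count, the random data, and the helper names `concatenate_features` and `pca_reduce` are all assumptions made for illustration. Principal component analysis is used here only as one familiar stand-in for a dimension reduction step; it is not stated to be the technique used in practice.

```python
import numpy as np

def concatenate_features(motion, shape):
    """Join per-sample motion and shape descriptors into one super vector."""
    motion = np.asarray(motion, dtype=float)
    shape = np.asarray(shape, dtype=float)
    return np.concatenate([motion, shape], axis=-1)

def pca_reduce(X, k):
    """Project the rows of X onto their first k principal directions."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Hypothetical data: 20 samples with assumed descriptor lengths.
rng = np.random.default_rng(0)
motion = rng.normal(size=(20, 512))
shape = rng.normal(size=(20, 256))

super_vectors = concatenate_features(motion, shape)  # 512 + 256 columns
reduced = pca_reduce(super_vectors, k=10)            # 10 columns
```

The concatenated length is simply the sum of the two descriptor lengths, so an extra reduction stage is needed before classification, and nothing in the concatenation itself relates the motion entries to the shape entries.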
Thus, what is needed is a system and method for efficient recognition of human motion. Furthermore, other desirable features and characteristics will become apparent from the subsequent detailed description and the appended claims, taken in conjunction with the accompanying drawings and this background of the disclosure.