Human action recognition from video is an area of immense importance to visual surveillance, video indexing, and several other computer-vision domains. Despite extensive research, fueled by ongoing advances in object recognition, the gap between current capabilities and the needs of applications remains large.
Indeed, action recognition is challenging because video data vary substantially with many factors, including viewpoint and scale, clothing and the subject's appearance, personal style and action length, self-occlusion, multiple video objects, and background clutter.
Beyond recognition accuracy, the design of action recognition methods is subject to other constraints. For several applications, such methods should ideally run efficiently in an online manner and detect actions simultaneously at several possible time scales (different action lengths) and for every possible starting point.