Automated recognition of human actions in video clips has many useful applications, including surveillance, health care, human-computer interaction, computer games, and telepresence. In general, a trained action model (classifier) processes the video clips to determine whether a particular action takes place.
In a typical situation, action models are trained on video data with a clean background, such as video of a single person performing the action of interest with little or no movement in the background. When such a model is then used to classify actual video, accurate action recognition is difficult if the video clip being processed has a cluttered and moving background, that is, if the motion field in an action region is contaminated by background motions. Such backgrounds are common in actual video clips, and thus it is desirable to more accurately recognize human actions in dynamic and/or crowded environments, while still being able to use action models trained on video data with a clean background.