Human action recognition is becoming increasingly important in a wide range of video surveillance, sports analysis, and video digest generation applications. For example, detecting suspicious human activities in surveillance videos has the potential to predict terrorist attacks and other threats to human safety.
Known action recognition methods train a classifier for the actions of interest based on visual features, using a given training set containing samples of the actions of interest. The trained classifier is then applied to fixed-length, often overlapping, temporal segments of a new (unseen) video, and the actions of interest are recognised by processing the classification scores. However, recognising interactions only from the visual differences in the way actions are performed can be very challenging, as there are sometimes only subtle visual differences in the way a person performs an action depending on their intention. For example, a soccer player may kick the ball differently when his intention is to pass the ball to another player than when his intention is to shoot at goal to score a point for his team.
To alleviate the issues with using visual features alone, features derived from the tracking outputs of humans and objects occurring in video segments may also be employed to train classifiers. These tracking-based features are expected to provide additional information about the human activities in those video segments. For example, when a soccer player is shooting at goal, the soccer ball quite often travels at a higher speed than it does in other situations, so an increase in ball velocity may be observed. Likewise, when a soccer ball is passed between two nearby soccer players, the observed ball velocity is anticipated to be low. Nevertheless, recognising actions is still challenging, as high-quality tracking is difficult to guarantee. Human and object tracking is often limited by environmental factors such as the location of the cameras, the resolution of the cameras, and occlusions amongst humans.
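As an illustration only, the following minimal Python sketch shows how a video might be divided into fixed-length, overlapping temporal segments and how a simple tracking-based feature (mean ball speed) might be computed for each segment. The function names, the parameters `length`, `stride` and `fps`, and the representation of a track as one (x, y) position per frame are assumptions made for this example and are not taken from any particular known method. The classifier itself is omitted.

```python
from math import hypot


def sliding_segments(num_frames, length, stride):
    """Return (start, end) frame indices of fixed-length segments.

    `end` is exclusive. Segments overlap whenever stride < length.
    """
    return [(s, s + length) for s in range(0, num_frames - length + 1, stride)]


def mean_speed(track, fps):
    """Mean speed of a tracked object, in position units per second.

    `track` is a list of (x, y) positions, one per frame (assumed format).
    """
    if len(track) < 2:
        return 0.0
    # Sum the distance travelled between consecutive frames.
    dist = sum(
        hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    )
    # (len(track) - 1) frame intervals span (len(track) - 1) / fps seconds.
    return dist * fps / (len(track) - 1)


def segment_speed_features(ball_track, length, stride, fps):
    """Compute one mean-speed feature per temporal segment of the ball track."""
    return [
        mean_speed(ball_track[start:end], fps)
        for start, end in sliding_segments(len(ball_track), length, stride)
    ]
```

In such a sketch, a segment with a high mean ball speed would be weak evidence for a shot at goal and a low value weak evidence for a short pass. Any error in the underlying track passes directly into the feature, which is why the quality of tracking matters.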