Detecting the actions or activities of various entities (e.g., humans, robots, animals, or other moving objects) has many useful applications, including surveillance, health care, human-computer interaction, intelligent robot navigation, and computer games. Typically, an action classifier (model) is trained on videos related to one or more known actions. Once trained, the model may be used to process an incoming video to determine whether a particular action takes place in it. Despite many years of effort, detecting the actions of entities effectively remains a challenging task.