In some situations, it may be desirable to organize video data into sets of events with associated temporal dependencies. For example, a soccer goal could be explained using a vocabulary of events such as passing, dribbling, and tackling. In describing dependencies between events, it is natural to invoke the concept of causality: A foot striking a ball causes its motion. Likewise, a car accident causes a traffic jam.
There is a rich literature in psychology and cognitive science on event perception and causality. Within the vision literature, there have been several attempts to develop models of causality that are suitable for video analysis. A representative example is the work of Mann, Jepson, and Siskind on event analysis using the Newtonian laws of motion. When domain models are available, as in the case of physics or sporting events, causal relations can be expressed in terms of events with “high-level” semantic meaning. However, connecting these models to pixel data remains challenging. Moreover, the lack of general domain theories makes it difficult to apply this approach to general video content. Thus, domain model video analysis is unsuitable to analyze unstructured video content, such as content that is untagged or otherwise undefined.