Automated detection and recognition of events in video is desirable for video surveillance, video search, automated performance evaluation and advanced after-action-review (AAR) for training, and autonomous robotics. A class of events known as “multi-agent” events involves interactions among multiple entities (e.g., people and vehicles) over space and time. Multi-agent events may generally be inferred from the participating object types, object tracks and inter-object relationships observed within the context of the environment. Some simple examples of multi-agent events include vehicles traveling as a convoy, people walking together as a group, and meetings in a parking lot. Although these simple events demonstrate the concept of multi-agent events, it is desirable for a system to recognize more complex multi-agent events. Examples of more complex multi-agent events include the arrival/departure of a VIP with a security detail, loading and unloading with guards or in the presence of unrelated people, meetings led by a few individuals, and coordinated team actions such as sports plays and military exercises.
Recent work has addressed modeling, analysis and recognition of complex events. The descriptors used in such approaches include object states, such as start, stop, move or turn; interactions among entities, which are typically instantaneous; and pair-wise measures, such as relative distance or relative speed between an entity and a reference point or a reference entity. The pair-wise measurements are sometimes quantized into Boolean functions such as Approaches, Meets/Collides, etc.
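By way of illustration only, the pair-wise descriptors described above may be sketched as follows. The `State` type, the function names, and the threshold values are hypothetical and are not taken from any particular prior-art system; the sketch assumes two-dimensional positions and velocities in consistent units.

```python
import math
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical instantaneous state of a tracked entity."""
    x: float   # position
    y: float
    vx: float  # velocity
    vy: float


def relative_distance(a: State, b: State) -> float:
    """Euclidean distance between two entities."""
    return math.hypot(b.x - a.x, b.y - a.y)


def relative_speed(a: State, b: State) -> float:
    """Magnitude of the velocity difference between two entities."""
    return math.hypot(b.vx - a.vx, b.vy - a.vy)


def closing_rate(a: State, b: State) -> float:
    """Rate at which the distance shrinks; positive when the entities close in."""
    dx, dy = b.x - a.x, b.y - a.y
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0
    dvx, dvy = b.vx - a.vx, b.vy - a.vy
    return -(dx * dvx + dy * dvy) / dist


def approaches(a: State, b: State, min_rate: float = 0.1) -> bool:
    """Boolean quantization: the entities are closing faster than an assumed threshold."""
    return closing_rate(a, b) > min_rate


def meets(a: State, b: State, max_dist: float = 1.0) -> bool:
    """Boolean quantization: the entities are within an assumed contact distance."""
    return relative_distance(a, b) <= max_dist


# A person walking toward a parked vehicle ten units away.
person = State(x=0.0, y=0.0, vx=1.0, vy=0.0)
vehicle = State(x=10.0, y=0.0, vx=0.0, vy=0.0)
person_approaches_vehicle = approaches(person, vehicle)
person_meets_vehicle = meets(person, vehicle)
```

In this sketch each Boolean function considers only two entities at a single instant, which is consistent with the instantaneous, pair-wise character of the descriptors described above.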
Unfortunately, such approaches in the prior art rely on a pre-defined ontology of an event and on fixed numbers and types of objects that participate in an event. Moreover, although pair-wise measurements are effective for events involving a small number of entities, they are inadequate and inefficient for representing and analyzing complex configurations of multiple entities that interact with each other simultaneously. For example, with the same relative distance and the same relative speed, two people may be walking side by side or one may be following the other, and these two cases indicate different relationships between the two people.
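The ambiguity in the foregoing example may be sketched as follows, for illustration only. The function name and the coordinates are hypothetical; the sketch assumes both people move with the same velocity and are one unit apart in either case.

```python
import math


def pairwise_measures(p1, v1, p2, v2):
    """Return (relative distance, relative speed) for two entities.

    p1, p2 are (x, y) positions; v1, v2 are (vx, vy) velocities.
    """
    distance = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    rel_speed = math.hypot(v2[0] - v1[0], v2[1] - v1[1])
    return distance, rel_speed


# Both people walk in the +x direction at the same speed.
velocity = (1.0, 0.0)

# Case 1: walking side by side (offset perpendicular to the direction of travel).
side_by_side = pairwise_measures((0.0, 0.0), velocity, (0.0, 1.0), velocity)

# Case 2: one following the other (offset along the direction of travel).
following = pairwise_measures((0.0, 0.0), velocity, (1.0, 0.0), velocity)

# The two configurations are indistinguishable by these measures alone.
measures_identical = side_by_side == following
```

Both configurations yield a relative distance of one unit and a relative speed of zero, so a descriptor built only from these two measures cannot tell them apart.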
Accordingly, what would be desirable, but has not yet been provided, is a system and method for effectively and automatically capturing complex interactions amongst multiple entities including context over space and time.