1. Field of the Invention
Embodiments of the invention provide techniques for analyzing a sequence of video frames. More specifically, embodiments of the invention provide techniques for learning of temporal anomalies in an unconstrained scene.
2. Description of the Related Art
Some currently available video surveillance systems provide simple object recognition capabilities. For example, a video surveillance system may be configured to distinguish between scene foreground (active elements) and scene background (static elements) depicted in a video stream. A group of pixels (referred to as a “blob”) depicting scene foreground may be identified as an active agent in the scene. Once identified, a “blob” may be tracked from frame to frame, allowing the system to follow and observe the “blob” moving through the scene over time. For example, a set of pixels believed to depict a person walking across the field of vision of a video surveillance camera may be identified and tracked from frame to frame.
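By way of illustration only, the foreground/background separation and blob identification described above may be sketched as follows. The sketch assumes grayscale frames stored as nested lists, a fixed background image, and a simple intensity threshold; the function names (`foreground_mask`, `find_blobs`, `centroid`) and the threshold value are hypothetical and are not drawn from any particular surveillance system.

```python
from collections import deque


def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(pixel - bg) > threshold for pixel, bg in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def find_blobs(mask):
    """Group 4-connected foreground pixels into blobs, each a list of (row, col)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def centroid(blob):
    """Return the mean (row, col) position of a blob."""
    n = len(blob)
    return (sum(y for y, _ in blob) / n, sum(x for _, x in blob) / n)


# Hypothetical 4x5 scene: uniform background, one 2x2 object and one stray pixel.
background = [[10] * 5 for _ in range(4)]
frame = [row[:] for row in background]
for y in (1, 2):
    for x in (1, 2):
        frame[y][x] = 200
frame[3][4] = 200

blobs = find_blobs(foreground_mask(frame, background))
centroids = [centroid(b) for b in blobs]
```

Under these assumptions, tracking from frame to frame could be approximated by associating each blob centroid in one frame with the nearest centroid in the next frame; practical systems generally use more robust background models and association methods.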
Some such systems may also classify a blob as being a particular agent (e.g., a person or a vehicle). Further, such systems may be configured to determine when an object has engaged in certain predefined behaviors. Typically, such systems determine when an observed behavior (as represented by changes in pixel color values over some number of frames) matches a predefined pattern. For example, such a system may be configured to issue an alert whenever pixels believed to depict a vehicle are observed traveling in the wrong direction down a one-way street (based on changes in spatial position over multiple frames). Similarly, such systems may allow a user to specify a virtual “trip-wire,” i.e., a predefined region of a scene in which activity may be deemed to be unusual. For example, consider a camera used to monitor a subway platform. In such a case, a user could configure the system to generate an alert any time a foreground blob believed to depict a person is detected in a zone specified as being the tracks for subway trains (i.e., when a person is walking on the train tracks).
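By way of illustration only, the two predefined rules described above (a zone-based “trip-wire” and a wrong-way check) may be sketched as follows. The sketch assumes that each tracked blob already carries a class label and a sequence of (x, y) centroid positions, and that a zone is an axis-aligned rectangle in image coordinates; the names `Zone`, `tripwire_alert`, and `wrong_way` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """User-specified rectangular region, in image coordinates (an assumption)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x, y):
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def tripwire_alert(label, position, zone, watched_label="person"):
    """Alert when a blob with the watched label is located inside the zone."""
    x, y = position
    return label == watched_label and zone.contains(x, y)


def wrong_way(track, allowed_direction):
    """Alert when net displacement along the track opposes the allowed direction."""
    (x0, y0), (x1, y1) = track[0], track[-1]
    dx, dy = x1 - x0, y1 - y0
    # A negative dot product means the motion points against the allowed direction.
    return dx * allowed_direction[0] + dy * allowed_direction[1] < 0


# Hypothetical zone covering the train tracks in a 640-pixel-wide frame.
train_tracks = Zone(x_min=0, y_min=100, x_max=640, y_max=140)

on_tracks = tripwire_alert("person", (320, 120), train_tracks)
on_platform = tripwire_alert("person", (320, 50), train_tracks)

# One-way street whose permitted direction of travel is +x.
against_traffic = wrong_way([(10, 0), (5, 0), (1, 0)], allowed_direction=(1, 0))
with_traffic = wrong_way([(1, 0), (9, 0)], allowed_direction=(1, 0))
```

Both rules depend entirely on values supplied in advance (the zone, the watched label, and the allowed direction), so a behavior that no rule describes produces no alert.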
However, such surveillance systems typically require that the objects and/or behaviors which may be recognized by the system be defined in advance, or at least require the user to specify zones where activity should result in an alert. Thus, in practice, these systems rely on predefined descriptions of objects and/or behaviors to evaluate a video sequence. In other words, unless the underlying system includes a description for a particular object or behavior, the system is generally incapable of recognizing that object or behavior (or at least instances of the pattern describing the particular object or behavior). Thus, what is “normal” or “anomalous” is defined in advance, and separate software products are required to recognize additional objects or behaviors. This results in video surveillance systems with recognition capabilities that are labor-intensive and prohibitively costly to maintain or adapt for different specialized applications.
Thus, currently available video surveillance systems are typically unable to recognize new patterns of behavior that may emerge in a given scene or recognize changes in existing patterns. More generally, such systems are often unable to identify objects, events, behaviors, or patterns (or classify such objects, events, behaviors, etc., as being normal or anomalous) by observing what happens in the scene over time; instead, such systems rely on static patterns defined in advance.