1. Field of the Invention
Embodiments of the invention provide techniques for computationally analyzing a sequence of video frames. More specifically, embodiments of the invention relate to techniques for a mapper component to analyze the sequence of video frames using multiple adaptive resonance theory (ART) networks.
2. Description of the Related Art
Some currently available video surveillance systems provide simple object recognition capabilities. For example, a video surveillance system may be configured to classify a group of pixels (referred to as a “blob”) in a given frame as being a particular object (e.g., a person or vehicle). Once identified, a “blob” may be tracked from frame to frame in order to follow the “blob” moving through the scene over time, e.g., a person walking across the field of vision of a video surveillance camera. Further, such systems may be configured to determine when an object has engaged in certain predefined behaviors.
However, such surveillance systems typically require that the objects and/or behaviors which may be recognized by the system be defined in advance. Thus, in practice, these systems rely on predefined definitions for objects and/or behaviors to evaluate a video sequence. In other words, unless the underlying system includes a description for a particular object or behavior, the system is generally incapable of recognizing that object or behavior (or at least instances of the pattern describing the particular object or behavior). Thus, what is “normal” or “abnormal” behavior needs to be defined in advance, and separate software products need to be developed to recognize additional objects or behaviors. This results in surveillance systems with recognition capabilities that are labor-intensive and prohibitively costly to maintain or adapt for different specialized applications. Accordingly, currently available video surveillance systems are typically unable to recognize new patterns of behavior that may emerge in a given scene or to recognize changes in existing patterns. More generally, such systems are often unable to identify objects, events, behaviors, or patterns as being “normal” or “abnormal” by observing what happens in the scene over time; instead, such systems rely on static patterns defined in advance.
Further, the static patterns recognized by available video surveillance systems are frequently either under-inclusive (i.e., the pattern is too specific to recognize many instances of a given object or behavior) or over-inclusive (i.e., the pattern is general enough to trigger many false positives). In some cases, the sensitivity of such systems may be adjusted to help improve the recognition process; however, this approach fundamentally relies on the ability of the system to recognize predefined patterns for objects and behavior. As a result, by restricting the range of objects that a system may recognize using a predefined set of patterns, many available video surveillance systems have been of limited (or simply highly specialized) usefulness.