1. Field of the Invention
Embodiments of the invention provide techniques for discovering object type clusters using pixel-level micro-features extracted from image data. More specifically, embodiments of the invention relate to techniques for producing and updating a self-organizing map and adaptive resonance theory (SOM-ART) network that is used to classify objects depicted in the image data based on the pixel-level micro-features.
2. Description of the Related Art
Some currently available video surveillance systems provide simple object recognition capabilities. For example, a video surveillance system may be configured to classify a group of pixels (referred to as a “blob”) in a given frame as being a particular object (e.g., a person or vehicle). Once identified, a “blob” may be tracked from frame to frame in order to follow the “blob” moving through the scene over time, e.g., a person walking across the field of vision of a video surveillance camera. Further, such systems may be configured to determine the type of object that the “blob” depicts.
However, such surveillance systems typically require that the objects which may be recognized by the system be defined in advance. Thus, in practice, these systems rely on predefined definitions for objects to evaluate a video sequence. In other words, unless the underlying system includes a description for a particular object, i.e., has been trained, the system is generally incapable of recognizing that type of object. This results in surveillance systems with recognition capabilities that are labor intensive and prohibitively costly to maintain or adapt for different specialized applications. Accordingly, currently available video surveillance systems are often unable to identify objects, events, behaviors, or patterns as being “normal” or “abnormal” by observing what happens in the scene over time; instead, such systems rely on static object definitions.
Further, the static patterns recognized by available video surveillance systems are frequently either under-inclusive (i.e., the pattern is too specific to recognize many instances of a given object) or over-inclusive (i.e., the pattern is general enough to trigger many false positives). In some cases, the sensitivity of such a system may be adjusted to help improve the recognition process; however, this approach fundamentally relies on the ability of the system to recognize predefined patterns for objects. As a result, by restricting the range of objects that a system may recognize using a predefined set of patterns, many available video surveillance systems have been of limited (or simply highly specialized) usefulness.