Video surveillance cameras are used extensively in airports, rail stations, casinos, and other secure or monitored areas. Often, a human operator is required to view and track video streams from multiple video cameras. For example, the operator may monitor several video streams, cycling through them and spending more time viewing the streams of more important areas being monitored.
In crowded environments with numerous cameras, an operator is often unable to aggregate information collected by a large number of video cameras to assess threats associated with suspicious behaviors. Also, due to the large number of video cameras deployed, the operator is often unable to track a suspicious person or other object across a large facility or other area. In addition, when a forensic analysis is conducted, tracking the whereabouts of a suspicious person or other object typically requires the operator to sift through a large number of video streams.
In order to perform fast searches of images, some systems generate a tree-based data structure, where similar images or similar objects and activities are grouped together under the same node of a tree. However, this type of approach is typically not effective when the appearances of different objects and activities are difficult to differentiate. In these circumstances, it is difficult for the systems to properly select the appropriate node in the tree for a given object or activity.
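The node-selection difficulty described above can be illustrated with a minimal sketch. It is not taken from any particular system. The `Node` class, the `descend` and `build_tree` functions, the two-dimensional feature vectors, the node labels, and the use of Euclidean distance to node centroids are all assumptions made for illustration. The sketch descends the tree greedily, choosing the child whose centroid is nearest to the query at each level, and reports the smallest margin between the best and second-best child along the path. A small margin indicates that the appearance of the query does not clearly belong under one node.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a label, a centroid feature vector, and children."""
    label: str
    centroid: tuple
    children: list = field(default_factory=list)


def descend(root, feature):
    """Greedy descent: at each level, move to the child with the nearest centroid.

    Returns the leaf reached and the smallest gap seen along the path between
    the nearest and second-nearest child (a rough ambiguity indicator).
    """
    node = root
    min_margin = math.inf
    while node.children:
        ranked = sorted(node.children, key=lambda c: math.dist(c.centroid, feature))
        if len(ranked) > 1:
            best = math.dist(ranked[0].centroid, feature)
            second = math.dist(ranked[1].centroid, feature)
            min_margin = min(min_margin, second - best)
        node = ranked[0]
    return node, min_margin


def build_tree():
    """Build a small two-level tree with made-up centroids (illustrative only)."""
    person = Node("person", (0.0, 0.0), [
        Node("person/walking", (0.0, 1.0)),
        Node("person/running", (0.0, -1.0)),
    ])
    vehicle = Node("vehicle", (10.0, 0.0), [
        Node("vehicle/car", (10.0, 1.0)),
        Node("vehicle/cart", (10.0, -1.0)),
    ])
    return Node("root", (5.0, 0.0), [person, vehicle])


if __name__ == "__main__":
    tree = build_tree()
    # One distinctive query and two near-identical queries close to the boundary.
    for query in [(0.5, 0.9), (5.1, 0.8), (4.9, 0.8)]:
        leaf, margin = descend(tree, query)
        print(query, "->", leaf.label, f"(smallest margin {margin:.2f})")
```

In this sketch, the two queries near the midpoint between the "person" and "vehicle" centroids differ only slightly, yet they end up under different top-level nodes, and an early wrong turn cannot be corrected lower in the tree. This mirrors the situation in which the appearances of different objects or activities are difficult to differentiate.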