The number of surveillance cameras monitoring public places is growing worldwide. For example, the United Kingdom installed more than four million security cameras over the decade ending in 2012, and in New York City, U.S.A., the number of operating cameras has grown rapidly. Such systems may provide more comprehensive coverage of public areas than reliance on the limited comprehension of on-scene human monitors. This enables public safety personnel monitoring the cameras to spot and abate threats to public safety more quickly, in real time. Video surveillance may enable personnel at one location to monitor a wide variety of remote locations. For example, they may watch a plurality of bridges for deteriorating structures, streets for speeding automobiles, structures for fires, and public assembly areas for abandoned packages that fit explosive device activity profiles. Thus, one person can monitor a large number of different areas without needing to be physically present in each area, greatly expanding the capabilities of the monitor.
However, the capabilities of such systems may be limited by their reliance on human perception to review the video feeds and to make the determinations needed to spot and abate problems. The number of personnel available to watch footage from vast camera arrays is generally limited by budgetary and other resource constraints. The ability of any one human monitor to perceive a threat in a given video feed is likewise limited. Watching surveillance video is resource-intensive and incurs the high cost of employing security personnel. The efficiency of such systems in detecting events of interest is further limited by the constraints of human comprehension.
The field of intelligent visual surveillance seeks to address this problem by applying computer vision techniques to automatically detect specific events in video streams. Such systems may enable automatic discernment and retrieval of objects from surveillance video based on visual attributes. They generally do so by focusing on a limited universe of objects of interest, such as stationary packages as distinguished from moving objects, or vehicles as distinguished from pedestrians and stationary structures. However, the efficacy of such systems under real-world conditions may be limited. High false positive rates, or low accuracy in detecting true events, may reduce their usefulness and trustworthiness.