In some conventional intelligent video surveillance systems, a first set of video processing algorithms is used to detect, track, and classify objects, while a second set of algorithms is used to infer events based on the temporal and spatial information of the objects and on preset rules. The algorithms in these systems have very limited ability to adapt to new and unseen environments. The auto-adaptation techniques available in the past did not adapt well to active surveillance systems. For example, background modeling algorithms in these systems adapted when the background changed. However, the goal of these adaptations was only to determine the change in the scene; the definition of the change itself was fixed and non-adapting. Other types of auto-adaptation techniques, such as ground plane calibration, did not improve the intelligence of the system over time because they did not accumulate knowledge.
Conventional intelligent video surveillance systems were typically trained on a limited data set. For some of these systems, scalability problems arose because of the difficulty of obtaining or generating a complete data set that represents all the conditions that could be encountered. Even if creating such a complete data set were possible, it would be difficult for the systems to process and learn from it, and systems of that complexity would be difficult to build and deploy. Thus, conventional video surveillance systems have difficulty detecting and classifying objects of interest, and they generate false detections when processing new conditions in environments not seen before.