Video surveillance systems are used to monitor public places, streets, buildings, cities and other premises or surroundings, and comprise a multitude of cameras that monitor relevant points in the surroundings. One problem of such surveillance systems is that the multitude of cameras produces a large amount of video data to be monitored, which requires many human observers and results in high personnel costs.
A possible solution for reducing these costs is the use of video content analysis (VCA) systems, which detect objects in the video data and track them over time. Such a VCA system is proposed, for example, in the scientific paper by A. Hunter, J. Owens and M. Carpenter: A neural system for automated CCTV surveillance, Proceedings of the IEEE Symposium on Intelligent Distributed Surveillance Systems, Savoy Place, London, February 2003.
The article describes a system for the automated identification of suspicious pedestrian activity in a car park. The system is based on three steps: segmentation of the objects of interest, tracking of the objects, and identification of unusual trajectories of the objects. The cameras attached to the system are handled independently of each other; in particular, the generated video data and object-related metadata are stored and processed separately.
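The last of these steps, the identification of unusual trajectories, can be illustrated with a minimal sketch. The cited article uses a neural approach; the sketch below substitutes a much simpler nearest-reference comparison purely for illustration. The function names, the sample tracks and the threshold are assumptions and are not taken from the article.

```python
import math

def mean_distance(track_a, track_b):
    """Mean point-wise Euclidean distance over the common length of two tracks."""
    n = min(len(track_a), len(track_b))
    if n == 0:
        raise ValueError("tracks must not be empty")
    return sum(math.dist(track_a[i], track_b[i]) for i in range(n)) / n

def is_unusual(track, normal_tracks, threshold):
    """Flag a track whose closest known-normal track is farther than threshold."""
    closest = min(mean_distance(track, ref) for ref in normal_tracks)
    return closest > threshold

# Hypothetical data: one normal path along y = 0, in arbitrary ground units.
NORMAL_TRACKS = [[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]]
near_track = [(0.0, 0.1), (1.0, 0.1), (2.0, 0.1)]
far_track = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0)]
```

With a threshold of 1.0, `near_track` would be treated as normal and `far_track` as unusual. Because each camera is handled independently in the described system, such a check would run once per camera on that camera's own tracks.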
Another possible solution for helping a guard to monitor a multitude of cameras, and thus a multitude of images, is disclosed in the scientific article by B. Hall and M. M. Trivedi: A novel graphical interface and context aware map for incident detection and monitoring, 9th World Congress on Intelligent Transport Systems, Chicago, Ill., USA, October 2002, which appears to be the closest state of the art. The Hall paper discloses a novel graphical interface and a context aware map, whereby static and dynamic streams of information, in particular live video images, are merged into the context aware map and can be interactively viewed, rotated, zoomed etc. by a user with the help of the graphical interface. The merging of the information is realized by transforming camera images in real time onto a scene map, which can be filled with satellite images for those parts that are not covered by the camera images.
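The transformation of camera images onto a scene map can be sketched as follows. The sketch assumes a planar ground and a known 3x3 homography per camera, which is a common choice but is not stated in the summary above. The function name and the example matrix are hypothetical.

```python
def apply_homography(H, x, y):
    """Map an image pixel (x, y) to scene-map coordinates with a 3x3 homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("pixel maps to a point at infinity")
    # Divide by the homogeneous coordinate to obtain map coordinates.
    return u / w, v / w

# Hypothetical homography: scale by 0.5 and shift by (10, 20) map units.
H_EXAMPLE = [
    [0.5, 0.0, 10.0],
    [0.0, 0.5, 20.0],
    [0.0, 0.0, 1.0],
]
map_point = apply_homography(H_EXAMPLE, 100, 40)
```

Applying the same mapping to every pixel of a camera image, or to its four corners followed by interpolation, would place that image on the map. Map regions outside all projected images would then be the parts filled with satellite imagery.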