At present, 3D surveillance systems are a frontier research direction in intelligent surveillance. A 3D surveillance system embeds the video pictures of a large number of surveillance devices into a unified reference background model in real time and integrates the information of all surveillance pictures to form an overall cognition and a free-perspective observation of the surveillance situation. Compared with 2D surveillance systems in the related art, monitoring personnel can quickly obtain the exact locations and surveillance content of the cameras and establish their correspondence with the scene environment without facing dozens or even hundreds of surveillance screens. The 3D surveillance system can also support high-level intelligent analysis based on multi-camera collaboration, such as target detection and tracking and abnormal event detection, and has broad prospects in fields such as intelligent transportation, intelligent security, and intelligent communities. In the process of establishing the 3D surveillance system, calibrating the position and attitude of each camera in the 3D reference background model is a core step.
In the related art, there are two main approaches to this calibration problem. One is based on sensors (such as GPS, inertial navigation, and attitude sensors), which relies on special equipment and has low precision. The other is automatic calibration based on computer vision, which usually requires sufficient overlap between the fields of view of the surveillance images and calibrates the relative poses between cameras by motion matching or feature matching. When such a calibration method is directly used to match a camera image against a reference background model image, it often fails because the two images differ greatly or because corresponding target motion information is lacking.
As a result, 3D surveillance systems in the related art mostly adopt an interactive calibration method, which establishes a correspondence relationship between each camera and the reference background model and obtains the pose of the camera in combination with geometric calculation. However, this method involves a large workload (e.g., proportional to the number of cameras), is only suitable for static cameras, and cannot handle camera disturbances or Pan-Tilt-Zoom (PTZ) motion.
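For illustration only, the geometric calculation in such an interactive calibration can be sketched as follows. The sketch assumes that an operator has marked at least six non-coplanar points of the reference background model together with their pixel positions in one camera image, and it uses the standard direct linear transform (DLT) to estimate the 3x4 projection matrix and the camera position. The related art does not specify which geometric calculation is used, so the DLT and the function names `estimate_projection_dlt` and `camera_center` are assumptions introduced here.

```python
import numpy as np


def estimate_projection_dlt(points_3d, points_2d):
    """Estimate a 3x4 projection matrix (up to scale) from 3D-2D point pairs.

    points_3d: (N, 3) model coordinates of manually marked points.
    points_2d: (N, 2) pixel coordinates of the same points in the camera image.
    """
    model_pts = np.asarray(points_3d, dtype=float)
    image_pts = np.asarray(points_2d, dtype=float)
    if model_pts.shape[0] < 6 or model_pts.shape[0] != image_pts.shape[0]:
        raise ValueError("need at least 6 matching 3D-2D correspondences")

    rows = []
    for (xw, yw, zw), (u, v) in zip(model_pts, image_pts):
        p = [xw, yw, zw, 1.0]
        # Each correspondence contributes two linear equations in the
        # 12 unknown entries of the projection matrix.
        rows.append([*p, 0.0, 0.0, 0.0, 0.0, *(-u * c for c in p)])
        rows.append([0.0, 0.0, 0.0, 0.0, *p, *(-v * c for c in p)])

    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1].reshape(3, 4)


def camera_center(projection):
    """Return the camera position in model coordinates (null space of P)."""
    _, _, vt = np.linalg.svd(projection)
    center_h = vt[-1]
    return center_h[:3] / center_h[3]
```

Because the marking must be repeated for every camera, and again whenever a camera is disturbed or a PTZ camera moves, the manual effort grows with the number of cameras, which is the drawback noted above.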