The constantly increasing demand for safety and security has given weight to the development of surveillance systems equipped with multiple cameras. However, a multi-camera surveillance system still requires considerable manpower to carry out real-time surveillance. It is hard for security personnel to coordinate multiple camera views, and after hours of work their concentration lapses. Consequently, attempts have been made to combine the images monitored by multiple cameras into a panoramic view by means of mapping, thereby enhancing a supervisor's monitoring efficiency and alleviating the workload. The mapping technique used for this purpose, which describes the relationship between corresponding planes in two spaces, is termed homography mapping. However, image contents may contain a multitude of different planes, e.g. a ground plane and vertical walls. Because a single homography is valid for only one plane, two images mapped in this way will exhibit correspondence errors for content lying off that plane.
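As an illustration of the homography mapping mentioned above, the following minimal sketch maps a 2D point from one plane to another through a 3x3 matrix in homogeneous coordinates. The matrix `H_EXAMPLE` and the function name `apply_homography` are hypothetical and chosen only for illustration; in practice the matrix would be estimated from point correspondences between the two views.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography using homogeneous coordinates."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the third homogeneous coordinate to return to 2D.
    return (u / w, v / w)


# Hypothetical homography (assumption): scale by 2, then translate by (10, 5).
H_EXAMPLE = [
    [2.0, 0.0, 10.0],
    [0.0, 2.0, 5.0],
    [0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    print(apply_homography(H_EXAMPLE, (3.0, 4.0)))
```

The mapping is exact only for points lying on the plane for which the matrix was estimated, which is why scenes containing several planes produce the errors described above.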
In addition, it has been proposed to actively alert a supervisor to noteworthy events on the basis of image analysis results. Such a method directly analyzes the images obtained by each single camera, carrying out complex target detection and tracking in the image domain. After the single-camera image analysis is complete, the correspondence of common targets among the multiple cameras is determined, so as to integrate the information from the multiple cameras. Because the method performs its analysis directly in the image domain, it is affected by various uncertain factors, such as varying light sources, mutual occlusion among multiple objects, and ambient light and shadow, and is therefore prone to generating many false alarms.
To disclose more clearly the difference between the present invention and the prior art, the conventional systems most similar to the present invention are described as follows.
U.S. Pat. No. 6,950,123 B2 issued on Sep. 27, 2005 (hereinafter called document 1) discloses the following: (a) the system first initializes player tracking using an initialization component that includes manual target selection, region of interest (ROI) definition, and camera calibration through homography mapping, wherein the manual target selection gives the system the initial positions of the targets to be tracked, the ROI defines a closed structured environment such as a soccer field, and the homography mapping indicates the relationship between the 2D camera views and the 3D world plane; (b) object detection and tracking are executed in each 2D camera view, and these processes are accomplished by a motion detection module, a connected-component module, and multiple 2D object trackers for the different targets; and (c) the object locations are back-projected to the 3D world plane for final data fusion.
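Step (c) above, back-projecting an image location onto the world plane, can be sketched as follows. The sketch assumes that a 3x3 homography from ground-plane coordinates to image pixels is already known; the matrix `H_GROUND_TO_IMAGE` and the function names are hypothetical and are not taken from document 1. Because a homography is defined only up to scale, the adjugate matrix is used in place of the full inverse.

```python
def adjugate_3x3(H):
    """Return the adjugate of a 3x3 matrix (proportional to its inverse)."""
    a, b, c = H[0]
    d, e, f = H[1]
    g, h, i = H[2]
    return [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]


def back_project_to_ground(H_ground_to_image, pixel):
    """Map an image pixel back to ground-plane coordinates."""
    A = adjugate_3x3(H_ground_to_image)
    u, v = pixel
    x = A[0][0] * u + A[0][1] * v + A[0][2]
    y = A[1][0] * u + A[1][1] * v + A[1][2]
    w = A[2][0] * u + A[2][1] * v + A[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel does not correspond to a finite ground point")
    # The unknown scale of the adjugate cancels in this division.
    return (x / w, y / w)


# Hypothetical ground-to-image homography (assumption for illustration).
H_GROUND_TO_IMAGE = [
    [100.0, 0.0, 0.0],
    [0.0, -100.0, 0.0],
    [0.0, 0.0, 10.0],
]

if __name__ == "__main__":
    print(back_project_to_ground(H_GROUND_TO_IMAGE, (20.0, -10.0)))
```

Locations back-projected from several cameras in this way share one world coordinate frame, which is what makes the final data fusion possible; the result is meaningful only for image points that actually lie on the ground plane.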
Nevertheless, document 1 differs from the present invention as follows: (a) document 1 requires the targets to be tracked to be selected manually, while the present invention can automatically detect and track targets (explained later); (b) document 1 adopts homography mapping, which is suitable only for long-distance monitoring and not for a general surveillance zone, while the present invention uses the general 3D-to-2D projection matrix, which is more suitable for general cases (explained later); (c) document 1 restricts the surveillance zone to a closed structured environment (e.g. a soccer field), while the present invention is free from this constraint (explained later); and (d) document 1 solves the complicated tracking and correspondence problems directly in the 2D image and thus inevitably faces challenges such as lighting variation, mutual occlusion among multiple objects, and shadow effects, which is an approach different from that adopted by the present invention.
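The distinction drawn in item (b) can be made concrete with a small sketch. It assumes a hypothetical 3x4 projection matrix `P_EXAMPLE` describing a camera 10 units above the ground looking straight down; the matrix and all function names are illustrative assumptions, not part of the disclosed system. The homography induced by the ground plane Z = 0 consists of columns 1, 2 and 4 of the projection matrix, so it agrees with the full projection only for points of zero height.

```python
def project(P, X):
    """Project a 3D point X = (X, Y, Z) to pixel coordinates with a 3x4 matrix P."""
    Xh = (X[0], X[1], X[2], 1.0)
    u, v, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return (u / w, v / w)


def ground_homography(P):
    """Homography induced by the plane Z = 0: columns 0, 1 and 3 of P."""
    return [[row[0], row[1], row[3]] for row in P]


def map_ground(H, xy):
    """Map ground coordinates (X, Y) to pixel coordinates with a 3x3 homography."""
    x, y = xy
    u, v, w = (H[r][0] * x + H[r][1] * y + H[r][2] for r in range(3))
    return (u / w, v / w)


# Hypothetical camera (assumption): focal length 100, located at height 10,
# looking straight down at the ground plane Z = 0.
P_EXAMPLE = [
    [100.0, 0.0, 0.0, 0.0],
    [0.0, -100.0, 0.0, 0.0],
    [0.0, 0.0, -1.0, 10.0],
]

if __name__ == "__main__":
    H = ground_homography(P_EXAMPLE)
    # A point on the ground: both mappings give the same pixel.
    print(project(P_EXAMPLE, (2.0, 1.0, 0.0)), map_ground(H, (2.0, 1.0)))
    # A point 2 units above the ground: the homography ignores the height.
    print(project(P_EXAMPLE, (2.0, 1.0, 2.0)), map_ground(H, (2.0, 1.0)))
```

In this example the elevated point projects to a different pixel than the homography predicts, and the discrepancy grows as the height becomes a larger fraction of the camera distance. This is consistent with the observation that homography mapping is adequate mainly for long-distance monitoring.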
Moreover, US Patent Application Publication No. 2006/0285723 A1 published on Dec. 21, 2006 (hereinafter called document 2) describes a method for tracking targets distributed across an area covered by a network of cameras, which comprises the following steps: (a) before the system starts operating, a topological proximity model of the camera network is built through a training process, and at the beginning the user manually selects the objects to be tracked in the first camera view; (b) the system generates target models of the objects of interest, each target model including color features, shape features, and texture features; (c) the system executes background subtraction (motion detection) and particle tracking (object tracking) in the 2D camera view; and (d) if a target moves out of the current camera view, the system transfers the target model to the neighboring cameras, as determined by the topological proximity model, so that the departing target continues to be tracked.
Document 2 differs from the present invention in that: (a) document 2 requires the tracking targets to be selected manually, while the present invention can automatically detect and track targets; and (b) document 2 solves the complicated tracking and correspondence problems directly in the 2D image domain, which is not the approach of the present invention.
Furthermore, US 2007/0127774 A1 published on Jun. 7, 2007 (hereinafter called document 3) relates to a system for detecting and tracking targets in video streams, and is characterized by: (a) detecting moving pixels in the video and grouping the moving pixels into motion blocks; (b) automatically identifying targets based on the motion blocks; and (c) tracking and managing the tracked targets in the video. Document 3 differs from the present invention in that: (a) document 3 solves the complicated tracking and correspondence problems directly in the 2D image domain, which is not the approach of the present invention (explained later); and (b) document 3 focuses on single-camera processing.
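The idea of detecting moving pixels and grouping them into motion blocks can be illustrated with a generic sketch. It assumes simple frame differencing with a fixed threshold followed by 4-connected component grouping; this is a common textbook technique and is not asserted to be the specific procedure of document 3. The frames are plain nested lists of gray levels, and all names and values are hypothetical.

```python
from collections import deque


def moving_mask(prev_frame, curr_frame, threshold):
    """Mark pixels whose gray level changed by more than the threshold."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def motion_blocks(mask):
    """Group 4-connected moving pixels; return boxes (top, left, bottom, right)."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over one connected group of moving pixels.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols:
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


if __name__ == "__main__":
    previous = [[0] * 5 for _ in range(4)]
    current = [[0] * 5 for _ in range(4)]
    for y, x in ((0, 0), (0, 1), (1, 0), (1, 1), (3, 4)):
        current[y][x] = 200  # hypothetical moving pixels
    print(motion_blocks(moving_mask(previous, current, threshold=50)))
```

Because the decision is made per pixel in the image domain, a change in illumination or a cast shadow alters the mask just as a real target would, which illustrates the sensitivity to lighting and shadow noted above for image-domain methods.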
US Patent Application Publication No. 2003/0123703 A1 published on Jul. 3, 2003 (hereinafter called document 4) discloses the following: (a) the system requires the user to define a search area or surveillance zone, several imaging devices are placed to monitor the search area, and neighboring camera views overlap; (b) the homography mapping matrices are determined with reference to multiple landmark points in the world plane, and all camera views are fused into universal images having a global coordinate system; and (c) the system executes background subtraction (motion detection) and object tracking in the fused universal images.
Document 4 differs from the present invention in that: (a) document 4 adopts homography mapping, which is suitable only for long-distance monitoring, while the present invention uses the general 3D-to-2D projection matrix, which is more suitable for general cases; and (b) document 4 solves the detection and tracking problems in the fused universal image; however, because the universal image is formed by warping multiple camera views onto a global plane through homography mapping, it does not completely represent the 3D depth information.
In view of the foregoing drawbacks of the prior art, the present invention provides a method and system for multi-target detection and tracking with multiple cameras, in which the results obtained from the analysis in the respective image domains of the individual cameras are integrated into the 3D time domain, thereby facilitating the detection and tracking of multiple moving targets in the spatial domain and helping monitoring personnel to manage the multi-camera surveillance system efficiently.