With the rapid modernization of society, public security has drawn increasing attention, especially the monitoring of public places such as schools, hospitals, and government offices. Today, thousands of cameras are used for the daily monitoring of public places and generate a large amount of video data every day. However, conventional video monitoring systems rely on human operators and therefore have many problems that are difficult to overcome.
First of all, existing video monitoring systems cannot monitor the video or issue warnings automatically, so the video must be watched by personnel in real time, and the long hours of monitoring leave the monitoring personnel over-fatigued. Meanwhile, owing to factors such as monitoring range, multiple video channels usually have to be monitored at the same time, and the personnel usually cannot attend to all of them. An intelligent video monitoring system is therefore necessary.
The core technology of the intelligent video monitoring system is visual target re-identification in a large-range monitoring scene, which has long been one of the research hotspots in the field of computer vision. In particular, visual target re-identification based on multi-camera target tracking has attracted the attention of many scholars. Multi-camera target tracking can not only monitor and track pedestrians in public places, but also provide more valuable information for further advanced processing (e.g., behavior recognition). A conventional multi-camera target tracking algorithm mainly includes two steps. The first step is single-camera tracking of multiple targets in a single scene, which obtains a complete single-camera track of each target in the scene. The second step is cross-camera target tracking, in which the single-camera tracks are connected across cameras by means of cross-camera time and space information, thereby achieving the target tracking. It can be seen from these two steps that cross-camera target tracking is built on single-camera target tracking and takes the result of the single-camera tracking as its input. In other words, when the result of the single-camera tracking fails to meet a certain criterion, the effect of the cross-camera tracking is badly degraded. In fact, present single-camera target tracking algorithms produce many broken and fragmented tracks and erroneous interference tracks in practical application, and their results are not good enough to serve as the input for cross-camera tracking. In this case, the effect of the cross-camera tracking algorithm cannot be guaranteed, which ultimately makes it difficult to achieve multi-camera target tracking in actual scenes.
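The second step of the conventional approach described above can be illustrated with a minimal sketch. It is not the algorithm of any particular prior-art system. It assumes that each single-camera track carries a start time, an end time, and an appearance descriptor, and that a plausible transit-time window is known for each ordered camera pair. The names `Track`, `TRANSIT`, `link_cost`, and `link_tracks`, the greedy matching, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Track:
    """One single-camera track (the output of step one)."""
    track_id: str
    camera: str
    t_start: float   # seconds
    t_end: float     # seconds
    feature: tuple   # hypothetical appearance descriptor


# Assumed transit-time windows in seconds between ordered camera pairs.
# These values are illustrative, not taken from the source.
TRANSIT = {("A", "B"): (5.0, 30.0)}


def link_cost(a, b):
    """Cost of linking track a (earlier) to track b (later), or None if infeasible."""
    window = TRANSIT.get((a.camera, b.camera))
    if window is None:
        return None  # no known path between the two cameras
    gap = b.t_start - a.t_end
    if not (window[0] <= gap <= window[1]):
        return None  # violates the time and space constraint
    return dist(a.feature, b.feature)  # appearance distance


def link_tracks(tracks, max_cost=0.5):
    """Greedily connect single-camera tracks across cameras, cheapest link first."""
    candidates = []
    for a in tracks:
        for b in tracks:
            if a is b:
                continue
            cost = link_cost(a, b)
            if cost is not None and cost <= max_cost:
                candidates.append((cost, a.track_id, b.track_id))
    candidates.sort()
    used_src, used_dst, links = set(), set(), []
    for _, src, dst in candidates:
        if src in used_src or dst in used_dst:
            continue  # each track gets at most one successor and one predecessor
        used_src.add(src)
        used_dst.add(dst)
        links.append((src, dst))
    return links
```

Because every link is decided from the endpoints and descriptors of complete single-camera tracks, a sketch of this kind inherits the dependence noted above: if a single-camera track is split into fragments or is an erroneous interference track, its time span and descriptor are wrong, and the links built from it are unreliable.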
Conventional cross-camera tracking algorithms are based on the assumption that the single-camera tracking result is ideal enough to be used as their input, so they achieve a relatively poor tracking effect in practical application. Therefore, how to increase the accuracy of cross-camera tracking when the single-camera target tracking effect is poor, so as to achieve basic multi-camera target tracking and target re-identification, has become an urgent problem to be solved.