Video surveillance is currently a fast-growing market whose applications are becoming increasingly widespread. It is used today in areas such as crime prevention, security systems for private and public areas, abnormal event detection, traffic monitoring, customer behaviour analysis and general data gathering.
Despite this ever-increasing usage, a mainstream video surveillance system has strong inherent limitations which may lead to poor performance, especially for solving crimes and offences, due to the way it is used. Typically, such a system streams camera footage to be recorded as well as displayed in real time to human operators. Unfortunately, only a very limited fraction of the camera images can be watched in real time by humans, while the rest of the recorded footage is kept for after-action batch or forensic activities. This forensic after-action viewing is itself rarely performed, not only because it often comes too late to be useful, but also because retrieving and tracking people, such as offenders moving across several cameras located in different places, is a time-consuming task.
For this reason, VCA (Video Content Analysis) software applications have been developed to perform automatic video analysis, both to trigger alarms and make video surveillance far more responsive in real time, and to make the after-action recorded footage easier to exploit for forensic activities or batch analysis tasks.
In particular, tracking algorithms based on VCA are important in many video surveillance applications, especially security applications. These tracking algorithms, which detect and track the individual movements of targets (the targets usually being humans or vehicles), may be performed in the following different ways, which differ considerably in the steps performed and in their uses:

- mono-camera tracking: each camera individually tracks the movements of targets within its own field of view. Generally speaking, the different cameras neither share tracking data nor use data transmitted from another camera for target tracking;

- multiview tracking (also known as "fusion" or "overlapping fusion"): several cameras configured to track the movements of target objects share an area in their fields of view, the shared area being considered a common or joint field of view monitored by two or more cameras. This solves problems caused by, for example, the limited field of view of a single camera in which certain objects are hidden;

- re-identification tracking (also known as "non-overlapping fusion" or "sparse camera network tracking"): the movements of target objects are tracked by two or more cameras which do not share a common or joint field of view and may even be located far from each other. For example, the target objects to be tracked may move across an entire city monitored by a sparse camera network over a long duration, such as several days.
According to an existing re-identification method, images of the one or several people to be tracked (the "target objects") are recorded by one camera and later compared to pre-stored images of candidate people (the "candidate objects") captured by other cameras, so as to determine whether the target objects have been recognised (re-identified) by one or several of the other cameras. It is thus possible to track a target sparsely, one camera at a time, while the target moves over a large distance, which is the objective of re-identification technology. Re-identification tracking is increasingly useful in many security-based and non-security-based applications, in which the target tracking may be performed offline or in real time.
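The comparison step described above can be sketched as nearest-neighbour matching of appearance descriptors. The descriptor format, distance metric and acceptance threshold below are illustrative assumptions, not part of any specific method:

```python
import math

def reidentify(target, candidates, threshold=0.5):
    """Compare a target's appearance descriptor with each candidate's
    descriptor and return the index of the closest candidate, or None
    when even the best match exceeds the (illustrative) threshold."""
    distances = [math.dist(target, c) for c in candidates]
    best = min(range(len(candidates)), key=distances.__getitem__)
    return best if distances[best] < threshold else None

# Hypothetical appearance descriptors (e.g. normalised colour features)
target = [0.20, 0.50, 0.30]
candidates = [[0.90, 0.05, 0.05],   # looks very different
              [0.25, 0.45, 0.30]]   # looks similar
print(reidentify(target, candidates))  # → 1
```

In practice the descriptors would be extracted from the recorded images; the sketch only shows the matching logic itself.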
Mono-camera tracking and fusion algorithms have been greatly improved over the years and now produce decent results.
Many computer vision algorithms have been developed for re-identification tracking. However, compared to the well-developed mono-camera tracking and fusion algorithms, re-identification tracking algorithms are still far from perfect in terms of reliability (e.g. correct tracking rate) and efficiency (e.g. computational time).
The latest algorithms, or those considered relatively efficient, are based on sophisticated machine learning methods. However, the existing re-identification algorithms may still be very inefficient due to one or several of the following issues:

- image-based cues only: the features used for re-identification are not distinguishing enough. They are essentially pixel-based information, such as colors or textures extracted from images, which is not always sufficient to distinguish similar-looking people. In addition, pixel-based information may be affected by image artifacts caused by changes in lighting, pose or colorimetry, or even simply by hardware or software adjustments of the cameras;

- too many candidates: in most scenarios, one particular target must be compared to a great number of candidates, which makes the problem similar to finding a "needle in a haystack" and considerably increases the computational complexity and even the probability of failure;

- resource-intensive processing: as mentioned above, the existing re-identification tracking algorithms rely on sophisticated image features and machine learning methods that must process a great number of candidates, and are therefore very time-consuming and demand considerable computational resources. It is not rare for a video sequence lasting only a few seconds to take several hours to process in order to identify just one or two targets. Such resource-hungry algorithms may nevertheless still fail to track the targets accurately. Moreover, being so time-consuming, they cannot be used for real-time target tracking.
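The first issue can be illustrated with a coarse colour histogram, a typical purely pixel-based cue; the pixel values below are made up for the example. Two different people wearing similarly coloured clothing can yield indistinguishable histograms:

```python
def colour_histogram(pixels, bins=4):
    """Quantise RGB pixels into a coarse joint colour histogram,
    a typical image-based re-identification feature."""
    step = 256 // bins
    hist = [0] * bins ** 3
    for r, g, b in pixels:
        hist[(r // step) * bins * bins + (g // step) * bins + b // step] += 1
    return [h / len(pixels) for h in hist]

# Crops of two *different* people in similar dark clothing (made-up pixels)
person_a = [(30, 30, 40)] * 100
person_b = [(35, 28, 45)] * 100
h_a, h_b = colour_histogram(person_a), colour_histogram(person_b)

# L1 distance between the two histograms is 0.0: the pixel-based
# cue alone cannot tell the two people apart.
print(sum(abs(x - y) for x, y in zip(h_a, h_b)))  # → 0.0
```

The example deliberately uses uniform crops; with real images the histograms would differ slightly, but similarly dressed people still collide easily in such a feature space.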
Several improved re-identification tracking algorithms have been proposed to reduce the impact of one of the above-mentioned issues and thus increase efficiency. For example, some improved methods perform a travel time estimation to reduce the number of candidates, as described in the IEEE conference paper "Inference of Non-Overlapping Camera Network Topology by Measuring Statistical Dependence", published by K. Tieu in 2005, and in another IEEE conference paper, "Probabilistic Modeling of Dynamic Traffic Flow across Non-overlapping Camera Views", published by Huang in 2010.
The travel time estimation performed in these re-identification tracking algorithms comprises a step of measuring the travel times spent by several people moving from the field of view of one camera to that of a neighbouring camera, and a step of calculating the mean of the measured travel times; both steps are usually performed during a training phase of the method. Based on this mean value, the improved method can then estimate the travel time that a target is likely to need to reach the field of view monitored by another camera. In other words, the time instant at which the target may arrive in the field of view of that other camera can be predicted.
In this way, instead of comparing the target with all the candidates in all of the images obtained by that other camera, only the candidates in images obtained around the predicted time instant need to be processed. The number of candidates can thus be reduced.
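Assuming a set of travel times measured during the training phase for a given camera pair, the candidate reduction can be sketched as follows (the window width, data structures and timing values are illustrative assumptions):

```python
import statistics

def travel_time_window(training_times, k=2.0):
    """Derive a search window [mean - k*std, mean + k*std] from the
    travel times measured during the training phase."""
    mean = statistics.mean(training_times)
    std = statistics.stdev(training_times)
    return mean - k * std, mean + k * std

def filter_candidates(candidates, departure_time, window):
    """Keep only candidates whose arrival at the second camera falls
    inside the predicted window relative to the target's departure."""
    lo, hi = window
    return [c for c in candidates if lo <= c[1] - departure_time <= hi]

# Hypothetical training measurements, in seconds, between two cameras
window = travel_time_window([55, 60, 58, 62, 65])

# Candidates as (identifier, arrival timestamp) pairs; target left at t=100
candidates = [("A", 100.0), ("B", 160.0), ("C", 300.0)]
print([c[0] for c in filter_candidates(candidates, 100.0, window)])  # → ['B']
```

Only candidate B, whose arrival falls near the mean travel time of roughly 60 seconds, survives the filter; the appearance comparison then runs on this reduced set.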
However, the probability of failure of the above-mentioned re-identification tracking algorithms may still be high, and computational resources may still be spent on processing wrong candidates, because the estimated travel time can differ considerably from the time actually spent by the target.
In addition, some of the above-mentioned re-identification tracking algorithms require human intervention, such as manually tagging a group of people shown in the images of a video stream during the training phase, before their respective travel times can be measured.
Consequently, there is a need to improve existing re-identification methods, and in particular to improve their reliability, for example by increasing the correct tracking rate, as well as their efficiency, for example by reducing the computational resources required.