Large-scale, distributed, and heterogeneous camera networks have been widely deployed in support of video monitoring applications such as surveillance, traffic law enforcement, and environmental monitoring, among others. One major challenge associated with the use of such camera networks is making sense of the vast quantity of visual data they generate. In particular, one fundamental problem yet to be resolved is the fusion/aggregation of information about the same object across multiple cameras in the spatiotemporal domain, a task that is generally referred to as the “object re-identification” (or camera handoff) problem. Re-identification is required whenever objects (e.g., people, groups of people, vehicles, etc.) must be tracked and/or identified as they transition between cameras, at potentially different times and locations. A prerequisite for fusing the vast quantity of visual data generated by a multi-camera network is to temporally synchronize the videos acquired by the different cameras.
There are several factors that might cause the videos recorded by different cameras to be out-of-sync at different stages in the processes of video acquisition and processing. For instance, at the acquisition step, even with the use of multi-camera video management software that supports simultaneous recording from various cameras, different cameras may take slightly different amounts of time to start recording, causing videos to be out-of-sync from the beginning. Another example comes from inherent frame rate discrepancies: videos from different cameras might be recorded at different frame rates, or even if they are set to be recorded at a certain frame rate, the actual acquisition frame rate may differ slightly (e.g., up to +/−5 fps) from the desired frame rate, giving rise to inevitable sync issues.
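As a rough illustration of how a start-up delay and a frame rate discrepancy translate into timing error, the following Python sketch computes the offset between a video's frame-count-derived timeline and true elapsed time. The function name, its parameters, and the example values (a 29.5 fps actual rate against a 30 fps nominal rate) are illustrative assumptions, not part of any particular camera system.

```python
def clock_offset_s(elapsed_s, nominal_fps, actual_fps, start_delay_s=0.0):
    """Offset in seconds between a video's timeline and true elapsed time.

    Hypothetical model: the camera starts recording `start_delay_s` seconds
    late and captures at `actual_fps`, while playback and analysis assume
    every frame lasts 1 / nominal_fps seconds.
    """
    frames_captured = actual_fps * (elapsed_s - start_delay_s)
    assumed_time_s = frames_captured / nominal_fps
    return assumed_time_s - elapsed_s


# Camera A runs exactly at the nominal rate; camera B runs 0.5 fps slow.
offset_a = clock_offset_s(60, nominal_fps=30, actual_fps=30)
offset_b = clock_offset_s(60, nominal_fps=30, actual_fps=29.5)
print(offset_a - offset_b)  # 1.0: the two videos disagree by one second after one minute
```

Under these assumptions, even a 0.5 fps discrepancy puts two nominally 30 fps videos about 30 frames apart after a single minute of recording.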
Even if the videos are converted to a common frame rate, quantization errors associated with the mapping of the different frame rates to a single frame rate may accumulate over time, so the conversion process can itself introduce sync issues. Moreover, video cameras often record video on a short segment-by-segment basis (with a frame rate that also tends to vary slightly from segment to segment), and the segments need to be converted and stitched together before analysis, causing out-of-sync issues to accumulate and worsen over time. In some cases, the videos that need to be synchronized have very different intended frame rates (e.g., one video may have a frame rate of 5 fps and the other 30 fps), which further complicates manual synchronization.
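One way such quantization error can accumulate is sketched below in Python. It assumes, purely for illustration, a conversion step that stores each frame's duration as a whole number of milliseconds; the text above does not specify the mechanism, and the function name is hypothetical.

```python
from fractions import Fraction


def accumulated_rounding_error_ms(fps, n_frames):
    """Total timing error in ms after n_frames when the per-frame duration
    is rounded to a whole millisecond (an assumed quantization step)."""
    exact_ms = Fraction(1000) / Fraction(fps)  # exact frame duration
    rounded_ms = round(exact_ms)               # quantized frame duration
    return float(n_frames * (rounded_ms - exact_ms))


# 30 fps: 33.33... ms is stored as 33 ms, so each frame loses 1/3 ms.
print(accumulated_rounding_error_ms(30, 54000))  # -18000.0 (18 s over 30 minutes)
# 25 fps: 40 ms is exactly representable, so no error accumulates.
print(accumulated_rounding_error_ms(25, 45000))  # 0.0
```

In this sketch the per-frame error is far below one frame period, yet it grows linearly with the number of frames, which is why a conversion that looks exact over a few seconds can be visibly out of sync over a long recording.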
Currently, most such synchronization tasks are performed manually by going through a large number of videos and temporally localizing specific events (e.g., a vehicle driving from one camera's field of view into that of an adjacent camera) across multiple cameras to aid the synchronization. The manual approach becomes increasingly impractical as the amount of video data generated by the camera network grows (e.g., because the number of cameras in the network grows). Manual synchronization is labor-intensive and time-consuming; it is also error-prone and does not usually provide satisfactory accuracy for applications that rely on synchronization that is precise down to small fractions of a second (e.g., 1/30 second in 30 fps systems), such as object tracking/handoff/re-identification across cameras with and without overlapping fields of view.
It is often the case that video streams from two cameras need to be re-synchronized frequently due to the cameras exhibiting different rates of deviation from an intended frame rate. As a result, more efficient and accurate synchronization methods are needed for automatically processing videos produced by camera networks.