Temporal video registration consists of temporally aligning the frames of two videos (or of two video segments) that show the same content.
Temporal registration has applications in the domain of video forensics, such as registering a pirated copy of a video with the master video in order to enable or enhance watermark decoding. In the domain of home video editing, temporal registration of video segments from different recordings of the same event (possibly captured with different devices such as a camcorder, mobile phone, PDA, or digital still camera) allows such segments to be combined into one video stream. In a professional context (TV, cinema), this problem is solved either by physically connecting the capture devices (“jam sync”) or by using “claps”. No such solution exists in the consumer domain (home videos of parties, concerts, weddings, vacations, etc.). Temporal registration may also be used in other multi-sensor video environments, such as combined visible and infra-red capture. In the domain of temporal super-resolution, multiple unsynchronized videos with a low frame rate are temporally matched and merged into a single video with a higher frame rate.
A common technique to achieve this registration is based on the registration of temporal video fingerprints. The “fingerprint” of a video content is a set of features, automatically extracted from the video signal, which are compact, discriminant and robust to signal distortions. A temporal fingerprint is a particular kind of fingerprint that captures the evolution of the signal over time. However, computing and aligning the temporal fingerprints of a target video and a master video over the whole content is time- and power-consuming, while in many cases only one temporal segment of the video is of interest (in video forensics, for example, the segment carrying the embedded watermark). Known methods such as direct frame-wise alignment of entire long videos are therefore impractical and subject to a high probability of error, while aligning segments within a long video raises the issue of localizing those segments in the longer video. In the state of the art, this localization is performed visually by an operator.
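For illustration only, fingerprint-based registration can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any particular prior-art system: the per-frame feature (mean luminance), the cost function (mean squared difference) and the names `Frame`, `temporal_fingerprint` and `best_offset` are all hypothetical choices made for the example. The exhaustive search visits every candidate offset, which shows why the cost grows with the length of both videos.

```python
from typing import List, Sequence

# Hypothetical frame representation: a non-empty 2D grid of luminance values.
Frame = Sequence[Sequence[float]]


def temporal_fingerprint(frames: Sequence[Frame]) -> List[float]:
    """Return one value per frame: its mean luminance (an illustrative feature)."""
    fingerprint = []
    for frame in frames:
        total = sum(sum(row) for row in frame)
        count = sum(len(row) for row in frame)
        fingerprint.append(total / count)
    return fingerprint


def best_offset(master_fp: Sequence[float], target_fp: Sequence[float]) -> int:
    """Slide target_fp over master_fp and return the offset (in frames)
    with the smallest mean squared difference.

    The search is exhaustive, so its cost is proportional to
    len(master_fp) * len(target_fp).
    """
    if not target_fp or len(target_fp) > len(master_fp):
        raise ValueError("target fingerprint must be non-empty and no longer than the master")
    best, best_cost = 0, float("inf")
    for offset in range(len(master_fp) - len(target_fp) + 1):
        cost = sum(
            (master_fp[offset + i] - value) ** 2
            for i, value in enumerate(target_fp)
        ) / len(target_fp)
        if cost < best_cost:
            best, best_cost = offset, cost
    return best
```

In this sketch, a target segment is registered by computing `temporal_fingerprint` for both videos and passing the two results to `best_offset`. A single scalar per frame is far less discriminant than a practical fingerprint would be; it is used here only to keep the example short.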
A method for the automatic temporal registration of long videos, or of segments of long videos, is therefore needed.