Video synchronization aims to temporally align a set of input videos acquired by multiple cameras. Video synchronization may be a fundamental step for many applications in computer vision, such as three-dimensional (“3D”) reconstruction from multiple cameras, video morphing, facial performance manipulation, spatial compositing, motion analysis, etc. When several cameras are used simultaneously to acquire multiple viewpoint shots of a scene, synchronization may be trivially achieved by using timecode information or camera triggers.
In the absence of timecode information and camera triggers, the videos may be synchronized using a recorded audio track, wherein the synchronization finds a fixed time offset between the cameras. Furthermore, videos may be synchronized through manual alignment, such as by finding video frame correspondence and manually computing the required time offset. However, these techniques require time-consuming manual effort for video alignment.
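By way of a non-limiting illustration, the fixed time offset between two recorded audio tracks may be estimated by cross-correlation, as in the following minimal sketch. The function names, the `max_lag` search window, and the assumption that both tracks are mono sample lists at the same sample rate are hypothetical choices made for this example and are not taken from any particular existing system. The brute-force search is kept for readability; a practical implementation would likely use an FFT-based correlation.

```python
def estimate_offset_samples(track_a, track_b, max_lag):
    """Return the lag k (in samples) that maximizes sum_n track_a[n + k] * track_b[n].

    A positive k means a sound appears k samples later in track_a than in track_b.
    Both tracks are assumed to be sequences of numbers at the same sample rate.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, sample_b in enumerate(track_b):
            m = n + lag
            # Only accumulate where the shifted index falls inside track_a.
            if 0 <= m < len(track_a):
                score += track_a[m] * sample_b
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def estimate_offset_seconds(track_a, track_b, sample_rate, max_lag):
    """Convert the best lag from samples to seconds (fixed offset between cameras)."""
    return estimate_offset_samples(track_a, track_b, max_lag) / sample_rate
```

The returned value is a single offset for the whole recording, which reflects the limitation noted above: this approach only applies when the cameras recorded the same sound and differ by a constant time shift.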
Existing video synchronization methods have significant limitations. For instance, existing techniques typically require one or more of the following: simultaneously acquired viewpoint shots of a scene, videos that are acquired in the same location, a fixed temporal offset between the cameras, specific use cases (e.g., videos of faces), or a similar appearance change.