Since shortly before the start of the twenty-first century, average consumers have been creating audio and video content faster than systems for organizing such content have become available. With the rapid rise in the number and quality of personal media recording devices (e.g., smartphones), it has become commonplace for people to record audio and video at social events such as concerts and sporting events. However, it remains difficult for people to share their recordings in a simple way. Social media sites provide a venue for users to upload their recordings and point others to them, but in the case of shared events this functionality exists almost by accident rather than by design.
One key technology is the ability of a system that gathers such recordings to work out how the recordings relate to one another in time. This ability is needed because users typically record only short snippets of an event, and “hardware” synchronization between mobile devices does not exist. Also, many mobile devices do not incorporate a time measure into their video streams. For a large-scale event attended by many users, such as a concert, the users' devices may collectively hold a complete recording of the event, not only across its full duration but also, quite possibly, from different points of view. However, without reference to a universal “clock” (or the equivalent thereof), it is not possible to view the recorded data in this manner (e.g., as a recording of the complete event).
In the context of “social” videos (e.g., video clips of live concerts, sporting events, etc. captured by users and shared via social networks), achieving a robust design for video synchronization involves overcoming several challenges posed by the inherent characteristics of such videos. For example, social videos tend to have poor sound quality and low camera resolution; they often contain local noise from the environment in which the video is captured; and extreme camera shake is a regular problem.
Creating a reference to a universal clock would also allow each user not only to see the recordings of other users, but also to see what someone else was seeing or hearing at the same time that the user was recording the event. In essence, a technology that could work out the time offset between the various recordings, using the media signal only, would be able to align all of the recordings on a single reference timeline. From that point on, automatic or manual editing becomes feasible.
The practice of using multiple cameras to record an event is long established in the cinema industry. It is common in that industry to have a “hero” or main camera following a scene, accompanied by lesser “witness” cameras that capture the scene from different points of view. Since 2005, it has become commonplace to use the witness views to help capture three-dimensional information about the scene. This allows for more creativity in post-production. On set, many professional cameras are “genlocked” by hardware signals that ensure each frame is recorded at the same time by each camera, and the timestamp is known and recorded alongside the pictures. In semi-professional scenarios, “genlocked” cameras are expensive and seldom used, and therefore recent approaches have started to explore how to work out the offset in time between the various signals even when the cameras are not “genlocked”.
The sound of a “clapper board” clacking is sometimes used by editors to align multiple camera views. Some approaches have considered placing sound sources in the scene and using them to synchronize the views by comparing the audio signals between recordings. Others have considered using the speech of the actors in the scene itself. Given that hundreds or even thousands of recordings of a single event (e.g., a sporting event, concert, public rally, etc.) are uploaded by users, the ability to automatically work out the time shift between all of the recordings would allow the recordings to be aligned with one another.
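As a rough illustration of comparing audio signals to recover a time shift, the following minimal sketch estimates the offset between two clips with a plain, unnormalized cross-correlation and then places the clips on a common timeline. This is an assumption chosen for illustration, not a method described above; the function names `estimate_offset` and `place_on_timeline`, the synthetic clap-like signal, and the 10 Hz sample rate are all hypothetical.

```python
def estimate_offset(ref, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `ref`.

    A lag of k means other[i] lines up with ref[i + k], i.e. `other`
    started recording k samples after `ref` did (negative k: earlier).
    Illustrative brute-force, unnormalized cross-correlation.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(other):
            j = i + lag
            if 0 <= j < len(ref):
                score += x * ref[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def place_on_timeline(offsets, sample_rate):
    """Convert per-clip sample offsets into start times (seconds) on a
    shared timeline whose zero is the earliest clip's start."""
    earliest = min(offsets.values())
    return {name: (off - earliest) / sample_rate for name, off in offsets.items()}


# Hypothetical event audio: silence with a short clap-like transient.
event = [0.0] * 400
event[150:153] = [1.0, -0.6, 0.3]

# Two users record overlapping snippets of the same event.
clip_a = event[50:250]   # starts at event sample 50
clip_b = event[120:300]  # starts at event sample 120

lag = estimate_offset(clip_a, clip_b, max_lag=100)
timeline = place_on_timeline({"clip_a": 0, "clip_b": lag}, sample_rate=10)
print(lag, timeline)  # 70 {'clip_a': 0.0, 'clip_b': 7.0}
```

Real social recordings are noisy and far longer, so a practical system would likely correlate more robust audio features and use a faster (e.g., FFT-based) correlation; the brute-force loop here only shows the principle.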