When editing audio and video captured by multiple cameras, traditional media editing applications typically operate on the premise that audio portions captured at different camera angles are coextensive with the captured video and align at a common point in time. But this is often not the case. In practice, the spatial arrangement of the multiple cameras, as well as the environment, contributes to deviations in audio relative to some point in time. These deviations, which can be as small as a fraction of a second, can lead to two or more captured audio portions being out of synchronization as perceived, for example, by a human listener.
FIG. 1A illustrates a multi-camera arrangement 100 for capturing video and audio of a subject 108 at different angles and positions. As shown, capture devices 102a, 102b, and 102c, which are typically cameras, are arranged at different angles A1, A2, and A3 relative to reference 110. Further, these capture devices are positioned at different distances D1, D2, and D3 from subject 108. In this typical multi-camera arrangement 100, these angles and distances, as well as various other factors, such as the occurrence of ambient noise 104 near capture device 102a, affect the synchronization (and/or the quality) of the audio portions as they are captured.
One common technique for synchronizing the video captured at capture devices 102a, 102b, and 102c is to implement time codes associated with each video (or otherwise use some sort of global synchronization signal) to synchronize both the video and audio portions. In particular, a user is usually required to manually adjust the different videos to bring their time codes into agreement. A time code normally describes the relative progression of video images in terms of an hour, minute, second, and frame (e.g., HH:MM:SS:FR). But a drawback to using time codes to synchronize audio is that the user must synchronize different video portions to a particular frame before synchronizing the audio portions. The effort to synchronize the audio is further exacerbated by the number of audio samples captured relative to the number of video frames. Typically, for each frame of video (e.g., at 30 frames per second), there are 1,600 samples of audio (e.g., at 48,000 samples per second). As such, audio portions for capture devices 102a, 102b, and 102c are typically synchronized based on the video portions and their time codes, which can contribute to undesired sound delays and echoing effects.
Another common technique for synchronizing the audio (and the video) captured at capture devices 102a, 102b, and 102c is to use a clapper to generate a distinctive sound during the capture of the audio and video. A clapper creates an audible sound, as a reference sound, to synchronize audio during the capture of the audio. The clapper sound is used for editing purposes and is discarded during editing. Consider that a clapper (not shown) generates a sound (“noise”) 104 for capture by capture devices 102a, 102b, and 102c. Thus, clapper noise 104 can be used to synchronize the audio.
A drawback to using clapper noise 104 to synchronize audio is that the distances between the source of noise 104 and capture devices 102a, 102b, and 102c can cause delays that hinder synchronization of the audio relating to scene 108.
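The granularity issues described for time codes and clapper sounds can be made concrete with a short calculation. The following Python sketch is illustrative only and is not part of any described technique. It assumes a non-drop-frame time code at an integer frame rate and a speed of sound of about 343 meters per second (dry air at roughly 20 degrees Celsius). The function names and the example distances are hypothetical.

```python
# Illustrative sketch only; names, distances, and constants are assumptions.

SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees Celsius


def samples_per_frame(sample_rate_hz: int, frames_per_second: int) -> float:
    """Number of audio samples spanned by one video frame."""
    return sample_rate_hz / frames_per_second


def timecode_to_sample(timecode: str,
                       frames_per_second: int = 30,
                       sample_rate_hz: int = 48_000) -> int:
    """Convert a non-drop-frame HH:MM:SS:FR time code to an audio sample index."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    total_frames = ((hours * 60 + minutes) * 60 + seconds) * frames_per_second + frames
    # A time code can only address frame boundaries, i.e. every 1,600th sample
    # at 48,000 samples per second and 30 frames per second.
    return total_frames * sample_rate_hz // frames_per_second


def acoustic_delay_samples(distance_m: float, sample_rate_hz: int = 48_000) -> int:
    """Samples of delay for sound travelling distance_m to a capture device."""
    return round(distance_m / SPEED_OF_SOUND_M_PER_S * sample_rate_hz)


if __name__ == "__main__":
    print(samples_per_frame(48_000, 30))
    print(timecode_to_sample("00:00:01:00"))
    # Hypothetical clapper-to-device distances, in meters.
    for name, distance in (("102a", 2.0), ("102b", 5.0), ("102c", 12.0)):
        print(name, acoustic_delay_samples(distance))
```

Under these assumptions, a clapper located 12 meters from one capture device and 2 meters from another would be recorded roughly 1,400 samples apart at 48,000 samples per second. That difference is close to the 1,600 samples spanned by a single frame at 30 frames per second, so neither frame-level time codes nor an uncorrected clapper sound resolves offsets at the audio sample level.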
FIG. 1B illustrates a typical work flow to integrate indicia, such as time codes and clapper sounds, within the audio or video for synchronization purposes prior to and/or during the capture of video using a multi-camera arrangement. As shown, a typical work flow to film a scene 108 (FIG. 1A) includes the stages of pre-production 140 (i.e., prior to capturing video and audio), production 142 (i.e., the capturing of video and audio), and post-production 144 (i.e., subsequent to capturing video and audio). In a pre-production stage 140 of capturing video, common synchronization techniques usually require that a user procure either time code generation hardware or a clapper, or both, before the video and audio are captured. In a production stage 142, common synchronization techniques usually require that a user implement time codes or a clapper to introduce points at which to synchronize video during the capture of the video and audio. In a post-production stage 144, a user normally uses the common synchronization techniques of the pre-production 140 and production 142 stages to synchronize the video. The time codes and clapper sounds must then be removed, because they are intended only for editing purposes and would distract an audience if time codes remained visible or clapper sounds remained audible in the final product.
It would be desirable to provide improved computing devices and systems, software, computer programs, applications, and user interfaces that minimize one or more of the drawbacks associated with conventional techniques for synchronizing either audio or video, or both.
Like reference numerals refer to corresponding parts throughout the several views of the drawings. Note that most of the reference numerals include one or two left-most digits that generally identify the figure that first introduces that reference number.