1. Field of the Invention
The invention relates to systems and methods for processing of media files, and more particularly, to systems and methods for aligning signals from different recordings of the same sound source.
2. Related Art
Audio and video recordings of a scene or performance on a stage, for example, often involve recording the video at a distance from the stage to ensure that the entire scene is visible. This makes it difficult to obtain good quality audio at the microphone on the video camera. The microphone on the video camera may pick up crowd noise, wind noise, HVAC/building noise, and traffic noise, and may further be susceptible to excessive reverberation and absorption of high frequencies. In addition, microphones on video cameras may not be of sufficient quality. Audio is ideally recorded using microphones positioned close to the sound source to reduce the chance of picking up the environmental noise mentioned above. Individual microphones on specific sound sources, such as each instrument in a band, may further reduce the susceptibility to the noise. In the context of concerts, stages are often equipped with localized audio pickups, on the floor of the stage or hanging from the ceiling above the stage, for purposes of reinforcing the audio picked up by the microphone on the video camera. In generating the final video, the audio signals recorded from the close-proximity microphones may be preferred over the audio from the video camera microphone, or may be mixed with the camera microphone audio. In the latter case, the camera microphone audio may be bandpass filtered, attenuated, and added to the close microphone audio in order to provide ambience to the final audio mix.
One problem with mixing an audio signal with a video signal recording of the same scene is that the signals are inherently unsynchronized. Mixing the signals requires alignment of the signals. For example, in scenes involving dialog, the audio should be aligned with the video so that the audio does not lag the video depiction of the characters speaking, or vice versa. High-end recording systems use a SMPTE time code to time-stamp and synchronize different audio signals. Such time-stamping is not always available on equipment at the consumer level. Audio may be synchronized during recording using a cable run or a wireless link between the close microphone and the video camera. However, synchronizing during recording requires planning and setting up in advance of recording, which is often not done.
Recordings may be made using the video camera with a microphone as one audio and video source and the close microphones as another source. Editing tools may then be used to integrate the second source of audio into the video recording. Such editing tools include, for example, Roxio Creator™ and Sony Vegas™. Using an editing tool to integrate the audio from the second source with the video recording is often a laborious task requiring that the editor manually position the audio in the video in proper alignment. Even once properly aligned, the audio may slowly become misaligned after only a minute of playback due to a drift between the two recordings.
The drift between the recordings may be due to clocks from different recorders having slightly different frequencies from one another. The drift may also be due to audio codecs with variable bit rates that may not preserve absolute time with the same accuracy. Another cause of the drift may be the movement of the camera during the performance being recorded. The effect of the drift is greater when the close microphone and video camera audio signals are to be mixed than when a single audio signal is kept in the final file (i.e., when the close microphone signals replace the far microphone signals). As mixed signals drift farther apart from each other over time, the summation may sound comb-filtered, then reverberated, then overly reverberated, and then have discrete echoes. For single signals, reverberation and echo may not be an issue; however, lip synchronization between the video and audio becomes worse during the playback.
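By way of illustration only, the arithmetic behind the drift described above may be sketched as follows. The sketch assumes a constant clock-frequency mismatch expressed in parts per million (the 50 ppm figure is a hypothetical value chosen for the example, not a measurement), and it assumes the two signals are summed at equal gain, in which case the sum has nulls at odd multiples of 1/(2τ) for a relative delay τ. The function names are hypothetical and are not part of any described system.

```python
def drift_seconds(elapsed_s, ppm_offset):
    """Offset accumulated between two recordings after elapsed_s seconds,
    assuming their clocks differ by a constant ppm_offset parts per million."""
    return elapsed_s * ppm_offset * 1e-6


def comb_notches_hz(delay_s, max_hz=20000.0):
    """Null frequencies (Hz) up to max_hz when a signal is summed with an
    equal-gain copy of itself delayed by delay_s seconds.
    Nulls fall at odd multiples of 1 / (2 * delay_s)."""
    if delay_s <= 0:
        return []
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2.0 * delay_s)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


if __name__ == "__main__":
    ASSUMED_PPM = 50  # hypothetical mismatch, for illustration only
    for elapsed in (1, 60, 600, 3600):
        d = drift_seconds(elapsed, ASSUMED_PPM)
        first_null = comb_notches_hz(d)[:1]
        print(f"after {elapsed:>4} s: offset {d * 1000:.3f} ms, "
              f"first null (Hz): {first_null}")
```

Under these assumptions, a mismatch of this size accumulates a few milliseconds of offset within a minute, which places the first null of the summed signal within the audible band. Over longer durations the offset grows toward delays that may be perceived as reverberation and then as discrete echoes, consistent with the progression described above.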
The difficulty of aligning an audio recording from one sound source with a video or audio recording of the same scene from another sound source has been described in the context of a recording from a camera placed a distance away from the scene and an audio recording from a microphone placed close to the scene. A similar problem is presented when sound is recorded on a movie set, for example, and dialog is re-recorded in the studio for inclusion in the final movie. This process, which is called Automated Dialog Replacement (ADR), is used to make dialog more intelligible (less noisy and less reverberant), to translate the dialog to a foreign language, or to remove or replace profanity in the original dialog. The replacement audio recording may not be a recording contemporaneous with the video recording of the scene. Nevertheless, alignment issues still arise when mixing the audio with the originally recorded video, and the recordings being mixed may, for purposes of this description, be considered to be of the same scene. The problems with aligning audio signals may arise in other applications or scenarios that may not involve video recordings.
Alignment issues may also arise in the context of streaming media signals, the streaming of which has become ubiquitous across various applications. For example, a high-definition (“HD”) Radio station broadcasts both an analog transmission and a digital transmission containing the same content. The broadcaster attempts (and sometimes fails) to align these transmissions manually. The receiver (i.e., the listener's radio receiver unit) is not equipped to align the two transmissions. Weather and geography (such as hills and other irregular surfaces) can result in the loss of the digital signal, at which point the receiver reverts to receiving the analog signal. The digital signal may fade in and out so that the receiver goes back and forth between the analog and digital signals. If the analog and digital signals are not aligned, the back and forth reception creates an annoying listening experience.
A need exists for ways to more easily and reliably align audio recordings taken of the same scene using different sources.