Synchronization of several audio signals is a relatively new subject, with few examples or applications so far. Moreno et al. (see P J Moreno, C F Joerg, J M Van Thong, and Oren Glickman, “A recursive algorithm for the forced alignment of very long audio segments,” ICSLP, 1998) present a way to align very long audio files using speech recognition.
In the last few years there has been growing interest in audio matching, as set forth in Wang (see Avery Li-chun Wang, “An Industrial Strength Audio Search Algorithm,” ISMIR, 2003), Müller et al. (see M Müller, Frank Kurth, and M Clausen, “Audio Matching via Chroma-Based Statistical Features,” ISMIR, 2005) and Yang (see Cheng Yang. MACS: Music Audio Characteristic Sequence Indexing For Similarity Retrieval . . . of Signal Processing to Audio and Acoustics, 2001 . . . , (October):1-4, 2001). These techniques are mostly designed for matching a track against a database of high-quality files. To match audio files efficiently, each track is summarized as a compact fingerprint.
In Wang, for example (the algorithm used in the Shazam music recognition service), the fingerprint consists of a sequence of hash values appearing at specific times in the signal. After searching for matches in the database, two signals are declared as matching if the hashes they have in common tend to occur at the same relative times in both files, that is, if each shared hash in the second file is positioned at the same specific delay from its counterpart in the first one.
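The matching criterion described above can be illustrated with a minimal sketch. It assumes each fingerprint is already available as a list of (hash value, time) pairs with integer times (for example, frame indices); the function names `best_delay` and `is_match` and the `min_votes` threshold are hypothetical choices made for illustration, not part of Wang's published system. Every pair of equal hashes votes for the delay between its two occurrences, and the signals are treated as matching when one delay collects enough votes.

```python
from collections import Counter, defaultdict


def best_delay(fp_a, fp_b):
    """Return (delay, votes) for the most common delay between shared hashes.

    fp_a, fp_b: iterables of (hash_value, time) pairs, with integer times
    (an assumption of this sketch). delay is time_in_b - time_in_a.
    """
    # Index the first fingerprint: hash value -> all times at which it occurs.
    times_a = defaultdict(list)
    for h, t in fp_a:
        times_a[h].append(t)

    # Each pair of equal hashes casts one vote for its time difference.
    votes = Counter()
    for h, t_b in fp_b:
        for t_a in times_a.get(h, ()):
            votes[t_b - t_a] += 1

    if not votes:
        return None, 0
    return votes.most_common(1)[0]


def is_match(fp_a, fp_b, min_votes=3):
    """Declare a match when at least min_votes shared hashes agree on a delay.

    min_votes is an illustrative threshold, not a value from the literature.
    """
    delay, count = best_delay(fp_a, fp_b)
    return count >= min_votes, delay


if __name__ == "__main__":
    # Toy fingerprints: three hashes of b occur 7 time units later than in a,
    # and one shared hash (0xD4) occurs at an unrelated position.
    a = [(0xA1, 0), (0xB2, 5), (0xC3, 9), (0xD4, 12)]
    b = [(0xA1, 7), (0xB2, 12), (0xC3, 16), (0xEE, 20), (0xD4, 3)]
    print(is_match(a, b))
```

Counting votes per delay, rather than simply counting shared hashes, is what makes the decision robust to hashes that coincide by chance: such spurious pairs scatter over many different delays, while a true match concentrates its votes on one.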