The invention relates to electronic devices, and, more particularly, to circuitry and methods for beat matching in audio streams.
In recent years, methods have been developed which can track the tempo of an audio signal and identify its musical beats. This has enabled various beat-matching applications, including beat-matched audio editing, automatic play-list generation, and beat-matched crossfades. Indeed, in a beat-matched crossfade, a deejay slows down or speeds up one of the two audio tracks so that the beats between the incoming track and the outgoing track line up. When the tracks are from the same musical genre and the beat alignment is close, the transition sounds nearly seamless. After the outgoing track is gone, the incoming track beats can be ramped back to their original rate or maintained at the new rate, and this incoming track will eventually become the next outgoing track for the next cross-fade.
All beat matchers must mitigate the limitations of the beat detection method which they employ. This includes the tendency of beat detectors to jump from one tempo beats-per-minute value to a harmonic or sub-harmonic thereof between analysis frames.
Beat detection can be performed in various ways. A simple approach just computes autocorrelations and selects the beat period as the delay corresponding to the peak autocorrelation. In contrast, Scheirer, “Tempo and Beat Analysis of Acoustic Musical Signals”, 103 J. Acoustical Soc. Am. 588 (1998), employs a psychoacoustic model that decomposes the audio signal into bands via filterbanks and then performs envelope detection on each of these bands. It then tests various beat rate hypotheses by employing resonant comb filters for each hypothesis. However, the computational complexity of Scheirer limits applicability on portable devices. Alonso et al., “Tempo and Beat Estimation of Musical Signals”, Proc. Intl. Conf. Music Information Retrieval (ISMIR 2004), Barcelona, Spain, October 2004, proceeds through three steps: First an onset detector analyzes the audio signal and produces scalars that reflect the level of spectral change over time; this uses short-time Fourier transforms and differences the frequency channel magnitudes. The differences are summed and a threshold is applied through a median filter to output a detection function that shows only peaks at points in time that have large amounts of spectral change. Second, the detection function is fed to a periodicity estimator which applies spectral product methods to evaluate tempo (beat rate) hypotheses; this gives the beat rate estimate. In the third step a beat locator uses the detection function and the estimated beat rate to determine the locations of the beats in a frame.
Another important characteristic for beat matchers is to avoid excessively modifying the input music being matched to another (reference) music or beat source track. Typically, modifications are either time-scale modifications (TSM) or sampling rate conversions (SRC). FIG. 2a generally shows a beat matching (input beats bi[k] modified to align with reference beats br[k]), and FIG. 2b illustrates TSM versus SRC. For shrinking/expanding a time scale, TSM essentially deletes/replicates some information to preserve local structure, whereas SRC uniformly shrinks/expands everything.
TSM methods change the time scale of an audio signal without changing its perceptual characteristics. For example, synchronized overlap-and-add (SOLA) provides a time scale change by a factor r by taking successive length-N frames of input samples with frame k starting at time kTanalysis and aligning frame k to (within a range about) its target synthesis starting time kTsynthesis (where Tsynthesis=rTanalysis) in the currently synthesized output by optimizing the cross-correlation of the overlap portions and then adding aligned frame k to extend the currently synthesized output with averaging of the overlap portions. Various SOLA modifications lower the complexity of the computations; for example, Wong and Au, Fast SOLA-Based Time Scale Modification Using Modified Envelope Matching, IEEE ICASSP vol. III, pp. 3188-3191 (2002).
Sampling rate conversion (which may be asynchronous) theoretically is just analog reconstruction and resampling, i.e., non-linear interpolations. Ramstad, Digital Methods for Conversion between Arbitrary Sampling Frequencies, 32 IEEE Tr. ASSP 577 (1984) presents a general theory of filtering methods for interfacing time-discrete systems with different sampling rates and includes the use of Taylor series coefficients for improved interpolation accuracy.
Simplistic beat matchers have problems including jumps in detected tempos over time and extreme conversion ratios that produce unnatural-sounding audio outputs. In addition, a stable beat matcher that produces natural-sounding audio output in real-time (and on an embedded/portable system) has not been found in previous literature.