In music terminology, the musical meter is the recurring pattern of stresses or accents in the music. The musical meter can be described as comprising a measure pulse, a beat pulse and a tatum pulse, listed here from the longest to the shortest pulse duration.
Beat pulses provide the basic unit of time in music, and the rate of beat pulses (the tempo) is considered the rate at which most people would tap their foot on the floor when listening to a piece of music. Identifying the occurrence of beat pulses in a piece of music, known as beat tracking, is desirable in a number of practical applications. Such applications include music recommendation applications, in which music similar to a reference track is searched for; Disk Jockey (DJ) applications, in which, for example, seamless beat-mixed transitions between songs in a playlist are required; and automatic looping techniques.
Beat tracking systems and methods generate a beat sequence comprising the temporal positions of the beats in a piece of music or a part thereof.
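As a minimal illustration of what a beat sequence carries, the sketch below treats it as a list of ascending beat times in seconds and derives a tempo from the median inter-beat interval. The function name `tempo_from_beats` and the use of the median are assumptions made for this example; they are not taken from the cited publications.

```python
from statistics import median


def tempo_from_beats(beat_times):
    """Estimate tempo in BPM from a beat sequence (ascending times in seconds).

    Hypothetical helper: uses the median inter-beat interval so that a few
    misplaced beats do not dominate the estimate.
    """
    if len(beat_times) < 2:
        raise ValueError("a beat sequence needs at least two beats")
    # Inter-beat intervals between consecutive beat positions.
    intervals = [later - earlier
                 for earlier, later in zip(beat_times, beat_times[1:])]
    return 60.0 / median(intervals)


# Example beat sequence: one beat every half second.
beats = [0.0, 0.5, 1.0, 1.5, 2.0]
tempo = tempo_from_beats(beats)
```

A beat sequence of this form is also what a DJ or looping application would consume directly, since it gives the time instants at which transitions or loop points can be aligned.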
The following terms are useful for understanding certain concepts to be described later.
Pitch: the perceptual correlate of the fundamental frequency (f0) of a note.
Chroma, also known as pitch class: musical pitches separated by an integer number of octaves belong to a common pitch class. In Western music, twelve pitch classes are used.
Beat or tactus: the basic unit of time in music. The beat rate can be considered the rate at which most people would tap their foot on the floor when listening to a piece of music. The word is also used to denote the part of the music belonging to a single beat.
Tempo: the rate of the beat or tactus pulse, usually represented in units of beats per minute (BPM).
Bar or measure: a segment of time defined as a given number of beats of given duration. For example, in music with a 4/4 time signature, each measure comprises four beats.
Accent or accent-based audio analysis: analysis of an audio signal to detect events and/or changes in music, including but not limited to the beginnings of all discrete sound events, especially the onsets of long pitched sounds, sudden changes in loudness or timbre, and harmonic changes. Further detail is given below.
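The relationships among several of the terms above can be sketched numerically. The example below maps a fundamental frequency to one of the twelve pitch classes and converts a tempo in BPM to beat and bar durations. It assumes twelve-tone equal temperament with a reference of A4 = 440 Hz and MIDI-style note numbering; these assumptions, and all function names, are illustrative and do not come from the text above.

```python
import math

# Pitch class names, with index 0 assigned to C (an illustrative convention).
PITCH_CLASS_NAMES = ["C", "C#", "D", "D#", "E", "F",
                     "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(f0_hz, a4_hz=440.0):
    """Return the pitch class (0..11) of a fundamental frequency in Hz.

    Assumes equal temperament; note number 69 corresponds to A4.
    """
    note_number = 69 + 12 * math.log2(f0_hz / a4_hz)
    # Pitches an integer number of octaves apart differ by a multiple of 12,
    # so they share the same remainder, i.e. the same pitch class.
    return round(note_number) % 12


def beat_period_seconds(tempo_bpm):
    """Duration of one beat in seconds for a tempo given in BPM."""
    return 60.0 / tempo_bpm


def bar_duration_seconds(tempo_bpm, beats_per_bar=4):
    """Duration of one bar; beats_per_bar=4 corresponds to a 4/4 signature."""
    return beats_per_bar * beat_period_seconds(tempo_bpm)


# 220 Hz, 440 Hz and 880 Hz are octaves of one another.
classes = [PITCH_CLASS_NAMES[pitch_class(f)] for f in (220.0, 440.0, 880.0)]
bar_len = bar_duration_seconds(120.0)
```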
It is believed that humans perceive musical meter by inferring a regular pattern of pulses from accents, which are stressed moments in music. Different events in music cause accents. Examples include changes in loudness or timbre, harmonic changes, and in general the beginnings of all sound events. In particular, the onsets of long pitched sounds cause accents.
Automatic tempo, beat, or downbeat estimators may try to imitate the human perception of musical meter to some extent. This may involve the steps of measuring musical accentuation, performing period estimation of one or more pulses, finding the phases of the estimated pulses, and choosing the metrical level corresponding to the tempo or some other metrical level of interest.
Since accents relate to events in music, accent-based audio analysis refers to the detection of events and/or changes in music. Such changes may relate to changes in the loudness, spectrum and/or pitch content of the signal. As an example, accent-based analysis may relate to detecting spectral change from the signal, calculating a novelty or an onset detection function from the signal, detecting discrete onsets from the signal, or detecting changes in pitch and/or harmonic content of the signal, for example, using chroma features. When performing the spectral change detection, various transforms or filter bank decompositions may be used, such as the Fast Fourier Transform or multirate filter banks, or even fundamental frequency (f0) or pitch salience estimators. As a simple example, accent detection might be performed by calculating the short-time energy of the signal over a set of frequency bands in short frames over the signal, and then calculating the difference, such as the Euclidean distance, between every two adjacent frames. To increase the robustness for various music types, many different accent signal analysis methods have been developed.
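The simple accent detection example described above (band-wise short-time energy followed by a Euclidean distance between adjacent frames) might be sketched as follows. This is only an illustration under stated assumptions: it uses NumPy, a Hann window, an FFT-based band split with linearly spaced bands, and hypothetical parameter names (`frame_len`, `hop`, `n_bands`), none of which are prescribed by the text or by the cited publications.

```python
import numpy as np


def accent_signal(x, frame_len=1024, hop=512, n_bands=8):
    """Return one accent value per pair of adjacent frames.

    Sketch only: band energies come from grouping FFT power bins into
    n_bands linearly spaced bands (an assumption made for this example).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len + hop:
        raise ValueError("signal too short for two frames")

    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)

    # Boundaries of the frequency bands, expressed as FFT bin indices.
    n_bins = frame_len // 2 + 1
    edges = np.linspace(0, n_bins, n_bands + 1).astype(int)

    energies = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = x[i * hop:i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Short-time energy of this frame in each frequency band.
        energies[i] = [power[lo:hi].sum()
                       for lo, hi in zip(edges[:-1], edges[1:])]

    # Euclidean distance between the band-energy vectors of adjacent frames.
    return np.linalg.norm(np.diff(energies, axis=0), axis=1)


# Usage: one second of silence followed by one second of a 440 Hz tone.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.concatenate([np.zeros(sample_rate), np.sin(2 * np.pi * 440 * t)])
accents = accent_signal(signal)
onset_frame = int(np.argmax(accents))  # frame transition with the largest change
```

The resulting accent values are zero during silence and peak near the frame transition that contains the tone onset. A practical system would typically add steps such as logarithmic compression or half-wave rectification, which are omitted here for brevity.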
The system and method to be described hereafter draw on background knowledge described in the following publications, which are incorporated herein by reference.
[1] Cemgil A. T. et al., “On tempo tracking: tempogram representation and Kalman filtering.” J. New Music Research, 2001.
[2] Eronen, A. and Klapuri, A., “Music Tempo Estimation with k-NN regression,” IEEE Trans. Audio, Speech and Language Processing, Vol. 18, No. 1, January 2010.
[3] Seppänen, Eronen, Hiipakka. “Joint Beat & Tatum Tracking from Music Signals”, International Conference on Music Information Retrieval, ISMIR 2006 and Jarno Seppänen, Antti Eronen, Jarmo Hiipakka: Method, apparatus and computer program product for providing rhythm information from an audio signal. Nokia November 2009: U.S. Pat. No. 7,612,275.
[4] Antti Eronen and Timo Kosonen, “Creating and sharing variations of a music file”, United States Patent Application 20070261537.
[5] Klapuri, A., Eronen, A., Astola, J., “Analysis of the meter of acoustic musical signals,” IEEE Trans. Audio, Speech, and Language Processing, Vol. 14, No. 1, 2006.
[6] Jehan, Creating Music by Listening, PhD Thesis, MIT, 2005. http://web.media.mit.edu/~tristan/phd/pdf/Tristan_PhD_MIT.pdf
[7] D. Ellis, “Beat Tracking by Dynamic Programming”, J. New Music Research, Special Issue on Beat and Tempo Extraction, vol. 36 no. 1, March 2007, pp. 51-60. (10pp) DOI: 10.1080/09298210701653344.
[8] A. Klapuri, “Multiple fundamental frequency estimation by summing harmonic amplitudes,” in Proc. 7th Int. Conf. Music Inf. Retrieval (ISMIR-06), Victoria, Canada, 2006.