Patterns occur in many forms of music. Musical patterns can be considered as groups of musical measures (also known as bars), for example two adjacent measures, which have musical characteristics that repeat within the overall musical piece. Often, melodic or harmonic phrases in popular music have a duration corresponding to a musical pattern, such as two measures, so that the signal contains repetitions between segments whose length equals that of the musical pattern.
There are a number of practical applications in which it is desirable to identify such musical patterns from a musical audio signal.
A particularly useful application is to help synchronise automatic video scene cuts to musically meaningful points. For example, where multiple video clips (with audio) relating to the same musical performance are acquired from different sources, it would be desirable to automatically join clips from the different sources and provide switches between the video clips in an aesthetically pleasing manner, resembling the way professional music videos are created. One method already proposed by the Applicant is to detect downbeats from the music, that is, the first beat of each measure, and to make switches on downbeats. This specification improves on this concept. It has been observed that for many songs in a 4/4 time signature, one can count to eight while listening to the music, indicating a pattern consisting of two adjacent 4/4 measures; Applicant has determined that switching on the first beat of such eight-beat patterns, at least more often than on other beats, produces a particularly professional-looking video edit.
The same concept applies to other time signatures and groupings of measures, although this specification concentrates on adjacent 4/4 measures. Other practical applications are also mentioned later as alternatives to automating video scene cuts.
The following terms are useful for understanding certain concepts to be described later.
Pitch: the perceptual correlate of the fundamental frequency (f0) of a note.
Chroma, also known as pitch class: musical pitches separated by an integer number of octaves belong to a common pitch class. In Western music, twelve pitch classes are used.
Beat or tactus: the basic unit of time in music; it can be considered the rate at which most people would tap their foot on the floor when listening to a piece of music. The word is also used to denote the part of the music belonging to a single beat.
Tempo: the rate of the beat or tactus pulse represented in units of beats per minute (BPM).
Bar or measure: a segment of time defined as a given number of beats of given duration. For example, in music with a 4/4 time signature, each measure comprises four beats.
Downbeat: the first beat of a bar or measure.
Music pattern: groupings of musical measures. As an example, the music pattern may correspond to a group of two adjacent measures. Often, melodic or harmonic phrases in popular music have a duration corresponding to a music pattern, such as two measures. In this case, there will be repetitions in the signal between segments that are of the length of the music pattern.
Music structure: structures or musical forms in popular music are typically in sectional, repeating forms. Examples include the verse-chorus form common in pop music and the twelve-bar form of blues music.
Accent or accent-based audio analysis: analysis of an audio signal to detect events and/or changes in music, including but not limited to the beginning of all discrete sound events, especially the onset of long pitched sounds, sudden changes in loudness or timbre, and harmonic changes.
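By way of a non-limiting illustration, the relationships between several of the above terms can be sketched in code. The function names, the use of twelve-tone equal temperament with a 440 Hz reference for A4, and the MIDI note numbering are assumptions made for this example only and do not appear in the definitions above.

```python
import math

# Pitch class names, index 0 = C (assumed labelling for illustration).
PITCH_CLASS_NAMES = ["C", "C#", "D", "D#", "E", "F",
                     "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_from_f0(f0_hz, a4_hz=440.0):
    """Map a fundamental frequency to one of twelve pitch classes.

    Assumes equal temperament and MIDI numbering (A4 = note 69).
    Pitches an integer number of octaves apart share a pitch class.
    """
    if f0_hz <= 0:
        raise ValueError("f0 must be positive")
    midi_note = round(69 + 12 * math.log2(f0_hz / a4_hz))
    return midi_note % 12


def beat_period_seconds(tempo_bpm):
    """Duration of one beat for a tempo given in beats per minute."""
    if tempo_bpm <= 0:
        raise ValueError("tempo must be positive")
    return 60.0 / tempo_bpm


def pattern_duration_seconds(tempo_bpm, beats_per_measure=4,
                             measures_per_pattern=2):
    """Duration of a music pattern, e.g. two adjacent 4/4 measures."""
    beats = beats_per_measure * measures_per_pattern
    return beats * beat_period_seconds(tempo_bpm)
```

For instance, under these assumptions a music pattern of two adjacent 4/4 measures spans eight beats, so its duration follows directly from the tempo.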
As will be appreciated, human perception of musical meter involves inferring a regular pattern of pulses from moments of musical stress, also known as accents. Accents are caused by various events in the music, including the beginnings of all discrete sound events, especially the onsets of long pitched sounds, sudden changes in loudness or timbre, and harmonic changes. Automatic tempo, beat, or downbeat estimators may try to imitate the human perception of musical meter to some extent, by measuring musical accentuation, estimating the periods and phases of the underlying pulses, and choosing the level corresponding to the tempo or some other metrical level of interest. Since accents relate to events in music, accent-based audio analysis refers to the detection of events and/or changes in music. Such changes may relate to changes in the loudness, spectrum, and/or pitch content of the signal. As an example, accent-based analysis may relate to detecting spectral change from the signal, calculating a novelty or an onset detection function from the signal, detecting discrete onsets from the signal, or detecting changes in pitch and/or harmonic content of the signal, for example, using chroma features. When performing the spectral change detection, various transforms or filterbank decompositions may be used, such as the Fast Fourier Transform or multirate filterbanks, or even fundamental frequency (f0) or pitch salience estimators. As a simple example, accent detection might be performed by calculating the short-time energy of the signal over a set of frequency bands in short frames over the signal, and then calculating a difference, such as the Euclidean distance, between every two adjacent frames. To increase the robustness for various music types, many different accent signal analysis methods have been developed.
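The simple accent detection example just described (short-time band energies followed by a Euclidean distance between adjacent frames) might be sketched as follows. This is a minimal illustration only: the frame length, hop size, number of bands, Hann window, and the equal-width grouping of FFT bins into bands are all assumptions chosen for the example, and the function name accent_signal is hypothetical.

```python
import numpy as np


def accent_signal(x, frame_len=1024, hop=512, n_bands=8):
    """Return one accent value per pair of adjacent frames.

    For each short frame, the power spectrum is summed into n_bands
    equal-width groups of FFT bins (an assumed, simplistic band layout).
    The accent value is the Euclidean distance between the band-energy
    vectors of consecutive frames.
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    if n_frames < 2:
        raise ValueError("signal too short for two frames")
    window = np.hanning(frame_len)
    energies = np.empty((n_frames, n_bands))
    for i in range(n_frames):
        frame = x[i * hop:i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Sum the power within each group of adjacent frequency bins.
        energies[i] = [band.sum() for band in np.array_split(power, n_bands)]
    # Distance between every two adjacent frames.
    return np.linalg.norm(np.diff(energies, axis=0), axis=1)


# Synthetic test signal: one second of silence, then a 440 Hz tone.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
signal = np.concatenate([np.zeros(sample_rate),
                         np.sin(2 * np.pi * 440.0 * t)])
accents = accent_signal(signal)
peak_frame = int(np.argmax(accents))
peak_time = (peak_frame + 1) * 512 / sample_rate  # start of the later frame
```

In this synthetic case the accent values are zero throughout the silence and the largest value falls near the one-second mark where the tone begins. A practical system would typically use perceptually motivated bands and further processing, as discussed in the publications referenced herein.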
The systems and methods to be described hereafter draw on background knowledge described in the following publications, which are incorporated herein by reference.
    [1] Peeters and Papadopoulos, “Simultaneous Beat and Downbeat-Tracking Using a Probabilistic Framework: Theory and Large-Scale Evaluation,” IEEE Trans. Audio, Speech and Language Processing, Vol. 19, No. 6, August 2011.
    [2] Eronen, A. and Klapuri, A., “Music Tempo Estimation with k-NN regression,” IEEE Trans. Audio, Speech and Language Processing, Vol. 18, No. 1, January 2010.
    [3] Seppänen, Eronen, Hiipakka, “Joint Beat & Tatum Tracking from Music Signals,” International Conference on Music Information Retrieval, ISMIR 2006; and Jarno Seppänen, Antti Eronen, Jarmo Hiipakka, “Method, apparatus and computer program product for providing rhythm information from an audio signal,” Nokia, November 2009, U.S. Pat. No. 7,612,275.
    [4] Antti Eronen and Timo Kosonen, “Creating and sharing variations of a music file,” United States Patent Application 20070261537.
    [5] Klapuri, A., Eronen, A., Astola, J., “Analysis of the meter of acoustic musical signals,” IEEE Trans. Audio, Speech, and Language Processing, Vol. 14, No. 1, 2006.
    [6] Jehan, “Creating Music by Listening,” PhD Thesis, MIT, 2005. http://web.media.mit.edu/~tristan/phd/pdf/Tristan_PhD_MIT.pdf
    [7] D. Ellis, “Beat Tracking by Dynamic Programming,” J. New Music Research, Special Issue on Beat and Tempo Extraction, vol. 36, no. 1, March 2007, pp. 51-60. (10pp) DOI: 10.1080/09298210701653344.
    [8] Matthias Mauch, Katy Noland, Simon Dixon, “Using Musical Structure to Enhance Automatic Chord Transcription,” in Proc. 10th International Society for Music Information Retrieval Conference (ISMIR 2009).
    [9] M. Cooper and J. Foote, “Summarizing popular music via structural similarity analysis,” in WASPAA, New Paltz, N.Y., USA, 2003.
    [10] Paulus, J., Klapuri, A., “Music Structure Analysis Using a Probabilistic Fitness Measure and an Integrated Musicological Model,” in Proc. of the 9th International Conference on Music Information Retrieval (ISMIR 2008), Philadelphia, Pa., USA, Sep. 14-18, 2008, pp. 369-374. Available at http://www.cs.tut.fi/sgn/arg/paulus/paulus_ismir08.pdf.
    [11] J. Foote, “Automatic Audio Segmentation Using a Measure of Audio Novelty,” Proceedings of IEEE-ICME, vol. I, pp. 452-455, July 2000.