1. Technical Field
Embodiments disclosed herein relate to methods and apparatus for automatically creating a “movie” from user-provided digital music content and user-provided digital moving video content.
2. Description of Related Art
In the production of multimedia presentations, it is often desirable to synchronize music and video. Such synchronization can, however, be difficult with certain types of music.
Music composers create music with a particular tempo and a “meter.” The meter is the part of rhythmical structure concerned with the division of a musical composition into “measures” by means of regularly recurring accents, with each measure consisting of a uniform number of beats or time units, the first of which usually has the strongest accent. “Time” is often used as a synonym of meter. It is the grouping of the successive rhythmic beats, as represented by a musical note taken as a time unit. In written form, the beats can be separated into measures, or “bars,” that are marked off by bar lines according to the position of the principal accent.
Tempo is the rate at which the underlying time unit recurs. Specifically, tempo is the speed of a musical piece. It can be specified by the composer with a metronome marking as a number of beats per minute, or left somewhat subjective with only a word conveying the relative speed (e.g. largo, presto, allegro). Then, the conductor or performer determines the actual rate of rhythmic recurrence of the underlying time unit.
The tempo does not dictate the rhythm. The rhythm may coincide with the beats of the tempo, but it need not. FIG. 1 shows, in standard musical notation, two measures of a simple musical composition with a four-four meter, or “time signature,” identified by the 4/4. This meter could also be expressed as “the quarter note ‘gets the beat’ with four beats to a measure.” The tempo will be the rate at which the quarter notes (individual solid notes in FIG. 1) recur, but in FIG. 1 the actual tempo is unspecified.
Each measure generally begins and ends with a bar line and may include an Arabic number above its beginning bar as identification. The rhythm in FIG. 1 is a steady repetition of an accented beat (indicated by “>”) followed by three unaccented beats. The rate of recurrence of beats in the rhythm is the same as the tempo and each beat in the rhythm will occur on the beats of the tempo. The frequency of the accented beats in the rhythm is one-fourth of the tempo.
FIG. 2 shows three measures using the same tones (“pitches”) as the notes of the musical composition in FIG. 1, but with a different meter. The meter of the musical composition in FIG. 2 is symbolized as 3/4 and expressed as “the quarter note ‘gets the beat’ with three beats to a measure.” Here, the rhythm is a steady repetition of an accented beat followed by two unaccented beats. The rate of recurrence of beats in the rhythm is the same as the tempo and each beat in the rhythm will fall on the beats of the tempo. The frequency of the accented beats in the rhythm is one-third of the tempo.
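The relationship between tempo and accent frequency in FIGS. 1 and 2 can be illustrated with a short calculation. The following is only a sketch: the function name `accent_rate_bpm` and the tempo of 120 BPM are assumptions made for illustration, since the figures leave the actual tempo unspecified.

```python
def accent_rate_bpm(tempo_bpm: float, beats_per_measure: int) -> float:
    """Rate of the accented first beat of each measure, in accents per minute."""
    if beats_per_measure <= 0:
        raise ValueError("beats_per_measure must be positive")
    return tempo_bpm / beats_per_measure


# Hypothetical tempo of 120 quarter notes per minute (not given in the figures).
four_four = accent_rate_bpm(120.0, 4)   # 4/4 meter as in FIG. 1: one-fourth of the tempo
three_four = accent_rate_bpm(120.0, 3)  # 3/4 meter as in FIG. 2: one-third of the tempo
```

At the same tempo, the accented beat therefore recurs more often in the 3/4 composition than in the 4/4 composition.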
FIG. 3 shows two measures of a simple musical composition with a 4/4 meter, with the quarter note getting the beat. Here, however, the rhythm varies in each measure. In measure 1, there are two half notes (open note equal to two quarter notes in duration), and in measure 2 there is a dotted quarter note (one and a half times the duration of a quarter note) followed by five eighth notes (each one half the duration of a quarter note). In each case, the first beat of the measure is accented followed by unaccented beats. However, the accented beats in FIG. 3 are not the same duration. Where two or more beats occur during one tempo beat period, the tempo beat is broken into appropriate sub time frames. Since in FIG. 3 the most beats per underlying time unit is two, the time unit is split into two and the time is “counted” as follows: One And Two And Three And Four And. In the second measure, the dotted quarter note is counted One And Two, the first eighth note is counted as the “And” of Two, the second eighth note as Three, the third eighth note as the “And” of Three, the fourth eighth note as Four, and the last eighth note as the “And” of Four. The And is symbolized by an addition sign, “+”. For illustration, the “counted” beats of the tempo are printed below the notes in FIGS. 1-3. Thus, one can see that only some beats of the rhythm coincide with the tempo beats. Note that the frequency of the accented beats in FIG. 3 is still one-fourth the tempo.
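The counting scheme described for FIG. 3 can be sketched as follows. This is a minimal illustration only: the function name `count_onsets` is hypothetical, durations are given in quarter-note beats, and the sketch assumes that every note begins on an eighth-note boundary, as is the case in FIG. 3.

```python
from fractions import Fraction


def count_onsets(durations):
    """Return the counted label ("1", "2+", ...) at which each note begins.

    durations: note lengths in beats within one measure (quarter note = 1).
    Each beat is split into two halves, the second half being the "And" ("+").
    """
    labels = []
    position = Fraction(0)  # elapsed beats since the start of the measure
    for duration in durations:
        slot = position * 2  # position measured in eighth notes
        if slot.denominator != 1:
            raise ValueError("note does not begin on an eighth-note boundary")
        beat, half = divmod(int(slot), 2)
        labels.append(str(beat + 1) + ("+" if half else ""))
        position += Fraction(duration)
    return labels


# Measure 1 of FIG. 3: two half notes.
measure_1 = count_onsets([2, 2])
# Measure 2 of FIG. 3: a dotted quarter note followed by five eighth notes.
measure_2 = count_onsets([1.5, 0.5, 0.5, 0.5, 0.5, 0.5])
```

Labels without a “+” mark the notes of the rhythm that begin on a tempo beat; labels with a “+” mark the notes that begin between tempo beats.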
When asking a room of people to “keep time” to the beat of a musical composition, the response may vary. With reference to the compositions of FIGS. 1-3, some may mark one beat per measure (the most accented beat in the measure, often the first beat) and some may mark a faster recurrence of beats. With respect to a musical composition like the two measures in FIG. 1, the second group of people will be marking four times to the first group's one mark in the same time period.
The fundamental beat frequency is a name given to the frequency of the predominant beats that the majority of people perceive in any given musical composition as they keep time with the music. (Note that this use of the term “frequency” is in contrast to another use of the term “frequency” to denote the pitch of a note.) Candidates for the fundamental beat frequency of the two measures of FIG. 1 could either be the tempo (number of quarter notes per minute, since all beats of the rhythm coincide with the underlying time unit) or the frequency of the accented first beat of the measure, which is one-fourth the frequency of the tempo. Candidates for the fundamental beat frequency of the three measures of FIG. 2 could either be the tempo (since all beats of the rhythm coincide with the underlying time unit) or the frequency of the accented first beat of the measures, which is one-third the tempo.
The fundamental beat frequency of the measures of FIG. 3 is unlikely to be the tempo, even though there are beats on 1 and 3 in the first measure and 1, 3 and 4 in the second measure. Candidates could be the frequency of the accented beats (one fourth of the tempo) or the frequency of beats 1 and 3 (half of the tempo). However, analysis of more measures of the composition may be necessary to determine the fundamental beat frequency.
The fundamental beat frequency may depend on other aspects of the music, like the presence, pattern, and relative strengths of accents within the rhythm. As is the case with tempo, the fundamental beat frequency is specified as beats per minute (BPM). The fundamental beat frequency in music typically ranges from 50 to 200 BPM and, of course, may change over the course of a complete composition.
Dance music has a rather pronounced and consistent fundamental beat frequency, but jazz, classical (symphonic) music, and some individual songs have inconsistent fundamental beat frequencies, because the tempo, or meter, or rhythm, or all three may change. Disc jockeys have made use of reasonably priced equipment that can detect the fundamental beat frequency of certain types of dance music, such as modern rock, pop, or hip-hop. Usually, such equipment did not identify the beats that corresponded to the fundamental beat frequency, but merely provided a tempo, e.g., 60 or 120 BPM.
A more sophisticated analyzer, unlike simpler DJ-style BPM equipment, is needed to successfully determine the fundamental beat frequency of a wider range of musical styles including jazz, classical, etc. and of material where the tempo and rhythm change, e.g. Zorba the Greek. The advent of the mathematical technique known as the discrete wavelet transform (“DWT”) has enabled more precise temporal and spectral analysis of a signal. Use of the DWT has addressed some of the shortcomings of the earlier mathematical technique of the Fourier transform. In particular, four-coefficient wavelet (“DAUB4”) variations of the DWT proposed by Ingrid Daubechies have enabled digital analysis of music with much better real-time information.
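For orientation, a single level of a DAUB4 transform can be sketched as shown below. This is an illustrative sketch under stated assumptions, not the analyzer of any particular product or publication: the function name `daub4_step` is hypothetical, and the sketch assumes that samples wrap around at the end of the signal (periodic extension), which is one common convention among several.

```python
import math

_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)

# The four DAUB4 smoothing (low-pass) coefficients.
H = [(1 + _S3) / _NORM, (3 + _S3) / _NORM, (3 - _S3) / _NORM, (1 - _S3) / _NORM]
# The matching detail (high-pass) coefficients.
G = [H[3], -H[2], H[1], -H[0]]


def daub4_step(signal):
    """Apply one DWT level; return (smooth, detail), each half the input length.

    Assumes periodic extension: the last window wraps to the start of the signal.
    """
    n = len(signal)
    if n < 4 or n % 2:
        raise ValueError("need an even number of samples, at least 4")
    smooth, detail = [], []
    for i in range(0, n, 2):
        window = [signal[(i + k) % n] for k in range(4)]
        smooth.append(sum(h * x for h, x in zip(H, window)))
        detail.append(sum(g * x for g, x in zip(G, window)))
    return smooth, detail


# Example: a constant signal has no detail content at all.
smooth_const, detail_const = daub4_step([1.0] * 8)
```

Repeating the step on the smooth output yields successively coarser time scales, while each detail output retains the time position of the changes it captures. That time localization is what a Fourier transform of the whole signal does not provide.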
A method using DWT to analyze a musical composition to estimate the tempo is described in Section 5 “Beat detection” of the article “Audio Analysis using the Discrete Wavelet Transform” by George Tzanetakis, Georg Essl, and Perry Cook. However, this method using the DWT often failed to detect the fundamental beat frequency in certain genres of music, especially jazz and classical. The beat frequency that it did detect often did not match the beat frequency determined by human analysis using a computer (i.e., listening and clicking the mouse to the music and then averaging the time between clicks).
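The human reference measurement mentioned above, i.e., clicking along with the music and averaging the time between clicks, can be sketched as follows. The function name `tapped_bpm` and the sample click times are assumptions made for illustration.

```python
def tapped_bpm(click_times_s):
    """Estimate beats per minute from click timestamps given in seconds."""
    if len(click_times_s) < 2:
        raise ValueError("need at least two clicks")
    intervals = [b - a for a, b in zip(click_times_s, click_times_s[1:])]
    if any(interval <= 0 for interval in intervals):
        raise ValueError("click times must be strictly increasing")
    mean_interval = sum(intervals) / len(intervals)
    return 60.0 / mean_interval


# Hypothetical clicks spaced half a second apart.
steady = tapped_bpm([0.0, 0.5, 1.0, 1.5])
# Hypothetical clicks from a less steady listener over the same span.
uneven = tapped_bpm([0.0, 0.52, 0.98, 1.5])
```

Because the intervals are averaged, small click-to-click variations largely cancel, and both click sequences above yield approximately the same beat frequency.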
Due to the nature of music performance, the beats do not always fall with clock-like precision. Such imprecision and inconsistency, in which beats do not fall at the exact time intervals implied by the fundamental beat frequency, is expected and even desired. However, when such music is incorporated into multimedia productions, sophisticated synchronization of audio and video is necessary. That is, the eye will immediately notice if still images or moving video content are manipulated or changed at inappropriate instants in time, i.e., times not corresponding closely enough to the beats of the fundamental beat frequency, whether slightly ahead of or behind the actual beat onset times. In certain audiovisual applications, it is therefore not sufficient merely to determine the fundamental beat frequency; rather, it is desirable to select the exact beat onset times that are associated with this fundamental beat frequency.
A time domain signal (amplitude vs. time) display of a musical composition does not always readily indicate the fundamental beat frequency. The envelope of the time domain signal can be manipulated to make the onsets of the notes of the instrument (whether it be voice, rhythm, wind, brass, reed, string, or whatever else is being used) appear as amplitude peaks. However, most of the time not all of the peaks are beat onset times that correspond to the fundamental beat frequency.
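One simple way to manipulate the envelope so that onsets appear as amplitude peaks is sketched below. This is an assumption-laden illustration rather than a prescribed technique: the function names `envelope` and `peak_indices`, the moving-average window, the thresholds, and the toy signal are all hypothetical.

```python
def envelope(samples, window):
    """Moving average of the rectified (absolute-value) signal."""
    rectified = [abs(x) for x in samples]
    out = []
    running = 0.0
    for i, value in enumerate(rectified):
        running += value
        if i >= window:
            running -= rectified[i - window]  # drop the sample leaving the window
        out.append(running / window)
    return out


def peak_indices(env, threshold):
    """Indices where the envelope rises to a local maximum at or above threshold."""
    return [
        i
        for i in range(1, len(env) - 1)
        if env[i] >= threshold and env[i] > env[i - 1] and env[i] >= env[i + 1]
    ]


# Toy signal: a strong onset at sample 5 and a weaker onset at sample 25.
toy = [0.0] * 40
toy[5] = 1.0
toy[25] = 0.5
toy_env = envelope(toy, window=3)
all_peaks = peak_indices(toy_env, threshold=0.1)
strong_peaks = peak_indices(toy_env, threshold=0.2)
```

Both onsets appear as peaks at the lower threshold, but only the stronger one survives the higher threshold. Neither threshold identifies which peaks, if any, belong to the fundamental beat frequency, which is the difficulty noted above.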
Moreover, computer users with the necessary know-how have edited video content to be accompanied by music. How-to books abound on editing video content with a computer, such as Making Movies with Your PC by Robert Hone and Margy Kuntz, Prima Publishing, 1994, or using a video editor such as the one available in Pinnacle Studio 9, as described in Pinnacle Studio 9 for Windows by Jan Ozer, Peachpit Press, 2004. However, a wider range of people would like to create personalized movies consisting of video that they have selected accompanied by music that they have selected, or may simply prefer to make a video faster by having a computer program do it automatically. Such users typically want movies with variable-length scenes and effects, and prefer to have a range of video styles from which to create a movie.
It is therefore desirable to provide automated methods and apparatus for creating “movies” from user-provided digital music content and user-provided digital moving video content.