A musical tune is composed on the basis of measures of time such as bars and beats, and musicians accordingly play it using bars and beats as the basic measure of time. To time a performance, a musician produces a specific sound at a certain beat of a certain bar; a musician never uses a timestamp-based method of producing a specific sound a certain number of minutes and seconds after starting to play. Since music is defined by bars and beats, musicians can flexibly deal with fluctuations in tempo and rhythm. In addition, each musician can express originality in tempo and rhythm when performing an identical musical score.
A performance carried out by musicians is ultimately delivered to users as music content. More specifically, the performance of each musician is mixed down, for example, into two stereo channels and is formed into one complete package. This complete package is delivered to users, for example, as a music CD (Compact Disc) employing a PCM (Pulse Code Modulation) format. The sound source of this music CD is referred to as a sampling sound source.
At the stage of such a package, for example a CD, the timing information that musicians are conscious of, such as bars and beats, is missing.
However, humans can naturally re-recognize the information regarding timings, such as bars and beats, by only listening to an analog sound obtained by performing D/A (Digital to Analog) conversion on an audio waveform in this PCM format. That is, humans can naturally regain a sense of musical rhythm. On the other hand, machines do not have such a capability and only have the time information of a timestamp that is not directly related to the music itself.
A conventional karaoke system offers a point of comparison with a musical tune provided by a performance by musicians or by the voices of singers. This system displays lyrics on a karaoke display screen in synchronization with the rhythm of the music.
However, such a karaoke system does not recognize the rhythm of music but simply reproduces dedicated data called MIDI (Musical Instrument Digital Interface) data.
The performance information and lyric information necessary for synchronization control, together with time code information (timestamps) describing the timing (event time) of sound production, are described in the MIDI format as MIDI data. The MIDI data is created in advance by a content creator. A karaoke playback apparatus only performs sound production at predetermined timings in accordance with the instructions of the MIDI data. That is, the apparatus generates (plays) the musical tune on the spot. This can be enjoyed only in the limited environment of MIDI data and a dedicated apparatus therefor.
Furthermore, although various formats, such as SMIL (Synchronized Multimedia Integration Language), exist in addition to MIDI, the basic concept is the same.
Meanwhile, the mainstream of music content distributed in the market is not MIDI or SMIL but formats consisting mainly of a raw audio waveform, namely the sampling sound source described above: for example, PCM data as found on CDs, or MP3 (MPEG (Moving Picture Experts Group) Audio Layer 3), which is a compressed form of such audio.
A music playback apparatus provides the music content to users by performing D/A conversion on these sampled audio waveforms of PCM or the like and outputting them. In addition, as seen in FM radio broadcasting or the like, there are cases in which an analog signal of the music waveform itself is broadcast. Furthermore, there are cases in which a person plays music live, such as in a concert or other live performance, and the music content is provided to users in that way.
If a machine could automatically recognize timings, such as the bars and beats of music, from the raw music waveform, a synchronization function allowing music and another medium to be rhythm-synchronized, as in karaoke and dance, could be realized even without prepared information such as the event time information of MIDI and SMIL. Furthermore, for the massive body of existing content, such as CDs, the possibilities for new forms of entertainment would broaden.
Hitherto, attempts to automatically extract a tempo or beats have been made.
For example, Japanese Unexamined Patent Application Publication No. 2002-116754 discloses a method in which an autocorrelation (self-correlation) of a music waveform signal serving as a time-series signal is calculated, a beat structure of the music is analyzed on the basis of this calculation result, and a tempo of the music is further extracted on the basis of this analysis result.
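As an illustration of the time-domain approach in general, and not of the specific procedure in the cited publication, the following minimal sketch estimates a tempo from the autocorrelation of a time series. The function names, the use of an amplitude envelope sampled at `frame_rate` frames per second as input, and the 60 to 180 BPM search range are all assumptions made for this example.

```python
def autocorrelation(signal, max_lag):
    """Unnormalized autocorrelation of a mean-removed series for lags 0..max_lag."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def estimate_tempo_bpm(envelope, frame_rate, min_bpm=60.0, max_bpm=180.0):
    """Return the tempo whose beat period has the strongest autocorrelation.

    envelope   -- hypothetical amplitude envelope, one value per frame
    frame_rate -- frames per second of the envelope (assumed known)
    """
    # Convert the tempo search range into a range of lags measured in frames.
    min_lag = int(round(frame_rate * 60.0 / max_bpm))
    max_lag = int(round(frame_rate * 60.0 / min_bpm))
    acf = autocorrelation(envelope, max_lag)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda lag: acf[lag])
    return 60.0 * frame_rate / best_lag


# Synthetic pulse train: one pulse every 50 frames at 100 frames per second,
# that is, one pulse every 0.5 seconds.
pulses = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]
tempo = estimate_tempo_bpm(pulses, frame_rate=100)
```

On a clean pulse train such as this one, the strongest lag corresponds directly to the pulse spacing. Real music waveforms rarely behave so cleanly, which is the weakness of time-domain analysis that this document goes on to discuss.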
In addition, Japanese Patent No. 3066528 describes a method in which sound pressure data for each of a plurality of frequency bands is created from musical tune data, the frequency band in which the rhythm is most noticeably carried is specified from among the plurality of frequency bands, and rhythm components are estimated on the basis of the cycle of the change in the sound pressure data of the specified frequency band.
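The band-selection idea can be sketched roughly as follows. This is only an illustrative reading of the description above, not the procedure of the cited patent: the per-band sound pressure series are assumed to have been computed already, the "most noticeable" band is taken to be the one with the largest total rise in level, and the cycle is taken as the median spacing between strong rises. All names and the 0.5 threshold factor are hypothetical.

```python
from statistics import median


def rises(levels):
    """Positive frame-to-frame increases in level (zero where the level falls)."""
    return [max(0.0, b - a) for a, b in zip(levels, levels[1:])]


def pick_rhythm_band(band_levels):
    """Pick the band whose level rises the most in total (assumed criterion)."""
    return max(band_levels, key=lambda name: sum(rises(band_levels[name])))


def estimate_cycle_seconds(levels, frame_rate):
    """Median spacing, in seconds, between strong rises; None if undetermined."""
    r = rises(levels)
    if not r or max(r) == 0.0:
        return None
    threshold = 0.5 * max(r)  # assumed threshold for a "strong" rise
    onsets = [i + 1 for i, v in enumerate(r) if v > threshold]
    if len(onsets) < 2:
        return None
    intervals = [b - a for a, b in zip(onsets, onsets[1:])]
    return median(intervals) / frame_rate


# Hypothetical data: a low band pulsing every 25 frames and a flat high band,
# both sampled at 50 frames per second.
band_levels = {
    "low": [1.0 if i % 25 == 0 else 0.0 for i in range(200)],
    "high": [0.1] * 200,
}
band = pick_rhythm_band(band_levels)
cycle = estimate_cycle_seconds(band_levels[band], frame_rate=50)
```

With these synthetic inputs the low band is selected and the estimated cycle is half a second. In real material several bands carry competing periodicities, so the selection criterion matters far more than it does here.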
Techniques for calculating the rhythm, the beat, and the tempo are broadly classified into those for analyzing a music signal in a time domain as in the case of Japanese Unexamined Patent Application Publication No. 2002-116754 and those for analyzing a music signal in a frequency domain as in the case of Japanese Patent No. 3066528.
However, with the method of Japanese Unexamined Patent Application Publication No. 2002-116754, which analyzes a music signal in a time domain, high extraction accuracy essentially cannot be obtained, since the beats and the time-series waveform do not necessarily match. The method of Japanese Patent No. 3066528, which analyzes a music signal in a frequency domain, can improve the extraction accuracy relative to Japanese Unexamined Patent Application Publication No. 2002-116754. However, data resulting from the frequency analysis contains many beats other than the beats of a specific musical note, and it is extremely difficult to separate the beats of the specific musical note from all of the beats. In addition, since the musical tempo (time period) itself fluctuates greatly, it is extremely difficult to extract only the beats of the specific musical note while keeping track of these fluctuations.
Accordingly, conventional techniques cannot extract, over an entire musical tune, the beats of a specific musical note that fluctuate temporally.
The present invention has been proposed in view of such conventional circumstances. It is an object of the present invention to provide a beat extracting device and a beat extracting method capable of extracting only the beats of a specific musical note with high accuracy over an entire musical tune, even for a musical tune whose tempo fluctuates.
To achieve the above-described object, a beat extracting device according to the present invention is characterized by including beat extraction processing means for extracting beat position information of a rhythm of a musical tune, and beat alignment processing means for generating beat period information using the beat position information extracted and obtained by the beat extraction processing means and for aligning beats of the beat position information extracted by the beat extraction processing means on the basis of the beat period information.
In addition, to achieve the above-described object, a beat extracting method according to the present invention is characterized by including a beat extraction processing step of extracting beat position information of a rhythm of a musical tune, and a beat alignment processing step of generating beat period information using the beat position information extracted and obtained at the beat extraction processing step and of aligning beats of the beat position information extracted at the beat extraction processing step on the basis of the beat period information.
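The two-stage structure described above (extract beat positions, then derive a beat period from them and align the beats to it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: beat positions are assumed to be given in seconds, the beat period is taken as the median interval between extracted beats, and a beat is kept only if it falls within a tolerance window around the position predicted from the previous aligned beat. The function names and the `tolerance` parameter are hypothetical.

```python
from statistics import median


def beat_period(positions):
    """Beat period taken as the median interval between extracted beats."""
    intervals = [b - a for a, b in zip(positions, positions[1:])]
    return median(intervals)


def align_beats(positions, tolerance=0.2):
    """Align extracted beats to the period; return (period, aligned beats).

    positions -- extracted beat positions in seconds (at least two)
    tolerance -- half-width of the acceptance window as a fraction of the
                 period; must be below 1 so each step moves forward
    """
    positions = sorted(positions)
    period = beat_period(positions)
    window = tolerance * period
    aligned = [positions[0]]
    end = positions[-1]
    while aligned[-1] + period <= end + window:
        expected = aligned[-1] + period
        candidates = [p for p in positions if abs(p - expected) <= window]
        if candidates:
            # Follow the extracted beat so small tempo fluctuations are tracked.
            aligned.append(min(candidates, key=lambda p: abs(p - expected)))
        else:
            # No extracted beat near the prediction: fill in the expected one.
            aligned.append(expected)
    return period, aligned


# Hypothetical extraction result: a spurious beat at 0.75 s and a missing
# beat at 1.5 s.
extracted = [0.0, 0.5, 0.75, 1.0, 2.0, 2.5]
period, aligned = align_beats(extracted)
```

In this example the off-grid beat at 0.75 s is discarded and the missing beat at 1.5 s is filled in. Anchoring each prediction to the previously accepted beat, rather than to a fixed grid, is what lets such a scheme follow a gradually drifting tempo.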