1. Field of the Invention
The present invention relates to an apparatus and a method for extracting the beat of the rhythm of a piece of music from an input music signal while that signal is being played back. Furthermore, the present invention relates to an apparatus and a method for displaying an image synchronized with a piece of music being played back by using a signal synchronized with an extracted beat. Furthermore, the present invention relates to an apparatus and a method for extracting a tempo value of a piece of music by using a signal synchronized with a beat extracted from the piece of music being played back. Furthermore, the present invention relates to a rhythm tracking apparatus and method that, by using a signal synchronized with an extracted beat, are capable of following changes in tempo and fluctuations in rhythm even if the tempo changes or the rhythm fluctuates in the middle of the playback of a piece of music. Furthermore, the present invention relates to a music-synchronized display apparatus and method capable of displaying, for example, lyrics in synchronization with a piece of music being played back.
2. Description of the Related Art
A piece of music provided by a performer or by the voice of a singer is composed on the basis of a measure of time such as a bar or a beat. Musical performers use a bar and a beat as a basic measure of time. When deciding the timing at which to play a musical instrument or sing a song, musical performers make a sound in accordance with which beat of which bar has currently been reached; they never make a sound a certain period of time after starting to play, as with a time stamp. Since a piece of music is defined by bars and beats, the piece of music can be dealt with flexibly even if there are fluctuations in tempo and rhythm, and conversely, even with performances of the same musical score, each performer can express individuality.
The performances of these musical performers are ultimately delivered to a user in the form of musical content. More specifically, the performance of each of the musical performers is mixed down, for example, into two stereo channels and is formed into what is called a complete package (content upon which editing has been completed). This complete package is packaged as, for example, a CD (Compact Disc) in the format of a simple PCM (Pulse Code Modulation) audio waveform and is delivered to a user. This is what is commonly called a sampling sound source.
Once the piece of music has been packaged as, for example, a CD, timing information, such as that regarding a bar and a beat, of which musical performers are conscious, is lost.
However, a human being is able to naturally recognize timing information, such as that regarding a bar and a beat, merely by hearing the analog sound obtained by converting a PCM audio waveform from digital to analog form. That is, a human being can naturally recognize the rhythm of a piece of music. Unfortunately, it is difficult for machines to do this. Machines can only understand the time information of a time stamp, which is not directly related to the piece of music itself.
In contrast to the above-described piece of music provided by a performer or by the voice of a singer, there is a karaoke (sing-along machine) system of the related art. This system can display lyrics in time with the rhythm of the piece of music. However, such a karaoke system does not recognize the rhythm of the piece of music; it only reproduces dedicated data called MIDI (Musical Instrument Digital Interface) data.
The MIDI format describes performance information and lyric information necessary for synchronized control, together with time code information (time stamps) that describes the timing at which each sound is to be produced (event time). This MIDI data is created in advance by a content producer, and a karaoke playback apparatus only produces sound at predetermined timings in accordance with the instructions of the MIDI data. The apparatus reproduces the piece of music on the spot, so to speak. As a result, the entertainment can be enjoyed only in the limited environment of MIDI data and a dedicated playback apparatus therefor.
In addition to MIDI, various other formats, such as SMIL (Synchronized Multimedia Integration Language), exist, but the basic concept is the same.
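The time-stamped event model described above can be illustrated with a minimal sketch. It is not the actual MIDI or SMIL file format; the `TimedEvent` class, the `events_due` function, and the example data are hypothetical names introduced only for illustration. The sketch assumes that each event carries an absolute event time in seconds and that the playback apparatus simply emits whatever events fall within the current playback window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedEvent:
    """Hypothetical event record: a time stamp plus what to do at that time."""
    time_s: float   # event time, in seconds from the start of playback
    kind: str       # "note" (performance information) or "lyric"
    payload: str    # note name or lyric fragment


def events_due(events, start_s, end_s):
    """Return the events whose time stamp lies in the window [start_s, end_s)."""
    return [e for e in events if start_s <= e.time_s < end_s]


# Example data prepared in advance, as a content producer would do.
score = [
    TimedEvent(0.0, "note", "C4"),
    TimedEvent(0.0, "lyric", "Twin-"),
    TimedEvent(0.5, "note", "C4"),
    TimedEvent(0.5, "lyric", "kle"),
    TimedEvent(1.0, "note", "G4"),
]

# The player only looks up events by time stamp; it never analyzes audio.
due = events_due(score, 0.5, 1.0)
```

In this sketch the lyric display stays in step with the notes only because both were given matching time stamps in advance, which is why such a system depends on dedicated data.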
The dominant format of music content distributed in the market, however, is not the above-described MIDI or SMIL but the live audio waveform called the sampling sound source described above, such as PCM data typified by a CD, or MP3 (MPEG (Moving Picture Experts Group) Audio Layer 3), which is a compressed form of such audio.
A music playback apparatus provides music content to a user by converting these sampled audio waveforms of PCM, etc., from digital to analog form and outputting them. There are also cases, as in an FM radio broadcast, etc., in which an analog signal of an audio waveform itself is broadcast. Furthermore, there are cases in which a person plays live, such as in a concert, a live performance, etc., so that music content is provided to the user.
If a machine could automatically recognize timing, such as a bar and a beat of a piece of music, from the live audio waveform of a piece of music that can be heard, synchronized functions, such as synchronizing content on another medium with the rhythm of the music as in karaoke, could be realized even if no information, such as event time information of MIDI or SMIL, is provided in advance.
With respect to existing CD music content, a piece of music on an FM radio currently being heard, and a live piece of music currently being played, content on another medium, such as images and lyrics, could then be played back in synchronization with the piece of music that is heard, thereby broadening the possibilities for new entertainment.
Methods of extracting tempo and of performing some kind of processing in synchronization with a piece of music have hitherto been proposed.
For example, Japanese Unexamined Patent Application Publication No. 2002-116754 discloses a method in which the autocorrelation of a music waveform signal as a time-series signal is computed, the beat structure of the piece of music is analyzed on the basis of the autocorrelation, and the tempo of the piece of music is extracted on the basis of the analysis result. This is not a process for extracting tempo in real time while a piece of music is being played back, but a process for extracting tempo offline.
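As a generic illustration of tempo extraction by autocorrelation, and not the specific procedure of the cited publication, the following minimal sketch picks the lag at which a loudness or onset envelope best matches a shifted copy of itself. The function name `estimate_tempo_bpm`, the 60 to 180 BPM search range, and the synthetic envelope are assumptions made for this example. The whole envelope is required before any result is available, which reflects the offline nature of such processing.

```python
def estimate_tempo_bpm(envelope, frame_rate, min_bpm=60.0, max_bpm=180.0):
    """Estimate tempo from an envelope by finding the strongest autocorrelation lag.

    envelope   : list of non-negative loudness/onset values, one per frame
    frame_rate : number of envelope frames per second
    The BPM search range is an assumption chosen for this illustration.
    """
    n = len(envelope)
    mean = sum(envelope) / n
    centered = [x - mean for x in envelope]

    # Convert the BPM range into a range of candidate beat periods (in frames).
    min_lag = max(1, int(round(frame_rate * 60.0 / max_bpm)))
    max_lag = min(n - 1, int(round(frame_rate * 60.0 / min_bpm)))
    if min_lag > max_lag:
        raise ValueError("envelope too short for the requested tempo range")

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag; it mildly favors shorter
        # lags, which helps avoid choosing a multiple of the true beat period.
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score

    return 60.0 * frame_rate / best_lag


# Synthetic envelope: one impulse every 0.5 s at 100 frames per second.
frame_rate = 100
envelope = [1.0 if i % 50 == 0 else 0.0 for i in range(1000)]
tempo = estimate_tempo_bpm(envelope, frame_rate)  # 120.0 for this input
```

Real music produces autocorrelation peaks at multiples and fractions of the beat period as well, so a practical analysis of beat structure would need more than a single maximum.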
Japanese Patent No. 3066528 discloses that sound pressure data for each of a plurality of frequency bands is created from piece-of-music data, the frequency band in which the rhythm is most prominent is specified, and rhythm components are estimated on the basis of the period of change in the sound pressure of the specified frequency band. This, too, is an offline process, in which frequency analysis is performed a plurality of times to extract rhythm components from a piece of music.
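The band-selection idea can be sketched as follows. This is not the procedure of the cited patent: the sketch assumes that the per-band sound pressure envelopes have already been computed, uses variance as a stand-in criterion for the band in which the rhythm is most prominent, and estimates the period from the mean spacing of envelope peaks. The names `most_rhythmic_band` and `peak_period_seconds` and the example data are hypothetical.

```python
def most_rhythmic_band(band_envelopes):
    """Return the index of the band whose envelope fluctuates most.

    Variance is an assumed criterion for "most prominent rhythm".
    """
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    return max(range(len(band_envelopes)),
               key=lambda b: variance(band_envelopes[b]))


def peak_period_seconds(envelope, frame_rate):
    """Estimate the period of change as the mean spacing between peaks.

    A peak is a local maximum that also exceeds the envelope mean.
    """
    mean = sum(envelope) / len(envelope)
    peaks = [i for i in range(1, len(envelope) - 1)
             if envelope[i] > mean
             and envelope[i] > envelope[i - 1]
             and envelope[i] >= envelope[i + 1]]
    if len(peaks) < 2:
        raise ValueError("not enough peaks to estimate a period")
    intervals = [b - a for a, b in zip(peaks, peaks[1:])]
    return sum(intervals) / len(intervals) / frame_rate


# Synthetic sound pressure envelopes for three bands at 50 frames per second.
frame_rate = 50
frames = 200
band_envelopes = [
    [0.5] * frames,                                           # steady band
    [1.0 if i % 25 == 0 else 0.1 for i in range(frames)],     # pulse every 0.5 s
    [0.30 if i % 2 == 0 else 0.32 for i in range(frames)],    # faint flicker
]

band = most_rhythmic_band(band_envelopes)
period = peak_period_seconds(band_envelopes[band], frame_rate)
```

With this input the pulsed band is selected and the estimated period is 0.5 seconds, corresponding to 120 beats per minute.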