1. Field of the Invention
The present invention relates to a beat tracking technique for estimating tempos and beat times from acoustic information that includes beats, such as music or scat, and to a technique for a robot that interacts musically by using the beat tracking technique.
2. Description of Related Art
In recent years, robots that interact socially with human beings, such as humanoids and home robots, have been actively studied. For a robot to achieve natural and rich expression, it is important to study musical interaction in which the robot listens to music on its own and moves its body or sings along with the music. In this technical field, for example, a technique is known for extracting beats in real time from live music collected with a microphone and making a robot dance in synchronization with these beats (see, for example, Unexamined Japanese Patent Application, First Publication No. 2007-33851).
When the robot is made to listen to music and to move to the rhythm of the music, a tempo needs to be estimated from the acoustic information of the music. In the past, the tempo was estimated by calculating an autocorrelation (self correlation) function based on the acoustic information (see, for example, Unexamined Japanese Patent Application, First Publication Nos. 2007-33851 and 2002-116754).
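The self correlation approach described above can be sketched as follows. This is a minimal illustration only, not the method of the cited publications: the function names `estimate_tempo_bpm` and `click_track`, the use of an onset-strength envelope as input, the frame rate, and the 60 to 180 BPM search range are all assumptions introduced for the example.

```python
def estimate_tempo_bpm(onset_env, frame_rate, min_bpm=60.0, max_bpm=180.0):
    """Estimate tempo from the self correlation (autocorrelation) peak.

    onset_env  : list of onset-strength values, one per frame (assumed input)
    frame_rate : frames per second of onset_env
    The BPM search range is an assumption chosen for this sketch.
    """
    n = len(onset_env)
    mean = sum(onset_env) / n
    x = [v - mean for v in onset_env]  # remove the DC offset

    # Convert the BPM range into a lag range measured in frames.
    min_lag = max(1, round(frame_rate * 60.0 / max_bpm))
    max_lag = min(n - 1, round(frame_rate * 60.0 / min_bpm))

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized sum: slightly favors shorter lags, which helps
        # avoid picking a multiple of the true beat period.
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return 60.0 * frame_rate / best_lag


def click_track(bpm, frame_rate, seconds):
    """Hypothetical test signal: a unit pulse on every beat."""
    period = frame_rate * 60.0 / bpm
    n = int(frame_rate * seconds)
    env = [0.0] * n
    k = 0
    while round(k * period) < n:
        env[round(k * period)] = 1.0
        k += 1
    return env


if __name__ == "__main__":
    env = click_track(120.0, frame_rate=100, seconds=10)
    print(estimate_tempo_bpm(env, frame_rate=100))  # 120.0
```

On a clean synthetic pulse train the peak lag equals the beat period. With noisy input, such as the collected sounds discussed below, spurious periodicities can produce competing peaks.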
However, when a robot listening to music extracts beats from the acoustic information of the music and estimates the tempo, there are roughly two technical problems to be solved. The first problem is guaranteeing robustness against noise. A sound collector, such as a microphone, needs to be mounted on the robot to make it listen to the music. In consideration of the visual appearance of the robot, it is preferable that the sound collector be built into the robot body.
This leads to the problem that the sounds collected by the sound collector include various kinds of noise. That is, the collected sounds include, as noise, environmental sounds generated in the vicinity of the robot and sounds generated by the robot itself. Examples of the sounds generated by the robot itself are the robot's footsteps, operation sounds from a motor running inside the robot body, and self-vocalized sounds. In particular, the self-vocalized sounds act as noise with an input level higher than that of the environmental sounds, because the speaker serving as the voice source is disposed relatively close to the sound collector. When the S/N ratio of the acoustic signal of the collected music deteriorates in this way, the precision with which the beats are extracted from the acoustic signal is lowered, and the precision of tempo estimation is lowered as a result.
In particular, in operations required for the robot to interact with the music, such as making the robot sing or vocalize along with the collected music, the self-vocalized sound collected as noise has periodic beats of its own, which adversely affects the tempo estimating operation of the robot.
The second problem is guaranteeing both tempo variation following ability (adaptability) and stability in tempo estimation. For example, the tempo of music performed or sung by a human being is not always constant, and typically varies in the middle of a piece of music depending on the skill of the performer or singer, or on the melody of the music. When a robot is made to listen to music having a non-constant tempo and to act in synchronization with the beats of the music, high tempo variation following ability is required. On the other hand, when the tempo is relatively constant, it is preferable that the tempo be stably estimated. In general, to stably estimate the tempo with a self correlation calculation, it is preferable to set a large time window for the tempo estimating process; however, a large window tends to degrade the tempo variation following ability. That is, a trade-off exists between guaranteeing tempo variation following ability and guaranteeing stability in tempo estimation. In the musical interaction of the robot, however, both abilities need to be excellent.
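The effect of the time window on tempo variation following ability can be illustrated with a small sketch. It is an assumption-laden toy example, not a measurement from any cited technique: the names `autocorr_tempo_bpm` and `frames_until_tracked`, the 50 frames-per-second rate, the synthetic tempo change from 120 BPM to 100 BPM, and the two window lengths are all hypothetical choices. It shows only the adaptability side of the trade-off; the stability benefit of a long window under noise is not modeled.

```python
def autocorr_tempo_bpm(window, frame_rate, min_bpm=60.0, max_bpm=180.0):
    """Tempo of one analysis window from its self correlation peak."""
    n = len(window)
    mean = sum(window) / n
    x = [v - mean for v in window]
    min_lag = max(1, round(frame_rate * 60.0 / max_bpm))
    max_lag = min(n - 1, round(frame_rate * 60.0 / min_bpm))
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return 60.0 * frame_rate / best_lag


def frames_until_tracked(window_len, frame_rate=50, change_at=1000,
                         total=2500, step=10):
    """Frames needed after a tempo change before the estimate follows it.

    The synthetic envelope has a pulse every 25 frames (120 BPM) before
    `change_at` and every 30 frames (100 BPM) afterwards.
    `window_len` must not exceed `change_at`.
    """
    env = [0.0] * total
    for t in range(0, change_at, 25):
        env[t] = 1.0
    for t in range(change_at, total, 30):
        env[t] = 1.0

    # Slide the window forward and report when the new tempo is detected.
    for end in range(change_at, total + 1, step):
        bpm = autocorr_tempo_bpm(env[end - window_len:end], frame_rate)
        if abs(bpm - 100.0) < 1.0:
            return end - change_at
    return None


if __name__ == "__main__":
    print("short window (4 s): ", frames_until_tracked(200), "frames")
    print("long window (16 s): ", frames_until_tracked(800), "frames")
```

In this sketch the estimate switches only once the new tempo dominates the window, so the longer window needs more frames to follow the change.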
Here, considering the relation between the first and second problems, guaranteeing robustness against noise (the first problem) requires guaranteeing stability in tempo estimation (one part of the second problem). In that case, however, it becomes difficult to guarantee tempo variation following ability (the other part of the second problem).
Unexamined Japanese Patent Application, First Publication Nos. 2007-33851 and 2002-116754 do not clearly disclose or teach the first problem. Moreover, the known techniques, including those of Unexamined Japanese Patent Application, First Publication Nos. 2007-33851 and 2002-116754, require self correlation in the time direction in the tempo estimating process, so the tempo variation following ability deteriorates when a wide time window is set in order to guarantee stability in tempo estimation. These techniques therefore do not solve the second problem.