The present invention relates to a technique for estimating a time series of fundamental frequencies of a particular audio component (hereinafter referred to as “target component”) of an audio signal.
Heretofore, various techniques have been proposed for estimating the fundamental frequency (pitch) of a particular target component of an audio signal in which a plurality of audio components (such as a singing voice and accompaniment sounds) are mixed together. Japanese Patent Application Laid-open Publication No. 2001-125562 (hereinafter referred to as “the patent literature”), for example, discloses a technique in which an audio signal is approximated as a mixture distribution of a plurality of sound models representing harmonic structures of different fundamental frequencies, probability density functions of the fundamental frequencies are sequentially estimated on the basis of the weights of the individual sound models, and a trajectory of fundamental frequencies corresponding to prominent ones of the plurality of peaks present in the probability density functions is identified. To analyze the plurality of peaks present in the probability density functions, a multi-agent model is employed in which a plurality of agents track the individual peaks.
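The general idea of approximating one frame of a spectrum as a weighted mixture of harmonic sound models, and reading the model weights as a probability density function of the fundamental frequency, can be illustrated with the following minimal Python sketch. It is not the formulation of the patent literature. The helper names (`tone_model`, `estimate_f0_pdf`, `prominent_peaks`), the linear frequency axis, the Gaussian harmonic shape with 1/h amplitudes, and the use of expectation-maximization (EM) iterations to update the weights are all assumptions made for illustration. Only a single frame is handled, and the multi-agent peak tracking across frames is not shown.

```python
import numpy as np


def tone_model(freqs, f0, n_harmonics=8, sigma=10.0):
    """Hypothetical harmonic sound model: Gaussian bumps at h * f0 with
    amplitudes 1/h, normalized to sum to 1 over the frequency bins."""
    h = np.arange(1, n_harmonics + 1)
    amps = 1.0 / h
    bumps = amps[:, None] * np.exp(
        -0.5 * ((freqs[None, :] - h[:, None] * f0) / sigma) ** 2
    )
    model = bumps.sum(axis=0)
    return model / model.sum()


def estimate_f0_pdf(spectrum, freqs, candidates, n_iter=500, eps=1e-12):
    """Estimate mixture weights over candidate F0s by EM (an assumed choice).
    The returned weights are interpreted as a PDF of the fundamental frequency."""
    p_obs = spectrum / spectrum.sum()  # observed spectrum treated as a distribution
    models = np.stack([tone_model(freqs, f) for f in candidates])  # (n_cand, n_bins)
    w = np.full(len(candidates), 1.0 / len(candidates))  # uniform initial weights
    for _ in range(n_iter):
        mix = w @ models + eps  # current mixture distribution; eps avoids 0/0
        w = w * (models @ (p_obs / mix))  # EM update for mixture weights
        w /= w.sum()  # guard against drift introduced by eps
    return w


def prominent_peaks(weights, candidates, k=3):
    """Return up to k local maxima of the F0 PDF, largest weight first."""
    padded = np.concatenate(([-np.inf], weights, [-np.inf]))
    is_peak = (padded[1:-1] > padded[:-2]) & (padded[1:-1] >= padded[2:])
    idx = np.flatnonzero(is_peak)
    idx = idx[np.argsort(weights[idx])[::-1]][:k]
    return [(float(candidates[i]), float(weights[i])) for i in idx]


# Usage: a synthetic frame whose spectrum is generated by the 220 Hz model.
freqs = np.arange(0.0, 4000.0, 5.0)  # frequency bins in Hz
candidates = np.arange(100.0, 505.0, 5.0)  # candidate F0s, 100 to 500 Hz
spectrum = tone_model(freqs, 220.0)
w = estimate_f0_pdf(spectrum, freqs, candidates)
peaks = prominent_peaks(w, candidates)
print(peaks)
```

Because every candidate model is fixed and only the weights are updated, the log-likelihood is concave in the weights. The most prominent peak is therefore expected to lie near the true 220 Hz fundamental, rather than at its octaves, once enough iterations have run. In a real system the spectrum would come from a short-time analysis of each frame, and the peaks found per frame would then be linked over time.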
With the technique of the patent literature, however, the peaks of the probability density functions are tracked on the premise that the fundamental frequencies are temporally continuous. Consequently, in a case where sound generation of the target component frequently stops or is interrupted (i.e., the fundamental frequency of the target component frequently appears and disappears over time), it is not possible to accurately identify the time series of fundamental frequencies of the target component.