Real-world audio signals, such as those of CD recordings, are sound mixtures for which the number of sound sources cannot be assumed in advance. In such sound mixtures, frequency components frequently overlap with each other, and some sounds have no fundamental frequency component. Most conventional pitch-estimation technologies, however, assume a small number of sound sources and either locally trace frequency components or depend on the existence of fundamental frequency components. For this reason, these technologies cannot be applied to the real-world sound mixtures described above.
To address this problem, the inventor of the present invention proposed an invention entitled “Method and Device for Estimating Pitch,” disclosed in Japanese Patent No. 3413634 (Patent Document 1). In that disclosure, an input sound mixture is considered to simultaneously include sounds of different fundamental frequencies (corresponding to “pitches” as the term is used abstractly in the specification of the present application) at various volumes. To allow a statistical approach, the frequency components of the input are represented as a probability density function (an observed distribution), and a probability distribution corresponding to the harmonic structure of each sound is introduced as a tone model. The probability density function of the frequency components is then considered to have been generated from a mixture distribution model (a weighted sum model) of the tone models for all target fundamental frequencies. Since the weight of each tone model in the mixture distribution indicates how relatively dominant the corresponding harmonic structure is, the weights of the tone models are referred to as the probability density function of the fundamental frequency (the more dominant a tone model is in the mixture distribution, the higher the probability of the fundamental frequency indicated by that model). The weight values (that is, the probability density function of the fundamental frequency) may be estimated by using the EM (Expectation-Maximization) algorithm (Dempster, A. P., Laird, N. M., and Rubin, D. B.: Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Stat. Soc. B, Vol. 39, No. 1, pp. 1-38 (1977)). The probability density function of the fundamental frequency thus obtained indicates at which pitch and at what volume each component sound of the sound mixture sounds.
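The weighted-sum model and the EM estimation of its weights described above can be illustrated with a minimal sketch. This is not the implementation of Patent Document 1. The log-frequency axis in cents, the Gaussian-shaped harmonics, the 1/h harmonic amplitudes, the parameter values, and the function names tone_model and estimate_f0_weights are all assumptions made only for illustration. The tone models are held fixed, and only the weights are re-estimated.

```python
import math

def tone_model(f0_cents, axis, n_harmonics=4, sigma=20.0):
    """Hypothetical tone model: a discretized distribution p(x | F0) over a
    log-frequency axis (in cents), with one Gaussian bump per harmonic."""
    values = []
    for x in axis:
        v = 0.0
        for h in range(1, n_harmonics + 1):
            # The h-th harmonic lies 1200*log2(h) cents above the fundamental.
            center = f0_cents + 1200.0 * math.log2(h)
            # 1/h amplitude decay is an assumed shape, not taken from the source.
            v += math.exp(-0.5 * ((x - center) / sigma) ** 2) / h
        values.append(v)
    total = sum(values)
    return [v / total for v in values]

def estimate_f0_weights(observed, models, n_iter=100):
    """EM re-estimation of the mixture weights for fixed tone models.
    observed: observed distribution over the axis (sums to 1).
    models:   one discretized tone model per candidate fundamental frequency."""
    n = len(models)
    weights = [1.0 / n] * n  # start from a uniform weight distribution
    for _ in range(n_iter):
        new_weights = [0.0] * n
        for x, p_obs in enumerate(observed):
            if p_obs == 0.0:
                continue
            mixture = sum(weights[k] * models[k][x] for k in range(n))
            if mixture <= 0.0:
                continue  # guard against floating-point underflow
            for k in range(n):
                # E-step: share of the observed mass at x attributed to model k.
                new_weights[k] += p_obs * weights[k] * models[k][x] / mixture
        # M-step: the new weights are the normalized accumulated shares.
        total = sum(new_weights)
        weights = [w / total for w in new_weights]
    return weights

# Synthetic example: two sounds at 5000 and 5500 cents, mixed 70% / 30%.
axis = list(range(3000, 9000, 10))
candidates = [4800.0, 5000.0, 5500.0, 6000.0]
models = [tone_model(f0, axis) for f0 in candidates]
observed = [0.7 * a + 0.3 * b for a, b in zip(models[1], models[2])]
weights = estimate_f0_weights(observed, models)
for f0, w in zip(candidates, weights):
    print(f"{f0:.0f} cents: weight {w:.3f}")
```

In this synthetic case the estimated weights should concentrate on the two candidates that actually generated the observed distribution, with the larger weight on the louder sound. This corresponds to reading the weights as a probability density function of the fundamental frequency.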
The inventor of the present invention has published technologies that develop or enhance the previous invention entitled “Method and Device for Estimating Pitch” in two non-patent papers, Non-Patent Document 1 and Non-Patent Document 2. Non-Patent Document 1 is “A PREDOMINANT-F0 ESTIMATION METHOD FOR CD RECORDINGS: MAP ESTIMATION USING EM ALGORITHM FOR ADAPTIVE TONE MODELS,” published in May 2001 in Proceedings V of “The 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing,” pp. 3365-3368. Non-Patent Document 2 is “A real-time music-scene-description system: predominant-F0 estimation for detecting melody and bass lines in real-world audio signals,” published in September 2004 in “Speech Communication 43 (2004),” pp. 311-329. The enhancements proposed in these two Non-Patent Documents are the use of multiple tone models, the estimation of tone model parameters, and the introduction of a prior distribution for the model parameters. These enhancements will be described later in detail.