A generic analysis-based music synthesis system is depicted in FIG. 1. In the analysis part, a parametric representation of a music record is estimated using a musical sound model. In the synthesis part, the parametric representation or its transformation is used to produce a synthesized record.
The idea of creating musical sounds using sinusoidal models is at least a century old. See, C. Roads, The Computer Music Tutorial (1996 MIT Press) p.134, for a brief survey. The first music synthesizer, the Telharmonium, produced complex tones by mixing sine wave harmonics from dozens of electrical tone generators. See, U.S. Pat. Nos. 580,035; 1,107,261; 1,213,803; and 1,295,691. The sinusoidal model is also the model used in most contemporary analysis-based music synthesis techniques, including pitch-synchronous analysis (J. C. Risset, et al., "Analysis of Musical Instrument Tones," Physics Today, vol. 22, no. 2, pp. 23-40 (1969)), synthesis heterodyne filter technique (J. A. Moorer, "On the Segmentation and Analysis of Continuous Musical Sound By Digital Computer," PhD Thesis, Stanford University (1975)), the phase vocoder (J. L. Flanagan et al., "Phase Vocoder," Bell System Tech. Journal (November 1966) and M. Dolson, "The Phase Vocoder: A Tutorial," Computer Music Journal, vol. 10, no. 4 (1986)), sinusoidal transformation system (STS) (R. J. McAulay et al., "Speech Analysis/Synthesis Based on a Sinusoidal Representation," IEEE Trans. On Acoustics, Speech and Signal Processing, vol. 34, pp. 744-754 (August 1986)), spectral modeling system (SMS) (X. Serra et al., "Spectral Modeling System: A Sound Analysis/Synthesis System Based on a Deterministic Plus Stochastic Decomposition," Computer Music Journal, vol. 14, no. 4 (1990)), and ABS/OLA (E. B. George et al., "Analysis-by-Synthesis/Overlap-Add Sinusoidal Modeling Applied to the Analysis and Synthesis of Musical Tones," Journal of the Audio Engineering Society, vol. 40, no. 6, pp. 497-516 (1992)).
Despite the power of the sinusoidal model, modeling the music signal exclusively with sinusoids can lead to an "information explosion" because of the large number of sinusoidal components needed to model the "noisy" component of the original sound and/or the many harmonics of low-pitched musical sounds. The large volume of analyzed parameters can be cumbersome for musicians to manipulate and can make storage in a synthesizer difficult and/or costly.
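The scale of this parameter count can be illustrated with a rough calculation. The sketch below is only an illustration under stated assumptions that do not come from the references above: a 44.1 kHz sample rate, two parameters (amplitude and frequency) per sinusoid, parameters stored at every sample point, and only harmonic components counted (the "noisy" component would add more). The function names are hypothetical.

```python
def harmonic_count(f0_hz, sample_rate_hz):
    """Number of harmonics of f0 at or below the Nyquist frequency."""
    nyquist_hz = sample_rate_hz / 2.0
    return int(nyquist_hz // f0_hz)


def parameters_per_second(f0_hz, sample_rate_hz, params_per_sinusoid=2):
    """Parameter values needed per second if every harmonic carries
    params_per_sinusoid values (assumed: amplitude and frequency)
    at every sample point."""
    return harmonic_count(f0_hz, sample_rate_hz) * params_per_sinusoid * sample_rate_hz


if __name__ == "__main__":
    sample_rate_hz = 44100  # assumed sample rate
    for name, f0_hz in (("A0", 27.5), ("A4", 440.0)):
        print(name,
              harmonic_count(f0_hz, sample_rate_hz),
              parameters_per_second(f0_hz, sample_rate_hz))
```

Under these assumptions, a low-pitched note such as A0 (27.5 Hz) has roughly sixteen times as many harmonics below the Nyquist frequency as A4 (440 Hz), and the per-second parameter count grows in the same proportion.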
Two approaches have been used to reduce the number of model parameters. One approach is described in J. M. Grey, "An Exploration of Musical Timbre," PhD Thesis, Stanford University (1975) and the R. J. McAulay et al. article, referenced above. That approach estimates the model parameters (such as amplitude and frequency) only at certain "break" points (frame boundaries) rather than at every sample point. The parameters are subsequently interpolated to all sample points at the synthesis stage. The other approach is described in the X. Serra et al. article, referenced above. That approach models the "noisy" part of the original sound by means other than sine wave clusters. The latter approach is advantageous because it makes the signal model more parsimonious, it removes some of the artificial tonal quality sometimes perceived in the synthesized sound when the noisy component is modeled by orderly sine wave clusters, and it makes the signal model more accurate. This invention builds on both approaches.
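The first approach, estimating parameters only at frame boundaries and interpolating them to every sample point at synthesis, can be sketched as follows. This is a minimal illustration, not the method of any cited reference: it assumes plain linear interpolation of amplitude and frequency and a running phase accumulator, whereas the cited systems may use other interpolation schemes. The function names, the hop size, and the sample rate are hypothetical.

```python
import math


def interpolate_breakpoints(values, hop):
    """Linearly interpolate frame-boundary values to per-sample values.

    values[k] is the parameter at sample k * hop (assumed uniform frames).
    Returns (len(values) - 1) * hop samples.
    """
    track = []
    for k in range(len(values) - 1):
        start, end = values[k], values[k + 1]
        for n in range(hop):
            track.append(start + (end - start) * n / hop)
    return track


def synthesize_partial(amps, freqs_hz, hop, sample_rate_hz):
    """Synthesize one sinusoidal component from breakpoint parameters."""
    amp_track = interpolate_breakpoints(amps, hop)
    freq_track = interpolate_breakpoints(freqs_hz, hop)
    phase = 0.0
    samples = []
    for amp, freq_hz in zip(amp_track, freq_track):
        samples.append(amp * math.sin(phase))
        # Advance the phase by the instantaneous frequency of this sample.
        phase += 2.0 * math.pi * freq_hz / sample_rate_hz
    return samples


if __name__ == "__main__":
    # Three breakpoints describe a short tone that fades in and out.
    tone = synthesize_partial([0.0, 1.0, 0.0], [440.0, 440.0, 440.0],
                              hop=4, sample_rate_hz=8000)
    print(len(tone))
```

In this sketch, three breakpoints per parameter stand in for eight per-sample values, and the saving grows with the hop size. A complete signal is obtained by summing such components over all tracked sinusoids.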