The present invention generally relates to a method of and an apparatus for analyzing and synthesizing a sound, and more particularly to various improvements to a musical synthesizer employing a spectral modeling synthesis technique.
A prior art musical synthesizer employing a spectral modeling synthesis technique (hereafter referred to as the "SMS technique") is disclosed in "A System for Sound Analysis/Transformation/Synthesis based on a Deterministic plus Stochastic Decomposition," Ph.D. Dissertation, Stanford University, written by Xavier Serra, one of the co-inventors of the present application, and published in October 1989. Such a prior art musical synthesizer is also disclosed in U.S. Pat. No. 5,029,509, describing an invention by Xavier Serra entitled "Musical Synthesizer Combining Deterministic and Stochastic Waveforms," as well as in PCT International Publication No. WO 90/13887 corresponding to this U.S. Patent.
The SMS technique is a musical sound analysis/synthesis technique based on a model which assumes that a sound is composed of two types of components, namely, a deterministic component and a stochastic component. The deterministic component is represented by a series of sinusoids, each having its own amplitude and frequency functions; that is, the deterministic component is a spectral component having deterministic amplitudes and frequencies. The stochastic component, on the other hand, is represented by magnitude spectral envelopes. For example, the stochastic component is defined as the residual spectra, represented as spectral envelopes, that are obtained by subtracting the deterministic spectra from the spectra of the original waveform. The sound analysis/synthesis is performed frame by frame over a sequence of time frames.
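The residual idea above can be illustrated for a single frame with a minimal sketch. It assumes that the deterministic partials are already known as (amplitude, frequency) pairs, uses a naive DFT with no analysis window, performs a plain magnitude subtraction clipped at zero, and reduces the residual to breakpoints by averaging it over equal-width frequency bands. The function names, the frame size, and the band-averaging rule are all hypothetical simplifications, not the SMS analysis procedure itself.

```python
import cmath
import math
import random
from typing import List, Tuple

def dft_magnitude(x: List[float]) -> List[float]:
    """Magnitude of bins 0..len(x)//2 via a naive O(n^2) DFT (short frames only)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def deterministic_signal(partials: List[Tuple[float, float]], n: int, sr: float) -> List[float]:
    """Sum of sinusoids; each partial is an (amplitude, frequency-in-Hz) pair."""
    return [sum(a * math.cos(2 * math.pi * f * t / sr) for a, f in partials)
            for t in range(n)]

def residual_envelope(original: List[float], partials: List[Tuple[float, float]],
                      sr: float, num_breakpoints: int) -> List[float]:
    """Hypothetical stochastic envelope: band-averaged (original - deterministic) magnitudes."""
    orig_mag = dft_magnitude(original)
    det_mag = dft_magnitude(deterministic_signal(partials, len(original), sr))
    # Magnitude subtraction, clipped at zero so the residual stays non-negative.
    residual = [max(o - d, 0.0) for o, d in zip(orig_mag, det_mag)]
    band = len(residual) / num_breakpoints
    envelope = []
    for m in range(num_breakpoints):
        seg = residual[int(m * band):int((m + 1) * band)] or [0.0]
        envelope.append(sum(seg) / len(seg))  # one breakpoint value per band
    return envelope

sr, n = 8000.0, 64
partials = [(1.0, 500.0), (0.5, 1000.0)]  # chosen to fall exactly on DFT bins 4 and 8
clean = deterministic_signal(partials, n, sr)
rng = random.Random(0)
noisy = [s + 0.1 * rng.uniform(-1.0, 1.0) for s in clean]
print(residual_envelope(clean, partials, sr, 4))  # [0.0, 0.0, 0.0, 0.0]: nothing is left over
print(residual_envelope(noisy, partials, sr, 4))  # nonzero values contributed by the added noise
```

A purely sinusoidal frame leaves no residual, while the added noise survives the subtraction and shows up as the stochastic envelope.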
The analyzed data for each time frame is represented by a set of sound partials, each having a specific frequency value and a specific amplitude value, together with a spectral envelope for the stochastic component, as follows:
$$\begin{aligned} &a_n(\iota),\ f_n(\iota) && \text{for } n = 0, \ldots, N-1 \\ &e_m(\iota) && \text{for } m = 0, \ldots, M-1 \end{aligned} \qquad \text{(Expression 1)}$$
where $\iota$ represents a specific frame; $a_n(\iota)$ and $f_n(\iota)$ represent the amplitude and frequency, respectively, of the n-th sound partial (in this specification, also referred to simply as a "partial") at frame $\iota$, which correspond to the deterministic component; and N is the number of sound partials at that frame. $e_m(\iota)$ represents the m-th breakpoint of the spectral envelope corresponding to the stochastic component, m is the breakpoint number, and M is the number of breakpoints at that frame.
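One way to hold the per-frame data of Expression 1, and to turn the deterministic part of consecutive frames back into samples, is sketched below. The class name `SMSFrame`, the hop size, and the linear interpolation with phase accumulation are assumptions made for illustration; the sketch also assumes that partial n in one frame continues as partial n in the next frame (the same N in every frame) and it ignores the stochastic envelope during synthesis.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class SMSFrame:
    """Hypothetical container for the analysis data of one frame (Expression 1)."""
    amplitudes: List[float]   # a_n(iota), n = 0..N-1
    frequencies: List[float]  # f_n(iota) in Hz, n = 0..N-1
    envelope: List[float]     # e_m(iota), m = 0..M-1 (stochastic breakpoints)

    def __post_init__(self) -> None:
        if len(self.amplitudes) != len(self.frequencies):
            raise ValueError("every partial needs both an amplitude and a frequency")

    @property
    def num_partials(self) -> int:  # N
        return len(self.amplitudes)

    @property
    def num_breakpoints(self) -> int:  # M
        return len(self.envelope)

def synthesize_deterministic(frames: List[SMSFrame], hop: int, sr: float) -> List[float]:
    """Additive synthesis of the deterministic part only.

    Between each pair of consecutive frames, a_n and f_n are interpolated
    linearly over `hop` samples, and each partial's phase is accumulated
    from its instantaneous frequency so the sinusoids stay continuous.
    """
    n_partials = frames[0].num_partials
    phases = [0.0] * n_partials
    out: List[float] = []
    for cur, nxt in zip(frames, frames[1:]):
        for t in range(hop):
            frac = t / hop
            sample = 0.0
            for n in range(n_partials):
                a = cur.amplitudes[n] + frac * (nxt.amplitudes[n] - cur.amplitudes[n])
                f = cur.frequencies[n] + frac * (nxt.frequencies[n] - cur.frequencies[n])
                sample += a * math.cos(phases[n])
                phases[n] += 2 * math.pi * f / sr
            out.append(sample)
    return out

frame_a = SMSFrame([0.5, 0.25], [220.0, 440.0], [0.01, 0.02, 0.005])
frame_b = SMSFrame([0.4, 0.20], [220.0, 440.0], [0.01, 0.01, 0.004])
print(frame_a.num_partials, frame_a.num_breakpoints)  # 2 3
samples = synthesize_deterministic([frame_a, frame_b], hop=256, sr=44100.0)
print(len(samples))  # 256
```

Accumulating phase, rather than restarting each sinusoid at every frame, avoids discontinuities at frame boundaries when frequencies change.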
Musical sound synthesis based on the SMS technique is advantageous in that it can synthesize a sound waveform of extremely high quality from compressed analysis data. Further, it has the potential to create a wide variety of new sounds when the user freely controls the analysis data used for the synthesis. Therefore, there has been an increasing demand for a concrete method of applying various musical controls to musical sound synthesis based on the SMS technique.
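As a simple illustration of controlling the analysis data rather than the waveform, the sketch below applies two hypothetical controls to one frame: a transposition that scales every partial frequency $f_n$ by $2^{s/12}$ for s semitones, and a gain on the stochastic envelope $e_m$. The function and parameter names are invented for this example and are not controls defined in the source.

```python
from typing import List, Tuple

Partial = Tuple[float, float]  # (amplitude a_n, frequency f_n in Hz)

def transform_frame(partials: List[Partial], envelope: List[float],
                    semitones: float = 0.0, noise_gain: float = 1.0
                    ) -> Tuple[List[Partial], List[float]]:
    """Return modified copies of one frame's analysis data.

    semitones: hypothetical transposition control; every f_n is scaled by 2**(semitones/12)
    noise_gain: hypothetical control that scales every stochastic breakpoint e_m
    """
    ratio = 2.0 ** (semitones / 12.0)
    new_partials = [(a, f * ratio) for a, f in partials]  # amplitudes unchanged
    new_envelope = [e * noise_gain for e in envelope]
    return new_partials, new_envelope

partials, env = transform_frame([(0.5, 220.0), (0.25, 440.0)], [0.02, 0.01],
                                semitones=12.0, noise_gain=0.5)
print(partials)  # [(0.5, 440.0), (0.25, 880.0)]
print(env)       # [0.01, 0.005]
```

Because the deterministic and stochastic parts are stored separately, each can be modified independently before resynthesis.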
Also well known in the art is a technique which obtains spectral data of sound partials by analyzing an original sound waveform by means of the Fourier transform or another suitable technique, stores the obtained spectral data in a memory, and then synthesizes a sound waveform by the inverse Fourier transform of the sound partial spectral data read out from the memory. However, this conventional sound partial technique is purely a synthesis technique and does not employ any analytical approach for controlling the musical characteristics of the sound to be synthesized.
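The store-and-resynthesize scheme described above can be sketched as a forward transform, a dictionary standing in for the memory, and an inverse transform. This is a minimal illustration that uses a naive DFT in place of an FFT; the memory key and test frame are hypothetical. Note that nothing between analysis and synthesis interprets or modifies the stored spectra, which is the limitation the passage points out.

```python
import cmath
import math
from typing import Dict, List

def dft(x: List[float]) -> List[complex]:
    """Forward DFT (naive O(n^2)); stands in for an FFT in this sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum: List[complex]) -> List[float]:
    """Inverse DFT; the imaginary part is ~0 for spectra of real signals and is dropped."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

# "Memory" of stored spectral data, keyed by a hypothetical sound name.
spectral_memory: Dict[str, List[complex]] = {}

frame = [math.sin(2 * math.pi * 3 * t / 16) + 0.5 * math.sin(2 * math.pi * 5 * t / 16)
         for t in range(16)]
spectral_memory["tone"] = dft(frame)     # analysis, then store
resynth = idft(spectral_memory["tone"])  # read out, then inverse transform
print(max(abs(a - b) for a, b in zip(frame, resynth)))  # tiny: rounding error only
```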
One of the technical problems encountered in prior art music synthesizers is how to synthesize the human voice. Many conventional techniques for synthesizing vocal sounds are based on a vocal model; that is, they pass an excitation signal through a time-varying filter. However, such a model cannot generate a high-quality sound and offers poor flexibility. Further, the majority of prior art vocal sound synthesis techniques are mere synthesis techniques rather than analysis-based ones; in other words, they cannot model a given singer. Moreover, the prior art techniques provide no method for removing vibrato from a recorded singer's voice.
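For reference, the conventional vocal model mentioned above (an excitation signal passed through a time-varying filter) can be sketched as an impulse train driving a single two-pole resonator whose center frequency glides over time. This is a deliberately crude, hypothetical illustration of the prior-art approach, not a method from the source: practical formant synthesizers use a shaped glottal source and several resonators, and the pitch period, bandwidth, and glide below are arbitrary choices.

```python
import math
from typing import List

def impulse_train(num_samples: int, period: int) -> List[float]:
    """Excitation: one unit impulse every `period` samples (a crude glottal source)."""
    return [1.0 if t % period == 0 else 0.0 for t in range(num_samples)]

def time_varying_resonator(x: List[float], sr: float, center_freqs: List[float],
                           bandwidth: float = 100.0) -> List[float]:
    """Two-pole resonator whose center frequency may change every sample.

    y[t] = x[t] + 2 r cos(theta_t) y[t-1] - r^2 y[t-2], with pole radius
    r = exp(-pi * bandwidth / sr) and theta_t = 2 pi center_freqs[t] / sr.
    """
    r = math.exp(-math.pi * bandwidth / sr)
    y1 = y2 = 0.0
    out = []
    for t, sample in enumerate(x):
        theta = 2 * math.pi * center_freqs[t] / sr
        y = sample + 2 * r * math.cos(theta) * y1 - r * r * y2
        out.append(y)
        y2, y1 = y1, y
    return out

sr = 8000.0
n = 800                                              # 0.1 s of audio
excitation = impulse_train(n, 64)                    # 125 Hz pitch
glide = [500.0 + 1000.0 * t / n for t in range(n)]   # resonance moving 500 -> 1500 Hz
voice = time_varying_resonator(excitation, sr, glide)
```

Every parameter here is set by hand rather than derived from a recording, which reflects the passage's point that such models cannot reproduce a given singer.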