The present invention relates to a method and apparatus for the synthesis of musical sounds. In particular, the present invention relates to a method and apparatus for the use of digital information to generate a natural sounding musical note over a range of pitches.
Since the development of the electronic organ, it has been recognized as desirable to create electronic keyboard musical instruments capable of imitating other acoustical instruments, e.g. strings, reeds, horns, etc. Early electronic music synthesizers attempted to achieve these goals using analog signal oscillators and filters. More recently, digital sampling keyboards have most successfully satisfied this need.
It has been recognized that notes from musical instruments may be decomposed into an excitation component and a broad spectral shaping outline called the formant. The overall spectrum of a note is equal to the product of the formant and the spectrum of the excitation. The formant is determined by the structure of the instrument, e.g. the body of a violin or guitar, or the shape of the throat of a singer. The excitation is determined by the element of the instrument which generates the energy of the sound, e.g. the string of a violin or guitar, or the vocal cords of a singer.
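The product relationship described above can be illustrated with a minimal sketch. The bell-shaped formant, its 1000 Hz center, and the 1/k harmonic amplitudes are assumptions chosen only for illustration; they do not come from any measured instrument, and the function names are hypothetical.

```python
import math

def formant_gain(freq_hz, center_hz=1000.0, width_hz=400.0):
    """Hypothetical bell-shaped formant: gain peaks at center_hz (assumed values)."""
    return math.exp(-0.5 * ((freq_hz - center_hz) / width_hz) ** 2)

def excitation_spectrum(f0_hz, n_harmonics=8):
    """Hypothetical excitation: harmonics of f0 with amplitudes falling as 1/k."""
    return [(k * f0_hz, 1.0 / k) for k in range(1, n_harmonics + 1)]

def note_spectrum(f0_hz, n_harmonics=8):
    """Overall spectrum: excitation amplitude times formant gain, per harmonic."""
    return [(freq, amp * formant_gain(freq))
            for freq, amp in excitation_spectrum(f0_hz, n_harmonics)]

if __name__ == "__main__":
    # Print each partial of a 250 Hz note as (frequency, amplitude).
    for freq, amp in note_spectrum(250.0):
        print(f"{freq:7.1f} Hz  {amp:.4f}")
```

In this sketch the pitch is set only by `f0_hz` in the excitation, while the formant stays fixed in frequency, which is the separation the decomposition is meant to capture.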
Workers in speech waveform coding have used formant/excitation analyses with assumptions and objectives radically different from those of music synthesis workers. For instance, the quality required for speech coding applications is lower than for musical applications, and speech waveform coding is intended to efficiently represent an intelligible message. On the other hand, providing expression, or the ability to manipulate the synthesis parameters in a musically meaningful way, is very important in music. Changing the pitch of a synthesized signal is fundamental to performing a musical passage, whereas in speech synthesis the pitch of the synthesized signal is determined only by the input signal (the sender's voice). Furthermore, control and variation of the spectrum or amplitude of the synthesized signal is very important for musical applications to produce expression, while in speech synthesis such variations would be irrelevant and would degrade the intelligibility of the signal.
Physical modelling approaches (see U.S. patent applications Ser. Nos. 766,848 and 859,868, filed Aug. 16, 1985 and May 2, 1986, respectively) attempt to model each individual physical component of acoustic instruments, and generate the waveforms from first principles. This process requires a detailed analysis of isolated subsystems of the actual instrument, such as modelling the clarinet reed with a polynomial, the clarinet body with a filter and delay line, etc.
Vocoding is a related technology that has been in use since the late 1930's, primarily as a speech encoding method, but which has also been adapted for use as a musical special effect to produce unusual musical timbres. There have been no examples of the use of vocoding to de-munchkinize a musical signal after it has been pitch-shifted, although this should in principle be possible.
Digital sampling keyboards, in which a digital recording of a single note of an acoustic instrument is transposed, or pitch-shifted, to create an entire keyboard range of sound, have two major shortcomings. First, since a single recording is used to produce many notes by simply changing the playback speed, the entire audio spectrum of the recorded note is shifted in pitch by the desired transposition. The consequence of this is that unnatural shifts in the formant occur. This phenomenon is referred to in the industry as "munchkinization" after the strange voices of the munchkins in the classic movie "The Wizard of Oz", which were produced by this effect. It is also referred to as a "chipmunk" effect, after the voices of the children's television cartoon program called "The Chipmunks", which were also produced by increasing the playback rate of recorded voices. The second major shortcoming of pitch shifting is a lack of expressiveness. Expressiveness is considered a very important feature of traditional acoustical musical instruments, and when it is lacking, the instrument is considered to sound unpleasant or mechanical. Expressiveness is considered to have a deterministic and a stochastic component.
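The formant shift caused by changing the playback speed can be seen in a small numerical sketch. It assumes a hypothetical bell-shaped formant centered at 1000 Hz and models the recording as a list of partials rather than as audio samples; the function names are illustrative and are not taken from any sampler implementation.

```python
import math

def formant_gain(freq_hz, center_hz=1000.0, width_hz=400.0):
    """Hypothetical bell-shaped formant with its peak at center_hz (assumed values)."""
    return math.exp(-0.5 * ((freq_hz - center_hz) / width_hz) ** 2)

def recorded_note(f0_hz, n_harmonics=16):
    """Partials (frequency, amplitude) of a note: flat excitation shaped by the formant."""
    return [(k * f0_hz, formant_gain(k * f0_hz)) for k in range(1, n_harmonics + 1)]

def transpose_by_playback_rate(partials, semitones):
    """Changing playback speed scales every frequency, formant included, by one ratio."""
    ratio = 2.0 ** (semitones / 12.0)
    return [(freq * ratio, amp) for freq, amp in partials]

def envelope_peak_hz(partials):
    """Frequency of the strongest partial, used here as a rough formant location."""
    return max(partials, key=lambda p: p[1])[0]

original = recorded_note(250.0)                        # the single recorded note
up_octave = transpose_by_playback_rate(original, 12)   # sampler-style transposition
natural = recorded_note(500.0, n_harmonics=8)          # same instrument played an octave higher

if __name__ == "__main__":
    print(envelope_peak_hz(original), envelope_peak_hz(up_octave), envelope_peak_hz(natural))
```

Under these assumptions the transposed recording has its spectral peak moved up an octave along with the pitch, whereas a note actually played an octave higher on the same instrument keeps its peak where the formant is.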
One current remedy for munchkinization is to limit the transposition range of a given recording. Separate recordings are used for different pitch ranges, which increases memory requirements and creates problems in matching the timbre of the recordings across the keyboard.
The deterministic component of expression is associated with the non-random variation of the spectrum or transient details of the note as a function of user control input, such as pitch, velocity of keystroke, or other control input. For example, the sound generated from a violin is dependent on where the string is fretted, how the string is bowed, whether a vibrato effect is produced by "bending" the string, etc.
The stochastic component of expression is related to the random variations of the spectrum of the musical note, so that no two successive notes are identical. These stochastic variations are small enough that the instrument remains identifiable.