The present invention relates generally to a system and method of synthesizing music, and particularly to a new music synthesis technique, herein called "scanned synthesis," that is intuitive and produces pleasing sounds with very little user training.
There are a number of well-established electronic music synthesis methodologies. For instance, wave tables are used in many music synthesis systems, with the frequency of each voice being determined by the rate at which values in the table are converted into output signals. Some music synthesis systems use frequency modulation techniques, others use digital filters to process input signals, and yet others use a variety of "physical models" that are simulated using various techniques.
In wave table based music synthesis, the shape of the audio waveform is governed by the waveform stored in a table. Typically the values stored in the wave table are fixed. For instance, the values in the table may be set equal to the sine or cosine of a function of the index for each entry in the table.
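The fixed wave table approach described above can be illustrated with a minimal Python sketch. The function names, the table size of 256, and the use of linear interpolation between entries are assumptions made for illustration only; they are not taken from any particular synthesis system.

```python
import math

def make_sine_table(size):
    """Build a fixed wave table holding one period of a sine wave."""
    return [math.sin(2.0 * math.pi * i / size) for i in range(size)]

def wavetable_oscillator(table, freq_hz, sample_rate, num_samples):
    """Read the table repeatedly; the read rate sets the pitch of the voice."""
    size = len(table)
    step = freq_hz * size / sample_rate  # table entries advanced per output sample
    phase = 0.0
    out = []
    for _ in range(num_samples):
        i = int(phase)
        frac = phase - i
        a = table[i]
        b = table[(i + 1) % size]
        out.append(a + frac * (b - a))  # linear interpolation (an assumption)
        phase = (phase + step) % size
    return out

table = make_sine_table(256)
samples = wavetable_oscillator(table, 440.0, 44100, 44100)  # one second at 440 Hz
```

Because the table contents never change, every period of the output has the same shape, which is the limitation that a time-varying table is meant to address.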
"Scanned synthesis," which is the name given by its inventors to the new music synthesis technique described in this document, is based not only on the psychoacoustics of how we hear and appreciate musical timbres, but also on our haptic abilities to shape and control timbres during live (real time) performance. Scanned synthesis places an emphasis on intuitive human control of timbre during real time performance, while most other synthesis techniques have given little attention to the control aspects of performance.
The sampling theorem guarantees that any sound the human ear can hear can be synthesized from a sufficient quantity of digital samples of the time function of the sound pressure. However, early results produced by digital synthesis in the 1960's showed that much remained to be learned about how to generate digital samples corresponding to musically rich and pleasing timbres. At that time, human hearing was well enough understood. For instance, it was understood that the frequency spectrum was a better characterizer of timbre than the time function. We also knew that the important audio frequencies lie in the range of about 50 to 10,000 hertz. But efforts to digitally simulate traditional musical timbres using sound waves with fixed (unchanging with time) spectra were discouraging.
In the mid-1960's, Jean-Claude Risset demonstrated that good simulations of traditional instruments could be made with sounds in which the spectrum changed with time over the course of each note. In a brass timbre, the proportion of high frequency energy in the spectrum must increase as the intensity of the sound increases at the beginning (attack part) of the note. By contrast, for bells and most percussive instruments, high frequency overtones decay faster than low frequency overtones, so the proportion of high frequency energy is greatest at the beginning of the note. There is, however, an interesting exception to this rule. Nonlinearities in a Chinese gong, because it has a sharply bent edge, convert low frequency overtone energy into higher frequency energy, thus causing high frequencies first to build up and then eventually decay.
Many extensions to Risset's work have led to a better understanding of the properties of spectral time variations that the ear hears and the brain likes.
Spectral time variations can also be usefully characterized by their frequency spectrum. These frequencies are much lower, typically 0 to about 15 hertz, than audio frequencies (about 50 to 10,000 hertz). The upper limit is 15 because variations above 15 hertz often sound unpleasant.
At present, the terminology used to describe spectral time variations is not well established. Some kinds of spectral time variations, particularly vibrato and tremolo, are called modulations. But other kinds, such as those that occur in brass and bell sounds, are unnamed. We, the inventors of the present invention, here propose the name "haptic frequencies" to characterize at least a class of these variations.
The inventors have observed that either by happy accident of nature or because of the way human beings are built, the frequency range of spectral changes the ear can understand is the same as the frequency range of body part (arms, fingers, etc.) movements that we can consciously control. Scanned synthesis provides methods for directly manipulating the spectrum of a sound by human movements.
Most traditional instruments use resonances of some sort to create sounds. The resonances may be of an air column, or a string, or a membrane or a plate. A successful instrument usually must have many resonances. In all cases, the resonant frequencies must lie somewhere in the audio frequency band in order to be heard. The ratio between the resonant frequencies and the haptic frequencies (rate of spectral changes) depends on the narrowness of the resonant peaks of the instrument, otherwise known as the Q of the resonances. For physical objects, Q depends mostly on energy losses in the material from which they are made. It is difficult to change the haptic frequencies of an instrument. It is also difficult to directly manipulate the spectrum by motions of the performer's body.
In a music synthesis system, using the scanned synthesis technique of the present invention, a scanning apparatus repeatedly scans a physical attribute of a vibrating object at a sequence of points on or in the vibrating object so as to repeatedly generate corresponding sequences of values. The music synthesis system generates an audio frequency waveform whose shape corresponds to the sequences of values. The vibrating object may be a physical object or a simulated object.
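The scanning process described above can be sketched with a simulated vibrating object. In the following Python sketch, the object is assumed to be a chain of unit masses joined by springs and fixed at both ends; the class name `SimulatedString`, the function `scan_object`, and all parameter values are hypothetical and chosen only for illustration. The sketch also assumes that the object is updated once every `update_every` output samples, so that its shape evolves far more slowly than the audio rate at which it is scanned, and that the pitch is set by the scan rate.

```python
import math

class SimulatedString:
    """Hypothetical vibrating object: unit masses joined by springs, ends fixed."""

    def __init__(self, num_masses=64, stiffness=0.1, damping=0.01):
        self.pos = [0.0] * num_masses
        self.vel = [0.0] * num_masses
        self.stiffness = stiffness
        self.damping = damping

    def pluck(self, center, width, height):
        """Set an initial shape: a raised-cosine bump around `center`."""
        for i in range(len(self.pos)):
            d = abs(i - center)
            if d < width:
                self.pos[i] = height * 0.5 * (1.0 + math.cos(math.pi * d / width))

    def step(self):
        """Advance one time step (semi-implicit Euler, unit mass, unit time step)."""
        n = len(self.pos)
        for i in range(n):
            left = self.pos[i - 1] if i > 0 else 0.0
            right = self.pos[i + 1] if i < n - 1 else 0.0
            force = (self.stiffness * (left + right - 2.0 * self.pos[i])
                     - self.damping * self.vel[i])
            self.vel[i] += force
        for i in range(n):
            self.pos[i] += self.vel[i]

def scan_object(obj, pitch_hz, sample_rate, num_samples, update_every):
    """Repeatedly scan the object's positions to produce an audio waveform."""
    n = len(obj.pos)
    step = pitch_hz * n / sample_rate  # points advanced per output sample
    phase = 0.0
    out = []
    for s in range(num_samples):
        if s % update_every == 0:
            obj.step()  # the object evolves slowly relative to the scan
        i = int(phase)
        frac = phase - i
        a = obj.pos[i]
        b = obj.pos[(i + 1) % n]
        out.append(a + frac * (b - a))
        phase = (phase + step) % n
    return out

string = SimulatedString()
string.pluck(center=32, width=16, height=1.0)
audio = scan_object(string, pitch_hz=441.0, sample_rate=44100,
                    num_samples=2000, update_every=50)
```

Each pass of the scan over the object yields one period of the output, so the waveform shape within a period follows the current shape of the object and changes gradually as the object moves.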
Examples of the physical attribute that is scanned include a position coordinate, a velocity, an acceleration, a third derivative of a position coordinate, a fourth derivative of a position coordinate, a linear combination of at least two of the position, velocity, acceleration, third derivative and fourth derivative, and a non-linear combination of at least two of the position, velocity, acceleration, third derivative and fourth derivative.
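As an illustration of the attributes listed above, the following Python sketch estimates velocity and acceleration at each scanned point from three successive position snapshots using finite differences, and then forms one linear and one non-linear combination. The function names, the weights, and the choice of a product as the non-linear combination are assumptions made for this example only.

```python
def finite_differences(prev2, prev1, curr, dt):
    """Estimate velocity and acceleration at each point from three snapshots."""
    velocity = [(c - p1) / dt for c, p1 in zip(curr, prev1)]
    acceleration = [(c - 2.0 * p1 + p2) / (dt * dt)
                    for c, p1, p2 in zip(curr, prev1, prev2)]
    return velocity, acceleration

def linear_attribute(position, velocity, acceleration,
                     w_pos=1.0, w_vel=0.0, w_acc=0.0):
    """Weighted sum of position, velocity and acceleration at each point."""
    return [w_pos * p + w_vel * v + w_acc * a
            for p, v, a in zip(position, velocity, acceleration)]

def nonlinear_attribute(position, velocity):
    """One possible non-linear combination: position times velocity."""
    return [p * v for p, v in zip(position, velocity)]

# A single point whose position follows t**2 at t = 0, 1, 2.
vel, acc = finite_differences([0.0], [1.0], [4.0], dt=1.0)
mixed = linear_attribute([4.0], vel, acc, w_pos=1.0, w_vel=0.5, w_acc=0.25)
product = nonlinear_attribute([4.0], vel)
```

Higher derivatives could be estimated in the same way from additional snapshots.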
A user interface may be used to receive user input, and the vibrating object may be stimulated in accordance with the user input. For instance, a portion of the vibrating object may be displaced in response to the user input, or the initial shape or energy state of the object may be set in response to the user input. The user interface may include a sensor for receiving the user input, and means for mapping the user input into a stimulus signal that is applied to the vibrating physical object. Examples of the user interface sensor include a keyboard, a set of one or more foot pedals, a set of one or more position sensors, an audio microphone, a set of one or more pressure sensors, and any combination thereof.
In the music synthesis method of the present invention, the shape of a waveform is continuously updated based on either a physical attribute of a vibrating object (having a time varying shape or state), or a physical attribute of a simulated vibrating object. User inputs affect the evolving shape (or state) of the real or simulated vibrating physical object. User inputs can also affect other aspects of the music synthesis process, such as varying the rate at which the object is scanned, and varying the trajectory of points scanned. User inputs may also be used to select the attribute of the object that is being scanned.
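Varying the scan rate and the scan trajectory can be sketched as follows. In this Python sketch, a trajectory is assumed to be a list of point indices that is traversed cyclically, and the rate is the number of trajectory entries advanced per output sample; the function names and the two example trajectories are hypothetical.

```python
def forward_trajectory(num_points):
    """Visit the points in order, then wrap around to the first point."""
    return list(range(num_points))

def back_and_forth_trajectory(num_points):
    """Visit the points in order, then in reverse, without repeating the ends."""
    return list(range(num_points)) + list(range(num_points - 2, 0, -1))

def scan_along_trajectory(values, trajectory, rate, num_samples):
    """Read `values` at the points named by `trajectory`, cycling at `rate`."""
    n = len(trajectory)
    phase = 0.0
    out = []
    for _ in range(num_samples):
        out.append(values[trajectory[int(phase)]])
        phase = (phase + rate) % n
    return out

values = [10, 20, 30, 40]  # stand-in for the scanned attribute at four points
forward = scan_along_trajectory(values, forward_trajectory(4), 1.0, 6)
bounce = scan_along_trajectory(values, back_and_forth_trajectory(4), 1.0, 8)
faster = scan_along_trajectory(values, forward_trajectory(4), 2.0, 4)
```

In a live setting, `rate` and the choice of trajectory would be driven by user input; here they are fixed arguments to keep the example short.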