Electronic music synthesizers have had difficulty capturing the sound and phrasing of expressive instruments such as violin, saxophone, and trumpet. Even traditional sampling synthesizers, which use actual recordings of real instruments, are unable to reassemble these recordings to form expressive phrases.
A traditional sampling synthesizer can be viewed as a system that stores in memory a digitized recording of a highly constrained musical performance. The performance consists of a number of notes covering the pitch and intensity range of the instrument, separated by brief periods of silence. In response to a note_on command, with associated pitch and intensity values, the sampling synthesizer searches through the stored performance for the location of a note that most nearly matches the pitch and intensity associated with the note_on command. The recorded note is then read out of memory, pitch-shifted and amplitude-scaled to achieve a more precise match with the desired pitch and intensity, and output through a digital-to-analog converter.
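The lookup and correction steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the method of any particular sampler: the names `StoredNote`, `find_nearest_note`, and `playback_params` are hypothetical, pitch is assumed to be a MIDI-style note number in equal temperament, intensity is assumed to be a MIDI-style value from 1 to 127, and the linear gain and the `intensity_weight` distance measure are arbitrary choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredNote:
    pitch: int       # assumed MIDI-style note number of the recorded note
    intensity: int   # assumed MIDI-style intensity (1-127) of the recorded note
    offset: int      # hypothetical start position of the note in sample memory


def find_nearest_note(bank, pitch, intensity, intensity_weight=0.1):
    """Return the stored note that most nearly matches the request.

    The distance measure (semitones plus weighted intensity difference)
    is an assumption for illustration only.
    """
    return min(
        bank,
        key=lambda n: abs(n.pitch - pitch)
        + intensity_weight * abs(n.intensity - intensity),
    )


def playback_params(note, pitch, intensity):
    """Return (resampling ratio, gain) that move the stored note onto the request."""
    # Equal-tempered pitch shift: one semitone is a factor of 2**(1/12).
    ratio = 2.0 ** ((pitch - note.pitch) / 12.0)
    # Simple linear amplitude scaling (an assumption; real samplers may use curves).
    gain = intensity / note.intensity
    return ratio, gain


# Three notes per octave, two intensity levels each.
bank = [
    StoredNote(pitch=p, intensity=i, offset=k)
    for k, (p, i) in enumerate(
        (p, i) for p in (60, 64, 68) for i in (64, 110)
    )
]

note = find_nearest_note(bank, pitch=65, intensity=100)
ratio, gain = playback_params(note, pitch=65, intensity=100)
```

Here the request for pitch 65 falls between stored notes, so the nearest recording (pitch 64) is selected and shifted up by one semitone, which is the "further pitch-shifted" step in the description.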
Generally, three to four notes per octave with two to three intensity levels are stored in sampler memory. The amount of memory required is often quite large, especially if a number of different instrumental sounds are desired. It is not practical to store very long note recordings; two to three seconds is typical. To synthesize long sustained notes, looping techniques are used. After the start of a recording has been played, a segment of the note recording is played back repeatedly until the note is released. A relatively stable segment is chosen so that jumping from the end to the beginning of the segment does not introduce obvious discontinuities. Sometimes the discontinuity associated with the loop jump is smoothed over by cross-fading from the end to the beginning of the loop segment.
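The looping and cross-fading described above can be illustrated with a minimal sketch. The function names, the linear fade shape, and the choice to blend the loop tail with the samples just before the loop start are assumptions made for the example; actual samplers differ in how they prepare loops.

```python
def crossfaded_loop(samples, loop_start, loop_end, fade_len):
    """Return samples[loop_start:loop_end] with its tail cross-faded.

    The last fade_len samples of the loop are blended (linear fade, an
    assumption) into the samples that precede loop_start, so the jump
    from the end of the loop back to loop_start is continuous.
    """
    loop_len = loop_end - loop_start
    if not (0 < fade_len <= loop_len and fade_len <= loop_start):
        raise ValueError("fade_len must fit inside the loop and before loop_start")
    loop = list(samples[loop_start:loop_end])
    for i in range(fade_len):
        t = (i + 1) / fade_len  # ramps up to 1.0 at the final loop sample
        tail = samples[loop_end - fade_len + i]      # original end of the loop
        lead_in = samples[loop_start - fade_len + i]  # material before the loop start
        loop[loop_len - fade_len + i] = (1.0 - t) * tail + t * lead_in
    return loop


def render_sustained(samples, loop_start, loop_end, fade_len, n_out):
    """Play the start of the recording once, then repeat the loop up to n_out samples."""
    loop = crossfaded_loop(samples, loop_start, loop_end, fade_len)
    out = list(samples[:loop_start])
    while len(out) < n_out:
        out.extend(loop)
    return out[:n_out]


# A ramp stands in for a recorded note so the arithmetic is easy to follow.
recording = [float(i) for i in range(10)]
loop = crossfaded_loop(recording, loop_start=4, loop_end=8, fade_len=2)
sustained = render_sustained(recording, loop_start=4, loop_end=8, fade_len=2, n_out=12)
```

Because the same loop is repeated exactly, any residual variation within the segment recurs once per loop period, which is one way to see where the periodic pulsation of looped sustains comes from.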
For expressive instruments, the traditional sampling synthesizer often sounds unnatural, like a succession of unrelated notes rather than a musical phrase. Sustained notes often have an undesirable periodic pulsation due to looping. When the loop segment is extremely short (e.g., one pitch period), the result sounds like an electronic oscillator rather than a natural instrument.
The reason for the failure to synthesize expressive phrases is that, for expressive instruments such as trumpet, violin and saxophone, real performances are not simply the concatenation of a number of isolated notes. Complex, idiosyncratic behavior occurs in the transition from one note to the next. This behavior during note transitions is often the most characteristic and identifiable aspect of instrumental sounds.
Various attempts have been made to enrich the kinds of note transitions generated by traditional synthesizers. U.S. Pat. No. 4,083,283, to Hiyoshi et al., teaches a system where, for a smooth slurred transition between notes, the amplitude envelope is held constant during the transition, whereas the envelope will begin with an attack segment for non-slurred transitions. U.S. Pat. No. 5,216,189, to Kato, teaches a system where amplitude and pitch envelopes are determined by certain note transition values, for example, pitch difference between successive notes. U.S. Pat. No. 4,332,183, to Deutch, teaches a system where the Attack-Decay-Sustain-Release (ADSR) amplitude envelope of a tone is determined by the time delay between the end of the preceding tone and the start of the tone to which the ADSR envelope is to be applied. U.S. Pat. No. 4,524,668, to Tomisawa et al., teaches a system where a slurred transition between notes can be simulated by generating a smooth transition from the pitch and amplitude of a preceding tone to the pitch and amplitude of a following tone. U.S. Pat. No. 4,726,276, to Katoh et al., teaches a system where, for a slurred transition between notes, pitch is smoothly changed between notes, and a stable tone color is produced during the attack of the second tone, whereas a rapidly changing tone color is produced during the attack of the second tone of a non-slurred transition. Katoh et al. also teaches the detection of slurred tones from an electronic keyboard by detecting the depression of a new key before the release of a preceding key. U.S. Pat. No. 5,292,995, to Usa, teaches a system where a fuzzy operation is used to generate a control signal for a musical tone based on the time lapse between one note_on command and the next. U.S. Pat. No. 5,610,353, to Hagino, teaches a system where a slurred keyboard performance is detected based on a second key depression before a preceding key has been released, and where sampled tones stored in memory have two start addresses: a normal start address and a slur start address. The slur start address is presumably offset into the sustained part of the tone. On detection of legato, a new tone is started at the slur start address.
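The keyboard-based slur detection and the two-start-address scheme summarized above can be sketched roughly as follows. This is a simplified reading of the prose description only, not the implementation of any of the cited patents; the names `Tone`, `LegatoDetector`, and `start_address`, and the example address values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tone:
    normal_start: int  # address of the recorded attack
    slur_start: int    # address assumed to lie in the sustained part of the tone


class LegatoDetector:
    """Flags a note as slurred when its key goes down while another key is still held."""

    def __init__(self):
        self.held = set()

    def note_on(self, key):
        slurred = bool(self.held)  # a preceding key has not been released yet
        self.held.add(key)
        return slurred

    def note_off(self, key):
        self.held.discard(key)


def start_address(tone, slurred):
    """Pick the playback start address for a new tone."""
    return tone.slur_start if slurred else tone.normal_start


detector = LegatoDetector()
tone = Tone(normal_start=0, slur_start=4410)  # illustrative addresses only

first = detector.note_on(60)    # no key held: normal attack
second = detector.note_on(62)   # key 60 still held: slurred
detector.note_off(60)
detector.note_off(62)
third = detector.note_on(64)    # all keys released: normal attack again

addresses = [start_address(tone, s) for s in (first, second, third)]
```

The sketch makes the limitation discussed in the text concrete: a slurred note is produced by skipping into an isolated note recording, not by playing back a recorded transition between two notes.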
All of these inventions attempt to provide smooth transitions for slurs by artificially manipulating the data associated with isolated note recordings: starting a note after its recorded attack, reducing the importance of an attack by superimposing a smooth amplitude envelope, etc. None of these techniques captures the dynamics of the natural instrument in slurred phrasing, let alone the wide variety of non-slurred note transition types present in an expressive instrumental performance.
In addition, none of these inventions addresses the problem of generating natural sustains without the periodic pulsing or electronic oscillator sound found with traditional looping techniques.