1. Field of the Invention
The present invention relates to controlling distortion in reproduced digital data, and more particularly, to removing distortion in wavetable based synthesizers.
2. Description of the Related Art
The creation of musical sounds using electronic synthesis methods dates back at least to the late nineteenth century. From these origins of electronic synthesis until the 1970's, analog methods were primarily used to produce musical sounds. Analog music synthesisers became particularly popular during the 1960's and 1970's with developments such as the analog voltage controlled patchable analog music synthesiser, invented independently by Don Buchla and Robert Moog. As development of the analog music synthesiser matured and its use spread throughout the field of music, it introduced the musical world to a new class of timbres.
However, analog music synthesisers were constrained to using a variety of modular elements. These modular elements included oscillators, filters, multipliers and adders, all interconnected with telephone style patch cords. Before a musically useful sound could be produced, analog synthesizers have to be programmed by first establishing an interconnection between the desired modular elements and then laboriously adjusting the parameters of the modules by trial and error. Because the modules used in these synthesisers tended to drift with temperature change, it was difficult to store parameters and faithfully reproduce sounds from one time to another time.
Around the same time that analog musical synthesis was coming into its own, digital computing methods were being developed at a rapid pace. By the early 1980's, advances in computing made possible by Very Large Scale Integration (VLSI) and digital signal processing (DSP) enabled the development of practical digital based waveform synthesisers. Since then, the declining cost and decreasing size of memories have made the digital synthesis approach to generating musical sounds a popular choice for use in personal computers and electronic musical instrument applications.
One type of digital based synthesiser is the wavetable synthesiser. The wavetable synthesiser is a sampling synthesiser in which one or more musical instruments are “sampled,” by recording and digitizing a sound produced by the instrument(s), and storing the digitized sound into a memory. The memory of a wavetable synthesizer includes a lookup table in which the digitized sounds are stored as digitized waveforms. Sounds are generated by “playing back” from the wavetable memory, to a digital-to-analog converter (DAC), a particular digitized waveform.
The basic operation of a sampling synthesiser is to playback digitized recordings of entire musical instrument notes under the control of a person, computer or some other means. Playback of a note can be triggered by depressing a key on a musical keyboard, from a computer, or from some other controlling device. While the simplest samplers are only capable of reproducing one note at a time, more sophisticated samplers can produce polyphonic (multi-tone), multi-timbral (multi-instrument) performances.
Data representing a sound in a wavetable memory are created using an analog-to-digital (ADC) converter to sample, quantize and digitize the original sound at a successive regular time interval (i.e., the sampling interval, Ts). The digitally encoded sound is stored in an array of wavetable memory locations that are successively read out during a playback operation.
One technique used in wavetable synthesizers to conserve sample memory space is the “looping” of stored sampled sound segments A looped sample is a short segment of a wavetable waveform stored in the wavetable memory that is repetitively accessed (e.g., from beginning to end) during playback. Looping is particularly useful for playing back an original sound or sound segment having a fairly constant spectral content and amplitude. A simple example of this is a memory that stores one period of a sine wave such that the endpoints of the loop segment are compatible (i.e., at the endpoints the amplitude and slope of the waveform match to avoid a repetitive “glitch” that would otherwise be heard during a looped playback of an unmatched segment). A sustained note may be produced by looping the single period of a waveform for the desired length of duration time (e.g., by depressing the key for the desired length, programming a desired duration time, etc.). However, in practical applications, for example, for an acoustic instrument sample, the length of a looped segment would include many periods with respect to the fundamental pitch of the instrument sound. This avoids the “periodicity” effect of a looped single period waveform that is easily detectable by the human ear and improves the perceived quality of the sound (e.g., the “evolution” or “animation” of the sound).
The sounds of many instruments can be modeled as consisting of two major sections: the “attack” (or onset) section and the “sustain” section. The attack section is the initial part of a sound, wherein amplitude and spectral characteristics of the sound may be rapidly changing. For example, the onset of a note may include a pick snapping a guitar string, the chiff of wind at the start of a flute note, or a hammer striking the strings of a piano. The sustain section of the sound is that part of the sound following the attack, wherein the characteristics of the sound are changing less dynamically. A great deal of memory is saved in wavetable synthesis systems by storing only a short segment of the sustain section of a waveform, and then looping this segment during playback.
Amplitude changes that are characteristic of a particular or desired sound may be added to a synthesized waveform signal by multiplying the signal with a decreasing gain factor or a time varying envelope function. For example, for an original acoustic string sound, signal amplitude variation naturally occurs via decay at different rates in various sections of the sound. In the onset of the acoustic sound (i.e., in the attack part of the sound), a period of decay may occur shortly after the initial attack section. A period of decay after a note is “released” may occur after the sound is terminated (e.g., after release of a depressed key of a music keyboard). The spectral characteristics of the acoustic sound signal may remain fairly constant during the sustain section of the sound, however, the amplitude of the sustain section also may (or may not) decay slowly. The forgoing describes a traditional approach to modeling a musical sound called the Attack-Decay-Sustain-Release (ADSR) model, in which a waveform is multiplied with a piecewise linear envelope function to simulate amplitude variations in the original sounds.
In order to minimize sample memory requirements, wavetable synthesis systems have utilized pitch shifting, or pitch transposition techniques, to generate a number of different notes from a single sound sample of a given instrument. Two types of methods are mainly used in pitch shifting: asynchronous pitch shifting and synchronous pitch shifting.
In asynchronous pitch shifting, the clock rate of each of the DAC converters used to reproduce a digitized waveform is changed to vary the waveform frequency, and hence its pitch. In systems using asynchronous pitch shifting, it is required that each channel of the system have a separate DAC. Each of these DACs has its own clock whose rate is determined by the requested frequency for that channel. This method of pitch shifting is considered asynchronous because each output DAC runs at a different clock rate to generate different pitches. Asynchronous pitch shifting has the advantages of simplified circuit design and minimal pitch shifting artifacts (as long as the analog reconstruction filter is of high quality). However, asynchronous pitch shifting methods have several drawbacks. First, a DAC would be needed for each channel, which increases system cost with increasing channel count. Another drawback of asynchronous pitch shifting is the inability to mix multiple channels for further digital post processing such as reverberation. Asynchronous pitch shifting also requires the use of complex and expensive tracking reconstruction filters—one for each channel—to track the sample playback rate for the respective channels.
In synchronous pitch shifting techniques currently being utilized, the pitch of the wavetable playback data is changed using sample rate conversion algorithms. These techniques accomplish sample rate conversion essentially by accessing the stored sample data at different rates during playback. For example, if a pointer is used to address the sample memory for a sound, and the pointer is incremented by one after each access, then the samples for this sound would be accessed sequentially, resulting in some particular pitch. If the pointer increment is two rather than one, then only every second sample would be played, and the resulting pitch would be shifted up by one octave (i.e., the frequency would be doubled). Thus, a pitch may be adjusted to an integer number of higher octaves by multiplying the index, n, of a discrete time signal x[n] by a corresponding integer amount a and playing back (reconstructing) the signal xup[n] at a “resampling rate” of an:xup[n]=x[an].
To shift downward in pitch, additional “sample” points (e.g., one or more zero values) are introduced between values of the decoded sequential data of the stored waveform. That is, a discrete time signal x[n] may be supplemented with additional values in order to approximate a resampling of the continuous time signal x(t) at a rate that is increased by a factor L:xdown[n]=x[n/L],n=0,±L,±2L, ±3L, . . . ; otherwise,xdown[n]=0.When the resultant sample points, xdown[n], are played back at the original sampling rate, the pitch will have been shifted downward.
While the foregoing illustrates how the pitch may be changed by scaling the index of a discrete time signal by an integer amount, this allows only a limited number of pitch shifts. This is because the stored sample values represent a discrete time signal, x[n], and a scaled version of this signal, x[an] or x[n/b], cannot be defined with a or b being non integers. Hence, more generalized sample rate conversion methods have been developed to allow for more practical pitch shifting increments, as described in the following.
In a more general case of sample rate conversion, the sample memory address pointer would consist of an integer part and a fractional part, and thus the increment value could be a fractional number of samples. The memory pointer is often referred to as a “phase accumulator” and the increment value is called the “phase increment.” The integer part of the phase accumulator is used to address the sample memory and the fractional part is used to maintain frequency accuracy.
Different algorithms for changing the pitch of a tabulated signal that allow fractional increment amounts have been proposed. See, for example, M. Kahrs et al., “Applications of Digital Signal Processing to Audio and Acoustics,” 1998, pp. 311-341, the entire contents of which is incorporated herein by reference. One sample rate conversion technique disclosed in Kahrs et al. and currently used in computer music is called “drop sample tuning” or “zero order hold interpolator,” and is the basis for the table lookup phase increment oscillator. The basic element of a table lookup oscillator is a wavetable, which is an array of memory locations that store the sampled values of waveforms to be generated. Once a wavetable is generated, a stored waveform may be read out using an algorithm, such as a drop sample tuning algorithm, which is described in the following.
First, assume that pre-computed values of the waveform may be stored in a wavetable denoted x, where x[n] refers to the value stored at location n of the wavetable. A variable Cph is defined as representing the current offset into the waveform and may have both an integer part and a fractional part. The integer part of the Cph variable is denoted as └Cph┘. (The notation └z┘ is used herein to denote the integer part of a real number z.) Next, let x[n], n=1, 2, . . . , BeginLoop−1, BeginLoop, BeginLoop+1, . . . , EndLoop−1, EndLoop, EndLoop+1, . . . , EndLoop+L be the tabulated waveform, and PitchDeviation be the amount of frequency the signal x[n] has to be shifted given in unit “cents.”
A cent has its basis in the chromatic scale (used in most western music) and is an amount of a shift in pitch of a musical note (i.e., a relative change from a note's “old” frequency, fold, to a “new” frequency,fnew). The chromatic scale is divided into octaves, and each octave, in turn, is divided into twelve steps (notes), or halftones. To move up an octave (+12 halftones) from a note means doubling the old frequency, and to move down an octave (−12 halftones) means halving the old frequency. When viewed on a logarithmic frequency scale, all the notes defined in the chromatic scale are evenly located. (One can intuitively understand the logarithmic nature of frequency in the chromatic scale by recalling that a note from a vibrating string is transposed to a next higher octave each time the string length is halved, and is transposed to a next lower octave by doubling the string length.) This means that the ratio between the frequencies of any two adjacent notes (i.e., halftones) is a constant, say c. The definition of an octave causes c12=2, so that c=21/12=1.059463. It is usually assumed that people can hear pitch tuning errors of about one “cent,” which is 1% of a halftone, so the ratio of one cent would be 21/1200. For a given signal having a frequency fold, if it is desired to shift fold to a new frequency fnew, the ratio of fnew/fold=2cents/1200 (note that 1200 cents would correspond to a shift up of one octave, −2400 cents would correspond to a shift down of 2 octaves, and so on). It follows that a positive value of cents indicates that fnew is higher than fold (i.e., an upwards pitch shift), and that a negative value of cents indicates that fnew is lower than fold (i.e., an downwards pitch shift).
The output of the drop sample tuning algorithm is y[n], and is generated from inputs x[n] and PitchDeviation, as follows:       Drop  Sample  Tuning  Algorithm              Start      :                          ⁢                           ⁢      PhaseIncrement        =          2              PitchDeviation        /        1200                                 ⁢          Cph      =              -        PhaseIncrement                        Loop      :                          ⁢                           ⁢      Cph        =          {                                                                                                        Cph                    +                    PhaseIncrement                                    ,                                                                              Cph                  ≤                  EndLoop                                                                                    BeginLoop                                            else                                              ⁢                                          ⁢                                           ⁢                      y            ⁡                          [              n              ]                                      =                  x          ⁡                      [                          ⌊              Cph              ⌋                        ]                              The algorithm output, y[n], is the value of x[.] indexed by the integer part of the current value of the variable Cph each time it cycles through the loop part of the algorithm. For example, with PhaseIncrement=1.0 (PitchDeviation=0 cents), each sample for the wavetable is read out in turn (for the duration of the loop), so the waveform is played back at its original sampling rate. With PhaseIncrement=0.5 (PitchDeviation=−1200 cents), the waveform is reproduced one octave lower in pitch. With PhaseIncrement=2.0 (PitchDeviation=1200 cents), the waveform is pitch shifted up by one octave, and every other sample is skipped. Thus, the sampled values of x[n] are “resampled” at a rate that corresponds to the value of PhaseIncrement. This resampling of x[n] is commonly referred to as “drop sample tuning,” because samples are either dropped or repeated to change the frequency of the oscillator.
When PhaseIncrement is less than 1, the pitch of the generated signal is decreased. In principal, this is achieved by an up sampling of the wave data, but sustaining a fixed sampling rate for all outputs. Drop sample tuning “upsamples” x[n] and is commonly referred to as a “sampling rate expander” or an “interpolator” because the sample rate relative to the sampling rate used to form x[n] is effectively increased. This has the effect of expanding the signal in discrete time, which has the converse effect of contracting the spectral content of the original discrete time signal x[n].
When PhaseIncrement is greater than 1, the pitch of the signal is increased. In principal, this is achieved by a down sampling of the wave data, while sustaining a fixed sampling rate for all outputs. x[n] is commonly referred to as being “downsampled” or “decimated” by the drop sample tuning algorithm because the sampling rate is effectively decreased. This has the effect of contracting the signal in the time domain, which conversely expands the spectral content of the original discrete time signal x[n].
The drop sample tuning method of pitch shifting introduces undesirable distortion to the original sampled sound, which increases in severity with an increasing pitch shift amount. For example, if the pitch-shifting amount PhaseIncrement exceeds 1 and the original signal x(t) was sampled at the Nyquist rate, spectral overlapping of the downsampled signal will occur in the frequency domain and the overlapped frequencies will assume some other frequency values. This irreversible process is known as aliasing or spectral folding. A waveform signal that is reconstructed after aliasing has occurred will be distorted and not sound the same as a pitch-shifted version original sound. Aliasing distortion may be reduced by sampling the original sound at a frequency (fs=1/Ts) that is much greater than the Nyquist rate such that the original sound is “over-sampled.” However, over-sampling would require an increase in memory, which is undesirable in most practical applications.
To reduce the amount of aliasing distortion, interpolation techniques have been developed to change the sample rate. Adding interpolation in a sample rate conversion method changes the calculation of the lookup table by creating new samples based on adjacent sample values. That is, instead of ignoring the fractional part of the address pointer when determining the value to be sent to the DAC (such as in the foregoing drop sample algorithm), interpolation techniques perform a mathematical interpolation between available data points in order to obtain a value to be used in playback. The following algorithm illustrates a two point interpolation technique currently used in many sampling synthesizers to shift the pitch of a tabulated wavetable wave x[n]:       Linear  Interpolation  Algorithm              Start      :                          ⁢                           ⁢      PhaseIncrement        =          2              PitchDeviation        /        1200                                 ⁢          Cph      =              -        PhaseIncrement                        Loop      :                          ⁢                           ⁢      Cph        =          {                                                                                                        Cph                    +                    PhaseIncrement                                    ,                                                                              Cph                  ≤                  EndLoop                                                                                    BeginLoop                                            else                                              ⁢                                          ⁢                                           ⁢                      y            ⁡                          [              n              ]                                      =                                            x              ⁡                              [                                  ⌊                  Cph                  ⌋                                ]                                      ·                          (                              1                -                Cph                +                                  ⌊                  Cph                  ⌋                                            )                                +                                    x              ⁡                              [                                                      ⌊                    Cph                    ⌋                                    +                  1                                ]                                      ·                                          (                                  Cph                  -                                      ⌊                    Cph                    ⌋                                                  )                            .                                          
While interpolation methods reduce aliasing distortion to some extent when pitch-shifting wavetable waveforms, interpolation nevertheless introduces distortion that increases in severity as the sampling rate of the original waveform x(t) approaches (or falls below) the Nyquist rate. As with simple drop sample tuning, interpolation methods can more accurately represent the pitch-shifted version of the original sound if the Nyquist rate is greatly exceeded when creating x[n]. However, the tradeoff in doing so would necessarily require an increase in memory to store the corresponding increase in the number of wavetable samples. Higher order polynomial interpolation techniques may be used to further reduce aliasing distortion, but these techniques are computationally expensive. Thus, there is a need in the art for new ways of reducing distortion when tones listed in a wavetable are transposed without requiring a high levels of computation complexity and sample memory space.