The present invention relates to the generation of sounds by means of a wavetable synthesizer, and more particularly to the control of the processing load imposed by a wavetable synthesizer.
The creation of musical sounds using electronic synthesis methods dates back at least to the late nineteenth century. From these origins of electronic synthesis until the 1970's, analog methods were primarily used to produce musical sounds. Analog music synthesizers became particularly popular during the 1960's and 1970's with developments such as the voltage-controlled, patchable analog music synthesizers invented independently by Don Buchla and Robert Moog. As development of the analog music synthesizer matured and its use spread throughout the field of music, it introduced the musical world to a new class of timbres.
However, analog music synthesizers were constrained to using a variety of modular elements. These modular elements included oscillators, filters, multipliers and adders, all interconnected with telephone-style patch cords. Before a musically useful sound could be produced, analog synthesizers had to be programmed by first establishing an interconnection between the desired modular elements and then laboriously adjusting the parameters of the modules by trial and error. Because the modules used in these synthesizers tended to drift with temperature changes, it was difficult to store parameters and faithfully reproduce sounds from one time to another.
Around the same time that analog musical synthesis was coming into its own, digital computing methods were being developed at a rapid pace. By the early 1980's, advances in computing made possible by Very Large Scale Integration (VLSI) and digital signal processing (DSP) enabled the development of practical digital based waveform synthesizers. Since then, the declining cost and decreasing size of memories have made the digital synthesis approach to generating musical sounds a popular choice for use in personal computers and electronic musical instrument applications.
One type of digitally based synthesizer is the wavetable synthesizer. The wavetable synthesizer is a sampling synthesizer in which one or more real musical instruments are “sampled,” by recording and digitizing a sound produced by the instrument(s), and storing the digitized sound in a memory. The memory of a wavetable synthesizer includes a lookup table in which the digitized sounds are stored as digitized waveforms. Sounds are generated by “playing back” a particular digitized waveform from the wavetable memory to a digital-to-analog converter (DAC).
The basic operation of a sampling synthesizer is to play back digitized recordings of entire musical instrument notes under the control of a person, computer or some other means. Playback of a note can be triggered by depressing a key on a musical keyboard, from a computer, or from some other controlling device. When it is desired to store a particular sequence of musical events that are to be rendered by a sampling synthesizer, a standard control language, such as the Musical Instrument Digital Interface (MIDI), may be used. While the simplest samplers are only capable of reproducing one note at a time, more sophisticated samplers can produce polyphonic (multi-tone), multi-timbral (multi-instrument) performances.
Data representing a sound in a wavetable memory may be created using an analog-to-digital converter (ADC) to sample, quantize and digitize the original sound at successive, regular time intervals (i.e., at the sampling interval Ts). The digitally encoded sound is stored in an array of wavetable memory locations that are read out successively during a playback operation.
One technique used in wavetable synthesizers to conserve sample memory space is the “looping” of stored sampled sound segments. A looped sample is a short segment of a wavetable waveform stored in the wavetable memory that is repetitively accessed (e.g., from beginning to end) during playback. Looping is particularly useful for playing back an original sound or sound segment having a fairly constant spectral content and amplitude. A simple example of this is a memory that stores one period of a sine wave such that the endpoints of the loop segment are compatible (i.e., at the endpoints the amplitude and slope of the waveform match, avoiding a repetitive “glitch” that would otherwise be heard during looped playback of an unmatched segment). A sustained note may be produced by looping the single period of a waveform for the desired duration (e.g., for as long as the key is depressed, or for a programmed duration). However, in practical applications, for example, for an acoustic instrument sample, a looped segment would span many periods of the fundamental pitch of the instrument sound. This avoids the “periodicity” effect of a looped single-period waveform, which is easily detectable by the human ear, and improves the perceived quality of the sound (e.g., the “evolution” or “animation” of the sound).
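The looping described above may be sketched in Python as follows. This is a minimal illustration, assuming a loop segment held in a list and read by a pointer that wraps from the last sample back to the first; all function names are hypothetical. The endpoint check compares only amplitude (the jump across the wrap point against the largest step inside the segment) and ignores slope, so it is only a rough stand-in for the compatibility condition described in the text.

```python
import math

def loop_playback(segment, num_samples):
    """Sustain a note by reading the loop segment repeatedly, beginning to end."""
    return [segment[i % len(segment)] for i in range(num_samples)]

def boundary_jump(segment):
    """Step heard when playback wraps from the last sample back to the first."""
    return abs(segment[0] - segment[-1])

def max_interior_step(segment):
    """Largest sample-to-sample step inside the segment itself."""
    return max(abs(b - a) for a, b in zip(segment, segment[1:]))

N = 64  # samples per period (arbitrary choice for the example)

# One full period of a sine: the endpoints are compatible, so the wrap is smooth.
one_period = [math.sin(2 * math.pi * k / N) for k in range(N)]

# 1.25 periods: the loop ends near a peak but restarts at zero, causing a glitch.
mismatched = [math.sin(2 * math.pi * k / N) for k in range(N + N // 4)]

print(boundary_jump(one_period) <= max_interior_step(one_period) + 1e-9)  # True
print(boundary_jump(mismatched) > 5 * max_interior_step(mismatched))      # True
```

A fuller compatibility test would also compare the slope on either side of the wrap point, as the text requires.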
The sounds of many instruments can be modeled as consisting of two major sections: the “attack” (or onset) section and the “sustain” section. The attack section is the initial part of a sound, wherein amplitude and spectral characteristics of the sound may be rapidly changing. For example, the onset of a note may include a pick snapping a guitar string, the chiff of wind at the start of a flute note, or a hammer striking the strings of a piano. The sustain section of the sound is that part of the sound following the attack, wherein the characteristics of the sound are changing less dynamically. A great deal of memory is saved in wavetable synthesis systems by storing only a short segment of the sustain section of a waveform, and then looping this segment during playback.
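The attack-plus-loop scheme can be sketched as follows. This is a minimal sketch, assuming the stored waveform is laid out as an attack section followed by a sustain segment, delimited by hypothetical loop markers `loop_start` (inclusive) and `loop_end` (exclusive). The attack is read once, and the sustain segment between the markers is then repeated for as long as the note lasts.

```python
def playback_addresses(loop_start, loop_end, num_samples):
    """Sample-memory addresses for a note: the attack section [0, loop_start)
    is read once, then the sustain segment [loop_start, loop_end) is looped."""
    if not 0 <= loop_start < loop_end:
        raise ValueError("need 0 <= loop_start < loop_end")
    addresses = []
    pos = 0
    for _ in range(num_samples):
        addresses.append(pos)
        pos += 1
        if pos == loop_end:      # end of the sustain loop: jump back to its start
            pos = loop_start
    return addresses

def render_note(wavetable, loop_start, loop_end, num_samples):
    """Read the wavetable at those addresses (no pitch shifting; loop_end <= len(wavetable))."""
    return [wavetable[a] for a in playback_addresses(loop_start, loop_end, num_samples)]

# Attack samples 0..2, sustain loop 3..5 (exclusive end 6).
print(playback_addresses(3, 6, 10))  # [0, 1, 2, 3, 4, 5, 3, 4, 5, 3]
```

Only `loop_end` samples need to be stored, however long the note is held, which is the memory saving the text describes.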
Amplitude changes that are characteristic of a particular or desired sound may be added to a synthesized waveform signal by multiplying the signal by a decreasing gain factor or a time-varying envelope function. For example, in an original acoustic string sound, the signal amplitude naturally decays at different rates in different sections of the sound. During the onset of the sound, a period of decay may follow shortly after the initial attack. Another period of decay, the release, occurs after the note is “released” (e.g., after a depressed key of a music keyboard is released). The spectral characteristics of the acoustic sound signal may remain fairly constant during the sustain section of the sound; however, the amplitude of the sustain section may (or may not) also decay slowly. The foregoing describes a traditional approach to modeling a musical sound, called the Attack-Decay-Sustain-Release (ADSR) model, in which a waveform is multiplied by a piecewise linear envelope function to simulate amplitude variations in the original sound.
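A piecewise linear ADSR envelope of the kind described above might be sketched as follows. The sketch assumes segment lengths given in samples, a peak level of 1.0, and a sustain segment of fixed length; in a real synthesizer the sustain would last until the key is released. Function names are hypothetical.

```python
def adsr_envelope(attack, decay, sustain_level, sustain, release):
    """Piecewise-linear ADSR envelope; segment lengths are in samples."""
    env = []
    # Attack: rise linearly from 0 toward full scale (1.0).
    env += [k / attack for k in range(attack)]
    # Decay: fall linearly from 1.0 toward the sustain level.
    env += [1.0 - (1.0 - sustain_level) * k / decay for k in range(decay)]
    # Sustain: hold the level (a slow sustain decay would add a small slope here).
    env += [sustain_level] * sustain
    # Release: fall linearly from the sustain level toward 0 after key release.
    env += [sustain_level * (1.0 - k / release) for k in range(release)]
    return env

def apply_envelope(signal, env):
    """Multiply the waveform, sample by sample, by the envelope."""
    return [s * g for s, g in zip(signal, env)]

env = adsr_envelope(attack=4, decay=4, sustain_level=0.5, sustain=4, release=4)
print(env)
# [0.0, 0.25, 0.5, 0.75, 1.0, 0.875, 0.75, 0.625, 0.5, 0.5, 0.5, 0.5, 0.5, 0.375, 0.25, 0.125]
```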
In order to minimize sample memory requirements, wavetable synthesis systems have utilized pitch shifting, or pitch transposition techniques, to generate a number of different notes from a single sound sample of a given instrument. Two types of methods are mainly used in pitch shifting: asynchronous pitch shifting and synchronous pitch shifting.
In asynchronous pitch shifting, the clock rate of the DAC used to reproduce a digitized waveform is changed to vary the waveform frequency, and hence its pitch. In systems using asynchronous pitch shifting, each channel of the system requires a separate DAC. Each of these DACs has its own clock, whose rate is determined by the requested frequency for that channel. This method of pitch shifting is considered asynchronous because each output DAC runs at a different clock rate to generate different pitches. Asynchronous pitch shifting has the advantages of simplified circuit design and minimal pitch-shifting artifacts (as long as the analog reconstruction filter is of high quality). However, asynchronous pitch shifting methods have several drawbacks. First, a DAC is needed for each channel, which increases system cost as the channel count grows. Second, multiple channels cannot be mixed for further digital post-processing such as reverberation. Third, asynchronous pitch shifting requires complex and expensive tracking reconstruction filters, one for each channel, to track the sample playback rate of the respective channels.
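The per-channel clock arrangement can be illustrated with a short sketch. It assumes, hypothetically, that each channel's stored sample was recorded at a known sample rate and pitch; the DAC clock for a channel is then scaled by the ratio of the requested pitch to the recorded pitch, so three channels playing three different notes need three different clocks.

```python
def dac_clock_rate(recorded_rate_hz, recorded_pitch_hz, requested_pitch_hz):
    """Clock rate for one channel's DAC so that the stored samples, read out
    one per clock tick, sound at the requested pitch."""
    return recorded_rate_hz * requested_pitch_hz / recorded_pitch_hz

# Hypothetical sample: an A4 (440 Hz) note recorded at 44.1 kHz.
# Each channel plays a different note, so each needs its own DAC clock
# (and its own tracking reconstruction filter following that rate).
requested = {"ch0": 440.0, "ch1": 880.0, "ch2": 220.0}
clocks = {ch: dac_clock_rate(44100.0, 440.0, f) for ch, f in requested.items()}
print(clocks)  # {'ch0': 44100.0, 'ch1': 88200.0, 'ch2': 22050.0}
```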
In synchronous pitch shifting techniques currently in use, the pitch of the wavetable playback data is changed using sample rate conversion algorithms. These techniques accomplish sample rate conversion essentially by generating, from the stored sample points, a different number of sample points which, when accessed at a standard clock rate, produce the desired pitch during playback. For example, if sample memory accesses occur at a fixed rate, and if a pointer used to address the sample memory for a sound is incremented by one after each access, then the samples for this sound are accessed sequentially, resulting in some particular pitch. If the pointer increment is two rather than one, then only every second sample is played (i.e., the effective number of samples is cut in half), and the resulting pitch is shifted up by one octave (i.e., the frequency is doubled). More generally, the pitch may be raised by an integer factor a (by k octaves when a = 2^k) by multiplying the index n of a discrete-time signal x[n] by a and playing back (reconstructing) the resulting signal x_up[n] at the original rate:
$$x_{up}[n] = x[a \cdot n]$$
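Integer pitch raising by a pointer increment a can be sketched as below. The sketch assumes a stored table containing 8 cycles of a sine in 256 samples and counts upward zero crossings as a rough cycle count; the helper names are hypothetical. Keeping every second sample leaves the same number of cycles in half as many samples, so at a fixed playback clock the frequency doubles.

```python
import math

def pitch_up_integer(x, a):
    """x_up[n] = x[a*n]: read every a-th stored sample (pointer increment a)."""
    return x[::a]

def upward_zero_crossings(x):
    """Count negative-to-non-negative transitions, i.e. roughly the cycles played."""
    return sum(1 for prev, cur in zip(x, x[1:]) if prev < 0 <= cur)

# Stored table: 8 cycles of a sine in 256 samples (a small phase offset keeps
# samples from landing exactly on zero).
x = [math.sin(2 * math.pi * 8 * k / 256 + 0.1) for k in range(256)]
y = pitch_up_integer(x, 2)

# Same number of cycles in half as many samples: played at the same fixed
# clock rate, y has twice the frequency of x, i.e. it is one octave higher.
print(len(x), upward_zero_crossings(x))  # 256 7
print(len(y), upward_zero_crossings(y))  # 128 7
```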
To shift the pitch downward, it is necessary to expand the number of samples beyond the number actually stored in the sample memory. To accomplish this, additional “sample” points (e.g., one or more zero values) may be inserted between values of the decoded sequential data of the stored waveform. That is, a discrete-time signal x[n] may be supplemented with additional values in order to approximate a resampling of the continuous-time signal x(t) at a rate that is increased by a factor L:
$$x_{down}[n] = \begin{cases} x[n/L], & n = 0, \pm L, \pm 2L, \pm 3L, \ldots \\ 0, & \text{otherwise} \end{cases}$$
When the resultant sample points x_down[n] are played back at the original sampling rate, the pitch will have been shifted downward by a factor L. (In practice, the zero-stuffed sequence is also low-pass filtered, i.e., interpolated, so that the inserted points take values lying between their stored neighbors rather than remaining zero.)
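Zero insertion by a factor L can be sketched as follows. Played back as-is, the inserted zeros would introduce spectral images, so a low-pass (interpolation) filter normally follows; the triangular kernel used here, which amounts to linear interpolation, is chosen only as the simplest example and is an assumption, not the filter of any particular synthesizer. Function names are hypothetical.

```python
def zero_stuff(x, L):
    """x_down[n] = x[n/L] for n a multiple of L, otherwise 0."""
    y = [0.0] * (len(x) * L)
    y[::L] = x
    return y

def triangle_filter(y, L):
    """Convolve with a triangular kernel of half-width L. Applied to a
    zero-stuffed sequence, this fills the zeros by linear interpolation."""
    out = []
    for n in range(len(y)):
        acc = 0.0
        for k in range(-(L - 1), L):
            if 0 <= n + k < len(y):
                acc += (1.0 - abs(k) / L) * y[n + k]
        out.append(acc)
    return out

stored = [1.0, 2.0, 3.0]
stuffed = zero_stuff(stored, 2)
print(stuffed)                      # [1.0, 0.0, 2.0, 0.0, 3.0, 0.0]
print(triangle_filter(stuffed, 2))  # [1.0, 1.5, 2.0, 2.5, 3.0, 1.5]
```

The last output value tapers because no stored sample follows the final one.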
While the foregoing illustrates how the pitch may be changed by scaling the index of a discrete time signal by an integer amount, this allows only a limited number of pitch shifts. This is because the stored sample values represent a discrete time signal, x[n], and a scaled version of this signal, x[a·n] or x[n/b], cannot be defined with a or b being non-integers. Hence, more generalized sample rate conversion methods have been developed to allow for more practical pitch shifting increments, as described in the following.
In a more general case of sample rate conversion, the sample memory address pointer would consist of an integer part and a fractional part, and thus the increment value could be a fractional number of samples. The memory pointer is often referred to as a “phase accumulator” and the increment value is called the “phase increment.” The integer part of the phase accumulator is used to address the sample memory and the fractional part is used to maintain frequency accuracy.
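A fixed-point phase accumulator of this kind can be sketched as below. The 16-bit fractional part is an assumption chosen for illustration; the integer part (the upper bits) addresses sample memory, while the fractional part (the lower bits) carries the remainder forward so that a non-integer phase increment does not lose frequency accuracy over time.

```python
FRAC_BITS = 16            # assumed width of the fractional part
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0

def phase_increment(ratio):
    """Fixed-point phase increment for a playback-rate ratio (1.0 = original pitch)."""
    return round(ratio * ONE)

def accumulate(increment, num_samples, phase=0):
    """Step the phase accumulator, returning (integer_part, fractional_part)
    for each output sample; the integer part is the sample-memory address."""
    out = []
    for _ in range(num_samples):
        out.append((phase >> FRAC_BITS, phase & (ONE - 1)))
        phase += increment
    return out

inc = phase_increment(1.5)   # playback ratio 3:2
print(inc)                   # 98304
print(accumulate(inc, 4))    # [(0, 0), (1, 32768), (3, 0), (4, 32768)]
```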
Different algorithms for changing the pitch of a tabulated signal that allow fractional increment amounts have been proposed. One category of such algorithms involves the use of interpolation to generate a synthesized sample point from the actually stored adjacent sample points when the memory pointer points to an address that lies between two actual memory locations. That is, instead of ignoring the fractional part of the address pointer when determining the value to be sent to the DAC (such as in the known “drop sample algorithm”), interpolation techniques perform a mathematical interpolation between available data points in order to obtain a value to be used in playback. It is well-known that the optimum interpolator uses a sin(x)/x function and that such an interpolator is non-causal and requires an infinite number of calculations. Consequently, sub-optimal interpolation methods have been developed. A sub-optimal interpolation generates distortion (artifacts) due to a portion of the signal being folded back at the Nyquist frequency Fs/2 (Fs being the sampling rate used when the table sequence was recorded). This distortion is perceived as annoying and has to be controlled.
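The difference between the drop-sample algorithm and a simple interpolator can be sketched as follows. Linear interpolation between the two neighbouring stored samples is used here only as the simplest sub-optimal interpolator; the sketch assumes non-negative phases and performs no bounds checking, and the helper names are hypothetical.

```python
import math

def drop_sample(table, phase):
    """Drop-sample read: the fractional part of the address is ignored."""
    return table[int(phase)]

def linear_interp(table, phase):
    """Interpolate between the two stored samples surrounding the address."""
    i = int(phase)
    frac = phase - i
    return table[i] + frac * (table[i + 1] - table[i])

def max_error(read, ratio, period=32, n_out=100):
    """Largest deviation from the ideal sine when a stored sine (`period`
    samples per cycle) is read with phase increment `ratio`."""
    table = [math.sin(2 * math.pi * k / period) for k in range(4 * period + 2)]
    worst = 0.0
    for m in range(n_out):
        phase = m * ratio
        ideal = math.sin(2 * math.pi * phase / period)
        worst = max(worst, abs(read(table, phase) - ideal))
    return worst

print(drop_sample([0.0, 1.0, 4.0], 1.5), linear_interp([0.0, 1.0, 4.0], 1.5))  # 1.0 2.5
print(max_error(linear_interp, 0.75) < max_error(drop_sample, 0.75))            # True
```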
The interpolation degree, defined as the number of wavetable samples used in the interpolation, is a parameter that sets the performance of the synthesizer. The higher the degree, the lower the distortion present in the generated signal. However, a higher interpolation degree increases computational complexity. For example, the computational complexity of the traditional truncated sin(x)/x interpolation algorithm grows linearly with the interpolation degree. Synthesizers presently available commonly use interpolation degrees on the order of ten, since this provides a good trade-off between complexity and sound quality.
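The trade-off between interpolation degree and distortion can be sketched with a truncated sin(x)/x interpolator. The sketch assumes an even degree, uses the normalized kernel sin(πx)/(πx), and measures error on a stored sine of 16 samples per cycle over one full period of fractional phases; the function names are hypothetical. Each output sample costs `degree` multiply-adds, which is the linear growth mentioned above.

```python
import math

def sinc(x):
    """Normalized sin(pi*x)/(pi*x): the ideal (infinitely long) interpolation kernel."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def truncated_sinc_interp(table, phase, degree):
    """Estimate the signal at a fractional phase from `degree` stored samples
    (degree assumed even and positive) weighted by the truncated sinc kernel.
    One multiply-add per sample used, so the cost grows linearly with degree."""
    assert degree > 0 and degree % 2 == 0
    i = math.floor(phase)
    frac = phase - i
    acc = 0.0
    for k in range(-degree // 2 + 1, degree // 2 + 1):
        idx = i + k
        if 0 <= idx < len(table):   # taps beyond the table are treated as zero
            acc += table[idx] * sinc(frac - k)
    return acc

def worst_error(degree, period=16):
    """Largest deviation from the ideal sine over phases 24.0 .. 39.875 (one period)."""
    table = [math.sin(2 * math.pi * k / period) for k in range(64)]
    worst = 0.0
    for j in range(128):
        phase = 24.0 + j / 8
        ideal = math.sin(2 * math.pi * phase / period)
        worst = max(worst, abs(truncated_sinc_interp(table, phase, degree) - ideal))
    return worst

print(worst_error(10) < worst_error(4) < worst_error(2))  # True
```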
The discussion so far has focused on problems associated with generating, from a stored set of samples, a single “voice” of sound at a desired pitch. Another aspect that contributes to computational complexity is the number of simultaneous sounds that can be generated in real time. In a MIDI synthesizer, this is called the number of voices. For example, synthesizing guitar music requires up to six voices, since there are six strings on this instrument that can be played in various combinations.
It is desirable to be able to simultaneously reproduce a large number of voices, since more voices imply a higher degree of polyphony, and therefore also the possibility of generating more complex music. Low-end systems may require, for example, at least 24 voices, and a high performance synthesizer for musicians may require the capability of generating up to 128 simultaneous voices.
Voice generation is often implemented in a synthesizer using one or several central processing units (CPUs). The computational power of the CPU imposes a limit on the number of voices that can be executed.
In some applications, such as in a mobile communications terminal, the computational power required for maintaining a sufficient interpolation degree is lacking if, at the same time, it is desired to provide a high level of polyphony. For example, it is difficult to implement levels of polyphony as high as 40 voices or more, using an interpolation degree around ten, without the use of dedicated hardware accelerators.
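A rough order-of-magnitude estimate shows why polyphony and interpolation degree compete for the same CPU budget. The sketch below is a crude cost model, assuming a 48 kHz output rate, one multiply and one add per interpolation tap per output sample, and one instruction per operation; it ignores envelope processing, mixing, effects and memory access, so real figures would be higher. The function names are hypothetical.

```python
def estimated_mips(voices, degree, output_rate_hz, ops_per_tap=2):
    """Millions of operations per second spent on interpolation alone
    (crude model: `ops_per_tap` operations per tap per output sample)."""
    return voices * degree * output_rate_hz * ops_per_tap / 1e6

def max_voices(budget_mips, degree, output_rate_hz, ops_per_tap=2):
    """Largest voice count whose estimated interpolation load fits the budget."""
    per_voice = estimated_mips(1, degree, output_rate_hz, ops_per_tap)
    return int(budget_mips // per_voice)

# Assumed figures: 48 kHz output, interpolation degree 10.
print(estimated_mips(40, 10, 48000))   # 38.4
print(max_voices(20, 10, 48000))       # 20
```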
Unlike the decoding of many other media content types, the computational load on the CPU varies greatly during the execution of a MIDI song. (In this description, the word “song” is used generically to refer not only to music in the traditional sense, but also to any sounds that can be encoded for automated reproduction by means of a control language such as MIDI.) This is because the complexity of decoding a MIDI song depends on such parameters as the number of active voices, the original sample rate of the table sequence and the word length of those samples.
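The variability can be pictured by summing, frame by frame, a cost estimate for each voice active in that frame. The sketch below uses made-up integer costs in arbitrary units, standing in for per-voice costs that in practice would depend on parameters such as the table sample rate and sample word length; it only illustrates that the peak load, not the average, is what must stay within the available budget.

```python
def frame_loads(frames):
    """Load per frame: the sum of the cost estimates of the voices active in it."""
    return [sum(voice_costs) for voice_costs in frames]

# Hypothetical per-voice costs (arbitrary units) for the voices active in four
# successive frames of a song.
frames = [[2, 2], [2, 2, 3, 3, 3], [3], []]
loads = frame_loads(frames)
peak = max(loads)
mean = sum(loads) / len(loads)
print(loads)        # [4, 13, 3, 0]
print(peak, mean)   # 13 5.0
```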
There is therefore a need to be able to control the peaks of CPU loading so that they do not exceed the maximum allowed number of CPU cycles as measured, for example, in Millions of Instructions Per Second (MIPS). Exceeding this maximum risks a system crash.
There is also a need to be able to set the maximum allowed number of MIPS to be dedicated to song decoding so that it suits the available resources in a particular system. Such a capability would make a synthesizer implementation easily portable into a variety of systems, such as different mobile platforms with different CPU capabilities.