1. Field of the Invention
This invention relates to digital signal processing and more specifically to electronic sound synthesizing by use of wavefunctions.
2. Related Art
Digital resampling sound synthesizers, also commonly known as "wavetable" synthesizers, have become widespread in consumer sound synthesizer applications, finding their way into video games, home computers, and karaoke machines, as well as in electronic performance musical instruments. They are generally known for their reproduction of realistic musical sounds, a consequence of the fact that the sounds are generated using digitally sampled Pulse Code Modulated (PCM) recordings of the actual musical instruments. The sound reproduction quality varies tremendously, however, depending on tradeoffs of sample storage space, computational cost, and quality of the analog signal circuitry.
The principle of operation is quite simple: sounds are digitally sampled and stored in some memory, such as ROM (read-only memory) for turn-key applications, and RAM (random-access memory, also known as read/write memory) for programmable configurations. RAM-based systems usually download the samples from a high-capacity storage device, such as a hard disk. To conserve memory, not every note of a given instrument is actually sampled in a practical sampling synthesizer. A complete recording of a musical instrument across all keys and velocities can easily consume several hundred megabytes of storage. Instead, notes are sampled at regular intervals from the full range of the instrument. The missing notes are reconstructed by contracting or expanding the actual samples in time, in order to raise or lower the pitch of the original recordings, respectively. It is well known that playing back a recording slower than its original sampling rate lowers the pitch, and conversely playing a recording back at a faster rate increases its pitch. Instead of actually playing back a raw sound recording at varying sample rates through a digital-to-analog converter (DAC) to shift the pitch, what is typically done in modern resampling synthesis is to stretch the stored recording of the note to a new sample rate (relative to the original PCM recording) and play out the new samples at a predetermined output rate. One major benefit is that several pitch-shifted notes may be played back simultaneously by resampling with different ratios but mixed together into a common PCM stream, which is sent to a single fixed-sample rate DAC. This method reduces hardware (circuitry) because it does not require a separate DAC for each individual note, making the incremental cost of the analog hardware to support polyphony essentially "free".
In order to effect such a resampling in the digital domain it is necessary to use interpolation techniques to resample the recording to the desired playback speed. There are several well-known techniques for resampling digital audio recordings. A technique that is used frequently for resampling is based on polyphase filtering (See Multirate Digital Signal Processing, R. E. Crochiere et al., Prentice Hall, 1983 and Multirate Systems and Filter Bank, p.p. Vaidyanathan, Prentice-Hall, 1993). One limitation of this technique is that the complexity of resampling calculations increases rapidly if the resampling ratio is not the ratio of small integers. For example, two popular sampling ratios used in digital audio are 44.1 KHz and 48 KHz: their ratio is 147/160. To convert from 44.1 to 48 KHz would require a polyphase filter with 160 phases. requiring large tables. Furthermore, since conversions are limited to rational resampling ratios, polyphase resampling is ill-suited to resampling synthesis, which requires a continuum of ratios for pitch-bending.
A way of overcoming the limitations of polyphase resampling is to use interpolated polyphase resampling, which can be used to obtain arbitrary-ratio sample rate conversions. (See e.g. "A Flexible Sampling--Rate Conversion Method", J. O. Smith and P. Gosset, Proc. ICASSP, p.p. 19.41-19.4.4, 1984; "Theory and VLSI Architectures for Asynchronous Sample--Rate Converter", R. Adams and T. Kwan, J. Audio and Engineering Society, Vol. 41, July 1, August 1993; and "A Stereo Asynchronous Sample-Rate Converter for Digital Audio", R. Adams and T. Kwan, Symposium on VLSI Circuits, Digest of Technical Papers, IEEE Cat. No. 93CH 330J-3, p.p. 39-40, 1993.) For a given re-sampling phase the closest two polyphase filters are chosen and linearly interpolated between using the fractional phase offset. Use of this technique is widespread in resampling for musical and other digital audio applications.
To perform accurate interpolated polyphase sample rate conversion there are two goals. One is that the model filter for the polyphase filterbank should be as close as possible to a ##EQU1## function, which is well-known to have a perfect "brick-wall" (vertical) transfer function, shown in FIG. 1. The length of the model filter determines the number of taps in the resulting FIR filter generated by the phase interpolation process. This ideal is unattainable since the sinc function has infinite extent in time. Typically, the model filter is a windowed sinc to keep the number of taps small--usually between 4 and 64, with obviously increasing deviations from the ideal as the number decreases. The other ideal is that the number of phases should be as large as possible so that interpolating between adjacent phases incurs as little error as possible. It is known that if N bits of accuracy in the FIR coefficient calculation are desired then the polyphase filterbank should have .sqroot.2.sup.N phases.
Typical resampling synthesizer implementations use a small number of interpolated FIR taps to save computational cost. Lower-quality resampling synthesizers go so far as to use linear interpolation (two-point interpolation), which can result in significant aliasing and imaging artifacts due to the slow rolloff of attenuation in the stop band. The effective model filter resulting from linear interpolation has a transfer function shown in FIG. 2. It is known to use 7- or 8-tap interpolating filters calculated using a 16-phase interpolated polyphase filter, (See "Digital Sampling Instrument For Digital Audio Data", D. Rossum, U.S. Pat. No. 5,111,72.) FIG. 3 shows the transfer function of the model filter with various cutoffs. Such an interpolating filter, though far from ideal, is considered to give acceptable-quality interpolation. In addition to problems in the stopband, low-order interpolating filters suffer from undesirable rolloff in the passband due to the wide transition band, as can be seen in FIGS. 2 and 3. This problem results in significant attenuation of signal energy, becoming most severe near the Nyquist frequency, potentially causing resampled musical note recordings to sound dull. This is compensated for in many resampling synthesizers by recording note samples at a higher-than-critical sampling rate, attempting to provide enough margin above the highest significant musical frequency components so that the undesired attenuation happens mostly where it is unimportant. A disadvantage of this strategy is that it requires more storage space to compensate for expanded data sets.
Computational Cost
Discrete-time, sampled representations are a highly useful representation of analog data with well-developed means of analysis and manipulation. Re-sampling a discrete-time signal conceptually converts a sample stream into an analog signal by convolving with a sinc function, followed by sampling at the new desired rate. Of course, a practical resampler does not actually perform the conversion to an analog signal--that would require an infinite amount of storage and computation. Rather, only the output samples that are actually desired are computed. Even so, as mentioned above, a major problem with discrete-time resampling is that the reconstruction process is non-localized due to the infinite extent of the kernel; that is to say, to calculate an arbitrary point x(t.sub.0) from the perfect reconstruction of a critically sampled waveform x[n], as guaranteed by the Nyquist theorem (see "Certain Topics in Telegraph Transmissions Theory", Nyquist, AIEE Trans, pp. 617-644, 1928), the entire sampled stream must be used, as seen in the sum ##EQU2## Even in the non-ideal case where the sinc(t) function is replaced by a model reconstruction filter h(t) of finite duration ##EQU3## the reconstruction is still as non-localized as the support of h(t), which must be broad if high-quality resampling is desired.
If one wants a high-quality resampler, one must use many interpolation points, each of which requires one multiply-accumulate. In addition, the resampler must provide the coefficients, thus incurring more computations. The above-described method using a linearly interpolated sinc function requires one multiply and two adds per coefficient. For 8-point interpolation, we see that 16 multiplies and 24 adds are required per output sample. This expense is largely due to the non-locality of the PCM representation conventionally used in digital audio. What is desired is to find a localized, yet accurate representation of a continuous-time function: this is provided by the presently disclosed wavefunction process and apparatus.
Coefficient Tables
In addition to the computational cost associated with calculating interpolated filter coefficients, there is an "architectural" (circuitry) burden associated with using the large polyphase tables required for high-quality resampling. A large table is disadvantageous because it must be accessed twice per interpolated coefficient for each coefficient used for each sample: an N-point resampler must access the table 2N times per output sample produced. A fast-access memory is therefore required to store it. Special-purpose music synthesis and resampling chips have fast ROMs with special pipelined circuitry to provide the table values. The ROM access circuits usually take advantage of symmetry by folding the table in half and mirroring the access. For programmable circuits, such as DSPs (digital signal processors) or microprocessors, the large table must be held in a low-latency SRAM or level-1 cache. Usually, such resources are limited, restricting the size of the table.
To summarize, in designing a traditional resampling synthesizer, significant tradeoffs must be made between quality (frequency response and artifact suppression), and computational budget. Practical implementations using traditional techniques are generally computationally bound and thus must make do with lower-than-ideal quality, with skillful voicing necessary to avoid artifacts.