In a number of important applications it is desirable to carry out spectral transformations on acoustical signals. In speech signal processing, the speech may be compressed or expanded in frequency. In particular, frequency compression is useful for bandwidth reduction or for placing the speech into a desired frequency range as an aid to the hearing impaired. Another speech application requires that the fundamental frequency of the speaker be modified while preserving the shape of the envelope of the short-time speech spectrum. This operation is useful in psychoacoustic research and in correcting pitch discontinuities in concatenated speech segments.
In musical signal processing, a common practice for synthesizing every individual note across the entire range of a particular musical instrument is to analyze some of the original notes and store their parameters. At the synthesis stage, all other notes are obtained from the analyzed notes by pitch shifting. Generally speaking, in a sampler or a wavetable synthesizer, one original sound waveform is stored for every three or four notes, and the pitch shifting is accomplished by sample rate conversion. It is well known that pitch shifting through sample rate conversion preserves the original signal waveform but creates two undesired effects. First, it "compresses" the signal spectrum so that the pitch-shifted signal sounds "darker" (to avoid aliasing, the pitch is always shifted down in samplers or wavetable synthesizers). Second, since the waveform shape does not change among adjacent notes, musical sounds synthesized by a sampler or a wavetable synthesizer lack variation from note to note, and thus lack the realism of musical instruments. To improve the brightness and the realism of pitch-shifted signals, researchers have tried to apply a result from speech signal analysis and synthesis, namely preserving the signal spectrum envelope when the original signal is pitch-shifted.
Even though the physical justification for this practice has yet to be established, it is widely accepted that the brightness of pitch-shifted signals is improved by preserving the shape of the signal spectrum envelope.
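The spectral "compression" caused by sample rate conversion can be illustrated with a minimal sketch. The function names, the linear-interpolation resampler, and the two-partial test tone below are assumptions made for illustration only; they are not taken from any particular sampler. The sketch shifts a tone down by one octave and shows that the strongest spectral peak, which stands in for the spectrum envelope, moves down by the same factor as the pitch.

```python
import numpy as np

def resample_pitch_shift(x, beta):
    """Read x at beta times the original rate (beta < 1 lowers the pitch).

    Linear interpolation is an assumption chosen for brevity.
    """
    n_out = int(np.floor((len(x) - 1) / beta)) + 1
    read_pos = np.arange(n_out) * beta
    return np.interp(read_pos, np.arange(len(x)), x)

def peak_frequency(x, fs):
    """Frequency (Hz) of the largest magnitude bin of a windowed DFT."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    return np.argmax(spectrum) * fs / len(x)

fs = 8000.0
t = np.arange(4096) / fs
# Hypothetical tone: weak 400 Hz fundamental, strong partial at 2000 Hz.
x = 0.3 * np.sin(2 * np.pi * 400 * t) + 1.0 * np.sin(2 * np.pi * 2000 * t)
y = resample_pitch_shift(x, 0.5)  # one octave down

peak_before = peak_frequency(x, fs)
peak_after = peak_frequency(y, fs)
```

In this example the strongest partial moves from about 2000 Hz to about 1000 Hz, so the whole spectrum, envelope included, is scaled along with the pitch.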
A prior art frequency-domain approach is described by Quatieri, et al. in an article entitled "Speech Transformations Based on a Sinusoidal Representation," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 34, pp. 1449-1464, December 1986. Assume $s(t)$ is the signal to be pitch-shifted by a factor $\beta$. According to Quatieri, et al., the pitch shifting or frequency transformation is performed as follows. First, a transfer function
$$H(\omega, t) = M(\omega, t)\,\exp[j\Phi(\omega, t)]$$
is obtained. (In practice, only uniform samples of $H(\omega, t)$ from the Discrete Fourier Transform (DFT) are available and stored.) The magnitude response of this transfer function, $M(\omega, t)$, is a good approximation to the spectrum envelope of the signal $s(t)$. The phase function, $\Phi(\omega, t)$, is the Hilbert transform of $\log M(\omega, t)$, so the transfer function $H(\omega, t)$ represents a minimum phase system. The so-called excitation signal $e(t)$ can then be obtained by filtering $s(t)$ through the inverse system of $H(\omega, t)$. The excitation signal $e(t)$ can be expressed using a sinusoidal model as ##EQU2##
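The minimum-phase relation between the magnitude and phase of the transfer function can be sketched for the sampled (DFT) case. The sketch below uses the real-cepstrum folding method, a standard discrete counterpart of taking the Hilbert transform of the log magnitude; it is not necessarily the procedure used by Quatieri, et al. The function name, the even DFT length, and the two-tap test system are assumptions made for illustration.

```python
import numpy as np

def minimum_phase_from_magnitude(mag):
    """Return the minimum-phase response Phi[k] for magnitude samples M[k].

    mag must hold strictly positive samples on a full N-point DFT grid
    (k = 0..N-1, conjugate-symmetric layout) with N even.
    """
    n = len(mag)
    # Real cepstrum: inverse DFT of the log magnitude.
    cepstrum = np.fft.ifft(np.log(mag)).real
    # Fold the cepstrum onto non-negative time, which makes it causal.
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:n // 2] = 2.0
    fold[n // 2] = 1.0
    # The DFT of the causal cepstrum is log M[k] + j * Phi[k].
    return np.fft.fft(cepstrum * fold).imag

# Check against a known minimum-phase system (zero at z = -0.5).
n = 256
h = np.array([1.0, 0.5])
H = np.fft.fft(h, n)
phi = minimum_phase_from_magnitude(np.abs(H))
max_phase_error = np.max(np.abs(phi - np.angle(H)))
```

Because the test system is itself minimum phase, the phase recovered from its magnitude alone should agree with its true phase up to numerical error.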
When a pitch modification is needed, the frequency of each sine-wave component of the excitation signal is scaled by the desired factor $\beta$ to generate a new frequency track at $\beta\omega_l(t)$. The excitation amplitude $a_l(t)$ is then moved to the new frequency track location. To preserve the shape of the spectrum envelope, the magnitude and phase of $H(\omega, t)$ must be computed at the new frequency track location $\beta\omega_l(t)$. They are obtained by sampling (interpolating in frequency) $M(\omega, t)$ and $\Phi(\omega, t)$, respectively.
With the above modified excitation and system magnitudes and phases, the resulting modified signal waveform, denoted as $s(t, \beta)$, is given by ##EQU3##
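The modification steps described above can be sketched for a single stationary frame. This is an illustrative sketch under stated assumptions, not the exact synthesis formula of Quatieri, et al.: the partial frequencies and amplitudes are held constant over the frame, the excitation phase offsets are taken as zero, each partial is synthesized as a cosine, and the stored samples of $M$ and $\Phi$ are interpolated linearly. All function and variable names are hypothetical.

```python
import numpy as np

def shift_with_envelope(freqs, amps, grid, mag, phase, beta, t):
    """Resynthesize one frame with partial frequencies scaled by beta.

    freqs, amps : excitation partial frequencies (rad/s) and amplitudes
    grid        : increasing frequencies (rad/s) where mag, phase are stored
    mag, phase  : stored samples of M and Phi for this frame
    """
    new_freqs = beta * np.asarray(freqs, dtype=float)
    # Sample the envelope at the shifted track locations.
    m_new = np.interp(new_freqs, grid, mag)
    phi_new = np.interp(new_freqs, grid, phase)
    out = np.zeros_like(t, dtype=float)
    for w, a, m, p in zip(new_freqs, amps, m_new, phi_new):
        out += a * m * np.cos(w * t + p)
    return out, new_freqs, m_new

fs = 8000.0
t = np.arange(512) / fs
grid = 2 * np.pi * np.linspace(0.0, fs / 2, 257)
# Hypothetical envelope with a single peak at 1000 Hz; zero phase for brevity.
mag = np.exp(-0.5 * ((grid / (2 * np.pi) - 1000.0) / 300.0) ** 2)
phase = np.zeros_like(grid)
freqs = 2 * np.pi * 200.0 * np.arange(1, 9)  # harmonics of 200 Hz
amps = np.ones(8)                            # flat excitation

out, new_freqs, m_new = shift_with_envelope(freqs, amps, grid, mag, phase, 1.5, t)
strongest_hz = new_freqs[np.argmax(m_new)] / (2 * np.pi)
```

With $\beta = 1.5$ the fundamental moves from 200 Hz to 300 Hz, yet the strongest partial stays next to the 1000 Hz envelope peak (at 900 Hz) instead of moving to 1500 Hz. The two `np.interp` calls per frame, and the stored `mag` and `phase` arrays they read, correspond to the per-track envelope sampling and storage that this approach requires.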
It is not difficult to see that this frequency-domain approach requires a large amount of memory (to store the samples of $M(\omega, t)$ and $\Phi(\omega, t)$) and a large amount of computation (to obtain the system magnitudes and phases at the new frequency track locations).