In one embodiment, the present invention relates to methods and apparatus for modifying a digitized acoustic signal by means of systematic manipulation of the signal's discrete short-time Fourier transform.
It is well established that a discrete signal x(n) can be perfectly reconstructed from a sequence X(k,m) of its windowed Discrete Fourier Transforms (DFTs) by applying an inverse Discrete Fourier Transform to each DFT and then properly weighting and overlap-adding the sequence of inverse DFTs ##EQU1## and L is the spacing between successive DFTs. It is also well known that modified versions of x(n) can be obtained by applying the above reconstruction formula to a sequence of modified DFTs.
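By way of illustration only, the analysis and weighted overlap-add resynthesis described above may be sketched as follows. The periodic Hann window, the normalization by the summed squared window, the naive DFT routines, and all function names are assumptions made for this sketch; they are not taken from the reconstruction formula referenced above.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) forward DFT of a real or complex sequence."""
    n_pts = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each output sample."""
    n_pts = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_pts)
            for k in range(n_pts)).real / n_pts
        for n in range(n_pts)
    ]


def hann(n_pts):
    """Periodic Hann window (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_pts) for n in range(n_pts)]


def analyze(x, window, hop):
    """Windowed DFTs X(k, m), one every `hop` samples (hop plays the role of L)."""
    n_pts = len(window)
    frames = []
    for start in range(0, len(x) - n_pts + 1, hop):
        frames.append(dft([x[start + n] * window[n] for n in range(n_pts)]))
    return frames


def synthesize(frames, window, hop, length):
    """Inverse-transform each DFT, weight by the window, overlap-add, normalize."""
    n_pts = len(window)
    out = [0.0] * length
    norm = [0.0] * length
    for m, spectrum in enumerate(frames):
        segment = idft(spectrum)
        for n in range(n_pts):
            out[m * hop + n] += segment[n] * window[n]
            norm[m * hop + n] += window[n] ** 2
    # Samples that receive no window weight are left at zero.
    return [o / w if w > 1e-9 else 0.0 for o, w in zip(out, norm)]
```

With unmodified DFTs, `synthesize(analyze(x, w, hop), w, hop, len(x))` is expected to return the original samples wherever the summed squared window is nonzero. Passing modified DFTs through the same `synthesize` step yields a modified signal.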
In general, the DFT values are complex. Many useful modifications of the DFT values affect only their "magnitudes" (e.g., noise reduction, spectral-envelope modification, etc.). However, there are applications for which the phases of the DFT values must be modified (either instead of or in addition to the magnitude values).
The best known of these is frequency-domain time-scaling, in which the signal is stretched or compressed in time while its original pitch is preserved. Since the underlying goal is to change the rate at which the signal's spectrum evolves in time, it is reasonable to accomplish this by taking a sequence of overlapping windowed DFTs and spacing them closer together (or farther apart) during analysis than during synthesis.
A problem arises, however, in that the DFT phases must be modified in order to force the modified DFTs to overlap-add coherently upon resynthesis. This problem was first addressed by Portnoff, who suggested that the phase $\phi(k,m)$ of the DFT value at frequency k for the m'th DFT be modified according to:
$$\phi'(k,m) = \phi'(k,m-1) + \alpha\,[\phi(k,m) - \phi(k,m-1)]$$
where $\phi'$ denotes the modified phase and $\alpha$ is the time-scale factor. See M. R. Portnoff, "Time-Scale Modification of Speech Based on Short-Time Fourier Analysis," IEEE Trans. Acoustics, Speech, and Signal Proc., pp. 374-390, vol. ASSP-29, No. 3 (1981), the contents of which are herein incorporated by reference for all purposes. This method produces good-sounding results when applied to speech or music, but it often introduces undesirable timbral alterations as well. To achieve the good-sounding results, the Portnoff technique requires that the synthesis transforms be overlapped so that L is no greater than 25% of N.
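By way of illustration only, the phase recursion quoted above may be sketched as follows, reading the left-hand side and the first right-hand term as the modified phase and the bracketed term as the difference between successive analysis phases. The function name, the list-of-lists layout, and the initial condition (first frame copied unchanged) are assumptions made for this sketch. The phase difference is used as given; any unwrapping a practical system might apply is omitted.

```python
def accumulate_phases(phases, alpha):
    """Per-channel phase accumulation scaled by the time-scale factor alpha.

    phases[m][k] is the analysis phase (radians) at frequency k of the m-th DFT.
    Returns modified phases of the same shape, where
        modified[m][k] = modified[m-1][k] + alpha * (phases[m][k] - phases[m-1][k]).
    The first frame is copied unchanged (an assumed initial condition).
    """
    modified = [list(phases[0])]
    for m in range(1, len(phases)):
        frame = []
        for k in range(len(phases[m])):
            increment = phases[m][k] - phases[m - 1][k]
            frame.append(modified[m - 1][k] + alpha * increment)
        modified.append(frame)
    return modified
```

With `alpha` equal to 1 the modified phases equal the analysis phases. With any other value, each channel's frame-to-frame phase advance is scaled by `alpha`, independently of every other channel.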
The reason for the timbral alterations is that Portnoff's algorithm accumulates phase for the DFT value at frequency k without regard for the phases of DFT values at frequency k-1 or k+1. Since phase accumulates independently in each frequency channel from the beginning of time, the phase relationships "within" each successive DFT gradually cease to be preserved in the modified DFTs.
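By way of illustration only, the loss of inter-channel phase relationships may be seen in a small numerical sketch. The phase values, the time-scale factor of 2, and the helper name are hypothetical, and the per-channel recursion is the one described above with the first frame left unchanged.

```python
def modified_phase(phi_k, alpha):
    """Per-channel accumulation: out[m] = out[m-1] + alpha * (phi_k[m] - phi_k[m-1])."""
    out = [phi_k[0]]
    for m in range(1, len(phi_k)):
        out.append(out[-1] + alpha * (phi_k[m] - phi_k[m - 1]))
    return out


# Hypothetical analysis phases (radians) for two adjacent channels over four frames.
phi_a = [0.0, 0.5, 1.0, 1.5]  # channel k
phi_b = [0.3, 1.0, 1.7, 2.4]  # channel k+1

alpha = 2.0
mod_a = modified_phase(phi_a, alpha)
mod_b = modified_phase(phi_b, alpha)

# Phase difference between the two channels, frame by frame.
original_gap = [b - a for a, b in zip(phi_a, phi_b)]
modified_gap = [b - a for a, b in zip(mod_a, mod_b)]
```

In this example the two gaps agree in the first frame and then diverge by a growing amount in each later frame, which is the within-frame decorrelation described above.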
Several solutions to this problem have been suggested in the literature. Sylvestre and Kabal proposed a scheme in which the signal is first partitioned into a set of contiguous signal-segments; Portnoff-style time-scaling is then applied to each signal-segment, with provisions for making the modified segments phase-continuous. See B. Sylvestre, et al., "Time-Scale Modification of Speech Using an Incremental Time-Frequency Approach with Waveform Structure Compensation," IEEE Int'l Conf. on Acoustics, Speech, and Signal Proc., pp. 81-84 (1992), the contents of which are herein incorporated by reference. This approach basically decreases the deleterious effects of the independently accumulated phases in each frequency channel by restricting the accumulation to a relatively short duration. The phase adjustment between successive signal-segments is addressed separately.
Puckette suggested that an effective "phase locking" of adjacent frequency channels could be obtained by modifying the Portnoff-style accumulated phase in each channel to bias it toward maintaining the original (unmodified) phase relationship across channels. His algorithm effectively replaces the default accumulated phase at frequency k for the m'th DFT frame, as would have been provided by the Portnoff technique, with a weighted average of the accumulated phases at frequencies k-1, k, and k+1 for the m'th DFT frame.
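By way of illustration only, the averaging idea described above may be sketched as a weighted circular mean of the accumulated phases in three neighboring channels. This is not a reproduction of Puckette's published algorithm. The function name, the weights, the use of unit phasors to average angles, and the treatment of edge channels are all assumptions made for this sketch.

```python
import cmath


def lock_phases(accumulated, weights=(1.0, 2.0, 1.0)):
    """Replace each channel's phase with a weighted circular mean over k-1, k, k+1.

    accumulated[k] is the accumulated phase (radians) at frequency k for one frame.
    The weights are hypothetical; edge channels use only the neighbors that exist.
    """
    n_bins = len(accumulated)
    locked = []
    for k in range(n_bins):
        total = 0j
        for offset, weight in zip((-1, 0, 1), weights):
            j = k + offset
            if 0 <= j < n_bins:
                # Average angles as unit phasors to avoid wrap-around artifacts.
                total += weight * cmath.exp(1j * accumulated[j])
        # Fall back to the unmodified phase if the phasors cancel exactly.
        locked.append(cmath.phase(total) if abs(total) > 1e-12 else accumulated[k])
    return locked
```

Channels whose neighbors already share the same phase are left unchanged, while an isolated deviation in one channel is pulled toward the phases of its neighbors.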
Thus, while Sylvestre and Kabal segment the signal in time, Puckette simply averages DFT values across neighboring frequencies. Neither of these two solutions dramatically improves the resulting sound, nor does either offer greater computational efficiency.
Various other proposed solutions to the phase-modification problem depart more radically from Portnoff's original framework, computing new phases based either on iterative analysis-synthesis algorithms or on fitting each DFT to an explicit sinusoidal model. They make different fundamental assumptions and demand significantly more computation.
Thus, known approaches to frequency-domain time-scaling confront the phase-modification problem in one of two ways: either they (1) preserve the underlying DFT analysis-synthesis structure of Portnoff and introduce simple time-domain segmentation or frequency-domain averaging to minimize the decorrelation of phase between original DFTs and modified DFTs, or they (2) abandon the Portnoff framework and compute new phases based either on iterative analysis-synthesis algorithms or on fitting each DFT to an explicit sinusoidal model.
There exists a need for computationally efficient approaches to modifying DFT phase values both in time-scaling and in frequency-warping applications. In particular, a DFT analysis-synthesis system capable of modifying the DFT phase values to either improve fidelity or decrease computational requirements would be highly useful.