The present invention relates to phase synthesis for speech processing applications.
There are many known systems for the synthesis of speech from digital data. In a conventional process, digital information representing speech is submitted to an analyzer. The analyzer extracts parameters which are used in a synthesizer to generate intelligible speech. See Portnoff, "Short-Time Fourier Analysis of Sampled Speech", IEEE TASSP, Vol. ASSP-29, No. 3, June 1981, pp. 364-373 (discusses representation of voiced speech as a sum of cosine functions); Griffin, et al., "Signal Estimation from Modified Short-Time Fourier Transform", IEEE, TASSP, Vol. ASSP-32, No. 2, April 1984, pp. 236-243 (discusses overlap-add method used for unvoiced speech synthesis); Almeida, et al., "Harmonic Coding: A Low Bit-Rate, Good-Quality Speech Coding Technique", IEEE, CH 1746, July 1982, pp. 1664-1667 (discusses representing voiced speech as a sum of harmonics); Almeida, et al., "Variable-Frequency Synthesis: An Improved Harmonic Coding Scheme", ICASSP 1984, pages 27.5.1-27.5.4 (discusses voiced speech synthesis with linear amplitude polynomial and cubic phase polynomial); Flanagan, J. L., Speech Analysis, Synthesis and Perception, Springer-Verlag, 1972, pp. 378-386 (discusses phase vocoder--frequency-based analysis/synthesis system); Quatieri, et al., "Speech Transformations Based on a Sinusoidal Representation", IEEE TAASP, Vol. ASSP34, No. 6, December 1986, pp. 1449-1986 (discusses analysis-synthesis technique based on sinusoidal representation); and Griffin, et al., "Multiband Excitation Vocoder", IEEE TASSP, Vol. 36, No. 8, August 1988, pp. 1223-1235 (discusses multiband excitation analysis-synthesis). The contents of these publications are incorporated herein by reference.
In a number of speech processing applications, it is desirable to estimate speech model parameters by analyzing the digitized speech data. The speech is then synthesized from the model parameters. As an example, in speech coding, the estimated model parameters are quantized for bit rate reduction and speech is synthesized from the quantized model parameters. Another example is speech enhancement. In this case, speech is degraded by background noise and it is desired to enhance the quality of speech by reducing background noise. One approach to solving this problem is to estimate the speech model parameters accounting for the presence of background noise and then to synthesize speech from the estimated model parameters. A third example is time-scale modification, i.e., slowing down or speeding up the apparent rate of speech. One approach to time-scale modification is to estimate speech model parameters, to modify them, and then to synthesize speech from the modified speech model parameters.
One technique for analyzing (encoding) speech is to break the speech into segments (e.g., using a Hamming window), and then to break each segment into a plurality of frequency bands. Each band is then analyzed to decide whether it is best treated as voiced (i.e., composed primarily of harmonics) or unvoiced (i.e., composed primarily of generally random noise). Voiced bands are analyzed to extract the magnitude, frequency, and phase of the harmonics in the band. The encoded frequency, magnitude, and phase are used subsequently when the speech is synthesized (decoded). A significant fraction of the available bandwidth is dedicated to representing the encoded phase.