This invention relates to digital signal processing and more particularly to time domain digital speech processing in order to vary the rate of reproduction of speech without changing pitch.
In recent years various techniques have been developed for achieving time compression/expansion of audio information, particularly speech information. In order to utilize time compression or expansion effectively, where the compression or expansion factor is significant, some mechanism is necessary to correct for changes in pitch which would normally follow a direct application of acceleration or deceleration techniques. Acceleration or deceleration of recorded speech is easily achieved by speeding or slowing the rate of reproduction, which in turn raises or lowers pitch, as is expected.
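The pitch change that follows a direct change in reproduction rate can be illustrated with a minimal sketch. The function names, the 8 kHz sample rate, and the 200 Hz test tone are assumptions chosen for illustration only. The sketch reads a stored waveform faster than it was recorded and estimates the resulting frequency from zero crossings.

```python
import math

def zero_crossing_frequency(samples, sample_rate):
    """Estimate the frequency of a tone from its sign changes (two per cycle)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    return crossings * sample_rate / (2 * len(samples))

def naive_speed_change(samples, rate):
    """Read the input at `rate` input samples per output sample.

    Linear interpolation is used between neighbouring samples. This is the
    digital counterpart of simply speeding or slowing the playback, so the
    pitch moves by the same factor as the duration.
    """
    out = []
    t = 0.0
    while t < len(samples) - 1:
        i = int(t)
        frac = t - i
        out.append((1 - frac) * samples[i] + frac * samples[i + 1])
        t += rate
    return out

SAMPLE_RATE = 8000  # assumed sample rate in Hz
# One second of a 200 Hz tone; the phase offset avoids samples that are exactly zero.
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE + 0.3) for n in range(SAMPLE_RATE)]
fast = naive_speed_change(tone, 1.5)

print(round(zero_crossing_frequency(tone, SAMPLE_RATE)))
print(round(zero_crossing_frequency(fast, SAMPLE_RATE)))
```

Under these assumptions the accelerated signal is about two thirds as long as the original and its estimated frequency rises by roughly the same factor of 1.5, which is the pitch change that a time scale modification technique must avoid.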
Time compression and expansion of speech are useful in many applications. Time compression allows matching of speech information to a desired playback time and is particularly useful, for example, in dictation equipment to speed up playback. Time expansion is useful in foreign language learning situations to slow down playback and thereby improve comprehension, which may otherwise be difficult or impaired.
Numerous techniques have been developed to achieve time compression and/or expansion, particularly techniques which manipulate analog signal representations. Of the various prior art techniques, the following patents or publications are representative:
Roucos and Wilgus, "High Quality Time-Scale Modification for Speech," ICASSP 85. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 493-6, Volume 2, 1985 (26-29 March 1985), IEEE. This relatively recent paper represents a development in the algorithms for reproducing speech using digital techniques. The research group is Bolt, Beranek & Newman Inc. of Cambridge, Mass.
Makhoul, J. and El-Jaroudi, "Time-Scale Modification in Medium to Low Rate Speech Coding," ICASSP 86. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1705-1708, Volume 3, 1986 (Apr. 7-11, 1986), IEEE. This paper, produced by the same research group as the foregoing, describes further development in digital signal processing techniques for rate-modifying speech.
These two papers relate to the description and implementation of the synchronous-overlap-and-add method of time-scale modification. The algorithm described therein allows arbitrary linear or nonlinear scaling of the time axis using a modified overlap-and-add procedure operating on the time domain waveform. The Makhoul paper describes the implementation of a technique involving generalized cross-correlation between a normalized source signal (y(n)) and a normalized derived signal (x(n)). The technique was originally described in the Roucos paper.
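The general idea of an overlap-and-add procedure synchronized by cross-correlation can be sketched as follows. This is an illustrative sketch only, not the procedure of the cited papers or of the present invention: the function names, frame length, hop sizes, search range, and linear cross-fade are all assumptions. Frames are taken from the input at a fixed analysis hop, placed in the output at a hop scaled by the factor alpha, and each frame is shifted by the small offset that maximizes the normalized cross-correlation with the output already written before the two are cross-faded.

```python
import math

def best_offset(y, frame, pos, k_max):
    """Return the shift k in [-k_max, k_max] that best aligns `frame` with the
    existing output `y` near index `pos`, using normalized cross-correlation."""
    best_k, best_score = 0, -math.inf
    for k in range(-k_max, k_max + 1):
        start = pos + k
        n = min(len(y) - start, len(frame))  # length of the overlap region
        num = sum(y[start + i] * frame[i] for i in range(n))
        e_y = sum(y[start + i] ** 2 for i in range(n))
        e_f = sum(frame[i] ** 2 for i in range(n))
        den = math.sqrt(e_y * e_f)
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_k, best_score = k, score
    return best_k

def sola(x, alpha, frame_len=400, analysis_hop=200, k_max=40):
    """Time-scale `x` by `alpha` (below 1 compresses, above 1 expands).

    All parameter names and default values are hypothetical.
    """
    synthesis_hop = int(round(alpha * analysis_hop))
    # These bounds keep every overlap region non-empty and no longer than a frame.
    if not (2 * k_max <= synthesis_hop < frame_len - 2 * k_max):
        raise ValueError("alpha is outside the range supported by these settings")
    y = list(x[:frame_len])  # the first frame is copied unchanged
    m = 1
    while m * analysis_hop + frame_len <= len(x):
        frame = x[m * analysis_hop : m * analysis_hop + frame_len]
        start = m * synthesis_hop
        start += best_offset(y, frame, start, k_max)
        overlap = len(y) - start
        # Linear cross-fade from the existing output into the new frame.
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)
            y[start + i] = (1 - w) * y[start + i] + w * frame[i]
        y.extend(frame[overlap:])  # append the part of the frame past the overlap
        m += 1
    return y
```

With the default settings assumed here, alpha may range from 0.4 up to, but not including, 1.6. For a one-second test tone sampled at 8 kHz, `sola(tone, 0.5)` would be expected to return roughly half as many samples while leaving the period of the tone essentially unchanged, because each frame is added where it is most nearly in phase with the output already present.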
Asada et al., U.S. Pat. No. 4,435,832 issued Mar. 6, 1984, to Hitachi, describes a speech synthesizer wherein LPC (linear predictive coding) techniques are employed to synthesize speech. Control is exercised over the rate of speech by lengthening or shortening the time interval of interpolation between the fetching of each of the LPC parameters to synthesize the speech. This technology is essentially unrelated to the present invention, since the present invention is unrelated to synthesized speech or parametrically-defined speech.
Klasco et al., U.S. Pat. No. 4,406,001 issued Sept. 20, 1983, to The Variable Speech Control Company of San Francisco, describes a time compression/expansion audio reproduction system of the type which relies on analog circuitry. It provides speech correction by repetitive variable time delay achieved by separating the reproduced signal from a recording into components which are separately delayed. The signal is separated into contiguous frequency bands, each of which is delayed synchronously. The signal is then recombined after delay, and low-pass filtering techniques are employed to remove high-frequency components introduced into the speech components by the signal processing technique. This technology is readily distinguishable from the present invention for at least two reasons. First, this technology relies on analog methods, whereas the present invention is digital in nature. Second, the present invention does not require filtering of speech components. Other distinctions will also be apparent to those of ordinary skill in this art.
Brantingham et al., U.S. Pat. No. 4,209,844, issued June 24, 1980, to Texas Instruments, describes a digital filter technique using a form of linear predictive coding (LPC). Specifically, the patent describes an invention embodied in a device implementing a lattice-type filter for generating complex waveforms suitable for implementation in semiconductor device technology. The invention appears to be unsuited to time-domain speech processing and further is not applicable to time scale modification in the time domain.
Kohut et al., U.S. Pat. No. 4,022,974, issued May 10, 1977, to Bell Telephone Laboratories, describes a predictive speech synthesizer having the capability of varying the rate of speech without changing pitch. The Bell technique is substantially unrelated to the present invention, since it relates primarily to parametric speech and does not deal with an actual time domain speech signal.
What is needed is a simple yet effective digital technique for providing time scale modification of real-time or near-real-time speech signals.