This invention relates to apparatus for digitizing analog speech and more particularly to apparatus for providing compressed speech to allow transmission of such compressed speech over conventional communication channels.
Presently, many modern switching systems employ digital data which is transmitted from a first location to a second location through a digital switching system. In such systems, digital signals are employed throughout the system in order to increase system reliability and to further alleviate many of the problems involved with the transmission of analog data. In this manner conventional analog signals are converted to digital signals such as pulse code modulated signals and are transmitted through the switching network over existing communications channels.
As one can ascertain, such switching networks accommodate various transmission capabilities. In this manner, the number of bits as well as the bit rate of the signal varies according to the particular modems employed and in regard to the capacity of the transmission lines associated with such a system. A basic problem which has existed with regard to the digitization and transmission of analog speech involves the fact that the analog speech typically resides in a frequency range from zero to around 3 kHz. In digitizing such speech one must sample at a rate which is high enough to satisfy the Nyquist criterion, that is, at a frequency of at least twice the bandwidth. Allowing a margin above this minimum of about 6 kHz, this results in a sampling rate of approximately 8 kHz.
Assuming that 10 bits would be sufficient to describe the amplitude of the speech wave for each sample, the required bit transmission rate would be 80 kilobits per second. Such a rate, for example, cannot be handled by conventional telephone lines. The prior art is cognizant of such problems and has employed a technique designated as linear predictive coding (LPC). Linear predictive coding uses a parametric model of the human vocal system to encode speech. This model describes speech production as being controlled by three factors: the excitation source, the energy or gain of the signal, and the shape of the acoustic cavity from the epiglottis to the lips. Speech signals can be either voiced, such as the "A" in "Ape", or unvoiced, such as the "S" in "Sister".
In any event, the excitation mechanism for the voiced signal is modeled by a series of pulses separated by a fixed pitch period. The excitation source for the unvoiced signal is modeled as a noise generator. The shape of the acoustic cavity is represented by a plurality of resonant circuits tuned to give information regarding the natural frequencies of the analog speech. The linear predictive coding technique takes advantage of the fact that many speech parameters will not change for a considerable number of samples during a typical speech pattern. Thus, linear predictive coding models typically use an analysis frame containing many samples to arrive at a composite profile for the speech frame before transmitting information on the channel. A commonly used analysis frame duration is 180 samples.
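The excitation and resonator model described above can be illustrated by the following minimal sketch. It is not taken from any of the cited references. The function names, the pitch period of 80 samples, and the three resonator center frequencies and bandwidths are hypothetical values chosen only for illustration; the 8 kHz sampling rate and the 180-sample frame follow the figures given in the text.

```python
import math
import random


def excitation(n_samples, voiced, pitch_period=80, gain=1.0, seed=0):
    """Excitation for one frame: a pulse train if voiced, noise if unvoiced."""
    if voiced:
        # One pulse every pitch_period samples, zero elsewhere.
        return [gain if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [gain * rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def resonator_coeffs(freq_hz, bandwidth_hz, fs_hz):
    """Feedback coefficients (a1, a2) of a two-pole digital resonator."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)  # pole radius, below 1
    theta = 2.0 * math.pi * freq_hz / fs_hz        # pole angle
    return 2.0 * r * math.cos(theta), -r * r


def synthesize(exc, resonators):
    """Pass the excitation through a cascade of two-pole resonators."""
    signal = list(exc)
    for a1, a2 in resonators:
        y1 = y2 = 0.0
        out = []
        for x in signal:
            y = x + a1 * y1 + a2 * y2  # y[n] = x[n] + a1*y[n-1] + a2*y[n-2]
            out.append(y)
            y2, y1 = y1, y
        signal = out
    return signal


FS = 8000     # sampling rate in Hz, as in the text
FRAME = 180   # samples per analysis frame, as in the text
# Hypothetical (center frequency, bandwidth) pairs in Hz.
formants = [(500, 80), (1500, 100), (2500, 120)]
filters = [resonator_coeffs(f, bw, FS) for f, bw in formants]

voiced_frame = synthesize(excitation(FRAME, voiced=True), filters)
unvoiced_frame = synthesize(excitation(FRAME, voiced=False), filters)
```

Under these assumptions a frame is fully described by a voiced/unvoiced decision, a pitch period, a gain, and a handful of filter coefficients, rather than by 180 individual sample amplitudes.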
Thus, the channel bit transmission rate can be of the order of a few kilobits per second, a rate which such channels as ordinary telephone lines are capable of transmitting. The linear predictive coding technique has been discussed in many technical papers. For example, see an article by A. Buzo et al. entitled "Speech Coding Based on Vector Quantization", I.E.E.E. Transactions on ASSP, Oct. 1980. See also an article by B. S. Atal and J. M. Remde entitled "A New Model of LPC Excitation. . .", Proceedings 1982 ICASSP, pages 614-617. See also an article by Parker et al. entitled "Low Bit Rate Speech Enhancement. . .", Proceedings 1984 ICASSP, pages 1.5.1-1.5.4.
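The bit rate figures discussed above can be checked with the short calculation below. The sampling rate, the 10 bits per sample, and the 180-sample frame come from the text. The budget of 54 bits per frame is an assumption borrowed from 2400 bit/s coders of the LPC-10 type and is used only for illustration; the text itself does not state a per-frame bit budget. The function names are likewise hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate when every sample is transmitted individually."""
    return sample_rate_hz * bits_per_sample


def frame_bit_rate(sample_rate_hz, samples_per_frame, bits_per_frame):
    """Bit rate when one parameter set is transmitted per analysis frame."""
    return sample_rate_hz * bits_per_frame / samples_per_frame


SAMPLE_RATE_HZ = 8000      # from the text
BITS_PER_SAMPLE = 10       # from the text
SAMPLES_PER_FRAME = 180    # from the text
BITS_PER_FRAME = 54        # assumption, not stated in the text

frame_ms = 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ
pcm_rate = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
lpc_rate = frame_bit_rate(SAMPLE_RATE_HZ, SAMPLES_PER_FRAME, BITS_PER_FRAME)

print(f"frame duration: {frame_ms} ms")
print(f"sample-by-sample rate: {pcm_rate} bit/s")
print(f"frame-based rate: {lpc_rate} bit/s")
```

With these numbers a frame lasts 22.5 milliseconds, and the frame-based rate is 2.4 kilobits per second against 80 kilobits per second for sample-by-sample transmission.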
As one can ascertain from the prior art, there are problems in transmitting digitized speech over transmission lines or telephone lines. There is a desire to transmit digitized speech of high quality at required bit rates or at multiple rates according to the qualities and characteristics of the switching system or the transmission medium. In providing multiple rate capability, one must assure that the speech processing in regard to quality is suitable for purposes of reconverting the digitized speech back into analog signals without losing excessive information content.
The prior art was cognizant of providing apparatus wherein analog speech was digitized and transmitted over a channel at a minimum bit rate and yet allowing such speech to be synthesized at the receiver end with high intelligibility and quality. In any event, as indicated above, based on modern communication systems, such as digital switching systems employing digital transmissions, one must provide the digitization of analog speech in a digital format which is capable of providing high speech quality at the required bit rate and which has the further capability of varying the rate to accommodate different modems or different transmission requirements. For examples of certain prior art techniques, reference is made to a patent application entitled DIGITAL SPEECH CODING CIRCUIT filed on Dec. 24, 1985 for J. Bertrand as Serial No. 813,110 and assigned to the assignee herein, now U.S. Pat. No. 4,720,861, issued Jan. 19, 1988.
That application relates to a digital speech coding circuit which makes use of linear predictive coding, vector quantization, Huffman coding, and excitation estimation to produce digital representations of human speech having bit rates low enough to be transmitted over telephone lines and at the same time capable of being synthesized in the receiver portion of the circuit to produce analog speech of high intelligibility and quality.
The transmitter portion of the circuit comprises a series connection of a lowpass filter, an analog-to-digital converter, a linear predictive coding module comprising five resonators for establishing five center frequencies and bandwidths of the analog speech, a vector quantization module for providing a binary representation of the likely combinations of resonances found in human speech, a Huffman coding module, a variable bit rate to fixed bit rate converter and optionally an encryption module. Another branch of the transmitter circuit extends from the output of the analog-to-digital converter to the bit rate converter and comprises a series combination of an inverse filter and an excitation estimation module having parallel outputs respectively representative of a voiced/unvoiced signal, the excitation amplitude, and the excitation pulse position. The receiver portion of the circuit comprises a series connection of a fixed bit rate to variable bit rate converter and a bit unmapping module which produces separate outputs representative of the reflection coefficients and the excitation of the speech. A synthesis filter receives these outputs and produces a digital signal representative of the analog speech, which is converted to audio by a digital-to-analog converter and a lowpass filter.
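The role of a Huffman coding stage placed after a vector quantizer, as in the circuit described above, can be illustrated by the following minimal sketch. It is a generic textbook Huffman construction and is not taken from the referenced application. The function name and the sequence of quantizer indices are hypothetical and chosen only for illustration.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from observed symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial code}).
    # The unique tie_breaker keeps the dictionaries from being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, codes0 = heapq.heappop(heap)  # two least frequent subtrees
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical vector quantizer indices for eight successive frames.
indices = [3, 3, 3, 3, 7, 7, 1, 5]
code = huffman_code(indices)
encoded = "".join(code[i] for i in indices)

fixed_bits = 2 * len(indices)  # four distinct indices need 2 bits each
print(code)
print(len(encoded), "bits with Huffman coding versus", fixed_bits, "bits fixed")
```

Frequently occurring indices receive shorter codewords, so the number of bits produced per frame varies. This is consistent with the variable bit rate to fixed bit rate converter that follows the Huffman coding module in the chain described above.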
As indicated, the prior art is cognizant of the necessity of providing digital speech coders, and reference is also made to U.S. Pat. No. 4,472,832 issued on Sept. 18, 1984 to B. S. Atal et al. and entitled DIGITAL SPEECH CODER. In that patent there is shown a speech analysis and synthesis system in which LPC parameters and a modified residual signal for excitation are transmitted. The excitation signal is the crosscorrelation of the residual signal and the LPC recreated original signal. Essentially, the patent recognizes the fact that digital speech communication systems including voice storage and voice response facilities may utilize signal compression to reduce the bit rate needed for storage and/or transmission.
The patent then describes a sequential pattern processing arrangement in which a sequential pattern is partitioned into successive time intervals. In each time interval, a set of signals representative of the interval sequential pattern and a signal representative of the differences between the interval sequential pattern and the interval representative signal are generated.
In the case of speech, the speech pattern is partitioned into successive time intervals. In each interval, a set of signals representative of the interval speech pattern and a signal representative of the differences between the interval speech pattern and the interval representative signal are generated.
In this manner one can obtain a compression of speech after the speech has been digitized. Thus, as indicated, the prior art has been concerned with this problem and with devices which enable one to compress speech to allow transmission without sacrificing speech quality. See also an article entitled "Improved Pulse Search Algorithms For Multi-Pulse Excited Speech Coder" by S. Ono, T. Araseki, and K. Ozawa of the NEC Corporation of Japan, published in 1984 at the GLOBECOM conference in Atlanta, Ga.
It is an object of the present invention to provide a multi-rate digital voice coder which allows one to compress speech so that digital speech can be transmitted over conventional communications channels such as telephone links.
It is a further object of the present invention to provide a multi-rate digital voice coder apparatus which enables one to preserve high speech quality after digitization, the digitized signal being capable of being transmitted at different rates to accommodate different transmission channels.
It is a further object of the present invention to provide a multi-rate digital voice coder apparatus which enables one to provide compressed speech for more efficient digital transmission and storage.