Conventional analog telephone systems are being replaced by digital systems. In digital systems, the analog signals are sampled at a rate of about twice the bandwidth of the analog signals, or about eight kilohertz, and the samples are then encoded. In a simple pulse code modulation (PCM) system, each sample is quantized as one of a discrete set of prechosen values and encoded as a digital word which is then transmitted over the telephone lines. With eight-bit digital words, for example, the analog sample is quantized to one of 2^8, or 256, levels, each of which is designated by a different eight-bit word. Using nonlinear quantization, excellent quality speech can be obtained with only seven bits per sample; but since a seven-bit word is still required for each sample, transmission bit rates of 56 kilobits per second (seven bits times 8,000 samples per second) are necessary.
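The bit-rate arithmetic and the idea of nonlinear quantization can be illustrated with a minimal sketch. The text does not name a particular nonlinear quantizer; mu-law companding is assumed here purely as one familiar example, and the function names and the parameter mu = 255 are illustrative choices rather than anything specified above.

```python
import math

SAMPLE_RATE_HZ = 8000  # about twice a roughly 4 kHz voice bandwidth


def bit_rate(bits_per_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Transmission rate in bits per second for fixed-length PCM words."""
    return bits_per_sample * sample_rate_hz


def mu_law_encode(x, bits=8, mu=255.0):
    """Quantize a sample x in [-1, 1] to an integer code (assumed mu-law)."""
    x = max(-1.0, min(1.0, x))
    # Logarithmic compression: small amplitudes get finer resolution.
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    levels = 2 ** bits
    return int(round((compressed + 1.0) / 2.0 * (levels - 1)))


def mu_law_decode(code, bits=8, mu=255.0):
    """Map an integer code back to an approximate sample value."""
    levels = 2 ** bits
    compressed = 2.0 * code / (levels - 1) - 1.0
    expanded = math.expm1(abs(compressed) * math.log1p(mu)) / mu
    return math.copysign(expanded, compressed)
```

Under these assumptions, `bit_rate(7)` gives the 56 kilobits per second cited above, and `bit_rate(8)` gives 64 kilobits per second for eight-bit words.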
Efforts have been made to reduce the bit rates required to encode the speech while still obtaining a clear decoded speech signal at the receiving end of the system. The linear predictive coding (LPC) technique is based on the recognition that speech production involves an excitation and a filtering process. The excitation is determined by the vocal cord vibration for voiced speech and by turbulence for unvoiced speech, and that excitation signal is then modified by the filtering process of the vocal resonance chambers, including the mouth and nasal passages. For a particular group of samples, a digital filter which simulates the formant effects of the resonance chambers can be defined and the definition can be encoded. A residual signal which approximates the excitation can then be obtained by passing the speech signal through an inverse formant filter, and the residual signal can be encoded. Because sufficient information is contained in the lower-frequency portion of the residual spectrum, it is possible to encode only the low-frequency baseband and still obtain reasonably clear speech. At the receiver, a definition of the formant filter and the residual baseband are decoded. The baseband is repeated to complete the spectrum of the residual signal. By applying the decoded filter to the repeated baseband signal, the initial speech can be reconstructed.
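The excitation and filter separation described above can be sketched as follows. This is a minimal illustration, assuming the autocorrelation method for deriving the filter and NumPy for the arithmetic; the function names, the regularization constant, and the filter order are hypothetical choices, and the baseband selection and repetition steps are omitted. It is not presented as the method of any particular LPC coder.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Estimate A(z) = 1 + a[1] z^-1 + ... by the autocorrelation method."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order (zero lag sits at index n - 1).
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # small regularization (an assumption)
    predictor = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -predictor))


def inverse_filter(frame, a):
    """Residual e[n] = sum_k a[k] * s[n - k], the inverse formant filter."""
    return np.convolve(frame, a)[:len(frame)]


def synthesis_filter(residual, a):
    """All-pole formant filter: s[n] = e[n] - sum_{k>=1} a[k] * s[n - k]."""
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, min(len(a), n + 1)):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out
```

In this sketch, passing the unmodified residual back through `synthesis_filter` with the same coefficients recovers the original frame; a baseband coder would instead transmit only the low-frequency part of the residual and regenerate the rest at the receiver.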
A major problem with the LPC approach lies in defining the formant filter, which must be redefined for each window of samples. A complex encoder and a complex decoder are required to obtain transmission rates as low as 16,000 bits per second. Another problem with such systems is that they do not always provide a satisfactory reconstruction of certain formants, such as those resulting from nasal resonance.
Another speech coding scheme which exploits the concepts of excitation-filter separation and excitation baseband transmission is described by Zibman in U.S. patent application Ser. No. 684,382, filed Dec. 20, 1984. In that approach, speech is encoded by first performing a Fourier transform of a window of speech. The Fourier transform coefficients are normalized by making a piecewise-constant approximation of the spectral envelope and scaling the frequency coefficients relative to the approximation. The normalization is accomplished first for each formant region and then repeated for smaller subbands. Quantization and transmission of the spectral envelope approximations amount to transmission of a filter definition. Quantization and transmission of the scaled frequency coefficients associated with either the lower or upper half of the spectrum amount to transmission of a "baseband" excitation signal. At the receiver, the full spectrum of the excitation signal is obtained by adding the transmitted baseband to a frequency-translated version of itself. Frequency translation is performed easily by duplicating the scaled Fourier coefficients of the baseband into the corresponding higher or lower frequency positions. The signal can then be recreated by inverse scaling with the transmitted piecewise-constant approximations. This coding approach can be implemented very simply and provides good quality speech at 16 kilobits per second. However, it performs poorly with non-speech voice-band data transmission.
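The normalization and baseband duplication steps can be sketched in simplified form. The following is an illustration only, assuming NumPy, a single normalization stage over equal-width subbands (rather than the two-stage formant-region and subband normalization described above), transmission of the lower half of the spectrum, and no quantization. The function names, the band count, and the dropping of the Nyquist bin are hypothetical simplifications, not details of the referenced application.

```python
import numpy as np


def encode_window(window, num_bands=8):
    """Return (envelope, baseband) for one window of samples.

    Assumes len(window) // 2 is divisible by num_bands and num_bands is even.
    """
    window = np.asarray(window, dtype=float)
    half = len(window) // 2
    coeffs = np.fft.rfft(window)[:half]  # Nyquist bin dropped for simplicity
    bands = coeffs.reshape(num_bands, -1)
    # Piecewise-constant envelope: one RMS level per subband.
    envelope = np.sqrt(np.mean(np.abs(bands) ** 2, axis=1)) + 1e-12
    scaled = bands / envelope[:, None]
    baseband = scaled[:num_bands // 2].ravel()  # lower half of the spectrum
    return envelope, baseband


def decode_window(envelope, baseband, length):
    """Rebuild a window by duplicating the baseband and inverse scaling."""
    # Frequency translation: copy the scaled baseband into the upper half.
    excitation = np.concatenate((baseband, baseband))
    bands = excitation.reshape(len(envelope), -1) * envelope[:, None]
    spectrum = np.concatenate((bands.ravel(), [0.0]))  # zeroed Nyquist bin
    return np.fft.irfft(spectrum, n=length)
```

In this sketch the lower half of the decoded spectrum matches the original exactly, while the upper half carries the translated baseband shaped to the original per-band levels; a practical coder would also quantize both the envelope and the baseband coefficients.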