Conventional analog telephone systems are being replaced by digital systems. In digital systems, the analog signals are sampled at a rate of at least about twice the bandwidth of the analog signals, or about eight kilohertz, and the samples are then encoded. In a simple pulse code modulation (PCM) system, each sample is quantized as one of a discrete set of prechosen values and encoded as a digital word which is then transmitted over the telephone lines. With eight bit digital words, for example, the analog sample is quantized to 2^8 or 256 levels, each of which is designated by a different eight bit word. Using nonlinear quantization, excellent quality speech can be obtained.
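The eight bit nonlinear quantization described above can be illustrated with a minimal sketch. The text does not say which nonlinear law is used; mu-law companding with a constant of 255 is assumed here purely for illustration, and the names `compress`, `expand`, `encode` and `decode` are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000   # "about eight kilohertz" from the text
BITS_PER_SAMPLE = 8     # eight bit words, i.e. 2**8 = 256 levels
BIT_RATE = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 64,000 bits per second

MU = 255  # assumption: mu-law constant; the text only says "nonlinear quantization"


def compress(x: float) -> float:
    """Map x in [-1, 1] to [-1, 1], giving small amplitudes finer resolution."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y: float) -> float:
    """Inverse of compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode(x: float, bits: int = BITS_PER_SAMPLE) -> int:
    """Quantize one sample to an integer code in 0 .. 2**bits - 1."""
    levels = 2 ** bits
    y = compress(max(-1.0, min(1.0, x)))       # clip, then compress
    code = int((y + 1.0) / 2.0 * levels)       # uniform quantization of y
    return min(code, levels - 1)


def decode(code: int, bits: int = BITS_PER_SAMPLE) -> float:
    """Reconstruct a sample from the midpoint of its quantization cell."""
    levels = 2 ** bits
    y = (code + 0.5) / levels * 2.0 - 1.0
    return expand(y)
```

Because the quantization steps are uniform in the compressed domain, low-level samples are reconstructed with a much smaller absolute error than samples near full scale, which is the usual motivation for nonlinear quantization of speech.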
Efforts have been made to reduce the bit rates required to encode the speech and obtain a clear decoded speech signal at the receiving end of the system. The linear predictive coding (LPC) technique is based on the recognition that speech production involves excitation and a filtering process. The excitation is determined by the vocal cord vibration for voiced speech and by turbulence for unvoiced speech, and that actuating signal is then modified by the filtering process of vocal resonance chambers including the mouth and nasal passages. For a particular group of samples, a digital linear filter which simulates the formant effects of the resonance chambers can be defined and the definition can be encoded. A residual signal which approximates the excitation can then be obtained by passing the speech signal through an inverse formant filter, and the residual signal can be encoded. Because sufficient information is contained in the lower-frequency portion of the residual spectrum, it is possible to encode only the low frequency baseband and still obtain reasonably clear speech. At the receiver, a definition of the formant filter and the residual baseband are decoded. The baseband is repeated to complete the spectrum of the residual signal. By applying the decoded filter to the repeated baseband signal, an approximation to the initial speech can be reconstructed.
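The separation of speech into a formant filter and a residual can be sketched as follows. This is a minimal illustration, not the method of any particular coder: the filter is derived with the autocorrelation method and the Levinson-Durbin recursion, which is an assumption (the text does not say how the filter is defined), and the function names are hypothetical. Baseband selection and repetition of the residual are omitted.

```python
def autocorr(x, order):
    """Autocorrelation r[0..order] of one window of samples."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]


def lpc(x, order):
    """Predictor coefficients a[0..order-1] with x[n] ~ sum(a[k] * x[n-1-k]).

    Assumption: Levinson-Durbin recursion on the autocorrelation sequence.
    """
    r = autocorr(x, order)
    a = [0.0] * (order + 1)      # a[0] is unused; a[1..order] are the taps
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def residual(x, a):
    """Inverse formant filter: subtract the prediction from each sample."""
    return [x[n] - sum(a[k] * x[n - 1 - k]
                       for k in range(len(a)) if n - 1 - k >= 0)
            for n in range(len(x))]


def synthesize(e, a):
    """Formant filter: rebuild the signal by feeding the residual back in."""
    y = []
    for n in range(len(e)):
        pred = sum(a[k] * y[n - 1 - k]
                   for k in range(len(a)) if n - 1 - k >= 0)
        y.append(e[n] + pred)
    return y
```

With an unquantized, full-band residual, `synthesize(residual(x, a), a)` returns the original window up to rounding error. The bit rate savings described above come from encoding the filter definition compactly and transmitting only the low frequency baseband of the residual.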
A major problem of the LPC approach lies in defining the formant filter, which must be redefined with each window of samples. A further problem is that such systems do not always provide a satisfactory reconstruction of certain formants, such as those resulting from nasal resonance. As a result, the quality of reconstruction at 16,000 bits per second is generally unsatisfactory.
Another speech coding scheme which exploits the concepts of excitation-filter separation and excitation baseband transmission is described by Zibman in U.S. Pat. No. 4,914,701. In that approach, speech is encoded by first performing a Fourier transform of a window of speech. The Fourier transform coefficients are normalized by first defining a piecewise-constant approximation of the spectral envelope and then scaling the frequency coefficients relative to the approximation. The normalization is accomplished first for each formant region and then repeated for smaller subbands. Quantization and transmission of the spectral envelope approximations amount to transmission of a filter definition. Quantization and transmission of the scaled frequency coefficients associated with either the lower or upper half of the spectrum amount to transmission of a "baseband" excitation signal. At the receiver, the full spectrum of the excitation signal is obtained by adding the transmitted baseband to a frequency-translated version of itself. Frequency translation is performed easily by duplicating the scaled Fourier coefficients of the baseband into the corresponding higher or lower frequency positions. A signal can then be fully recreated by inverse scaling with the transmitted piecewise-constant approximations. This coding approach can be very simply implemented and provides good quality speech at 16 kilobits per second. However, it performs poorly with non-speech voice-band data transmission.
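The normalization and baseband duplication steps can be sketched on a list of already-computed Fourier coefficients. This is a simplified illustration under stated assumptions: only a single normalization stage is shown (the two-stage formant-region and subband scaling is not reproduced), the envelope of each band is taken as its peak magnitude, quantization is omitted, and all function names are hypothetical.

```python
def envelope(coeffs, band_size):
    """Piecewise-constant envelope: one value per band of coefficients.

    Assumption: the peak magnitude of each band serves as its envelope value.
    """
    env = []
    for start in range(0, len(coeffs), band_size):
        band = coeffs[start:start + band_size]
        env.append(max(abs(c) for c in band) or 1.0)  # avoid dividing by zero
    return env


def normalize(coeffs, env, band_size):
    """Scale each coefficient relative to the envelope of its band."""
    return [c / env[i // band_size] for i, c in enumerate(coeffs)]


def replicate_baseband(baseband):
    """Receiver side: copy the lower-half coefficients into the upper half."""
    return list(baseband) + list(baseband)


def denormalize(scaled, env, band_size):
    """Inverse scaling with the transmitted envelope approximation."""
    return [c * env[i // band_size] for i, c in enumerate(scaled)]


# Example: eight coefficients, bands of two, lower half sent as the baseband.
coeffs = [4 + 0j, 2j, 1 + 1j, 0.5, 0.2, 0.1j, 0.05, 0.01]
env = envelope(coeffs, 2)
scaled = normalize(coeffs, env, 2)
baseband = scaled[:len(scaled) // 2]      # only this half is transmitted
recon = denormalize(replicate_baseband(baseband), env, 2)
```

In this sketch the lower half of `recon` matches the original coefficients, while the upper half carries the fine structure of the baseband shaped by the upper-band envelope values.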
A modification of the Zibman coding technique is presented by Mazor et al. in U.S. Pat. No. 4,790,016. In that approach, the transform spectrum is divided into a plurality of subbands of coefficients. The approximate envelope is defined for each subband and each envelope definition is encoded for transmission. As in the Zibman approach, each spectrum coefficient is scaled relative to the defined envelope of the respective subband. In the Mazor et al. improvement, the number of bits to which each coefficient is encoded is determined by the defined envelope of its subband. Specifically, the four subbands having the largest initial peak energy, and thus the largest envelope definition, are quantized to seven bits for each coefficient. The four subbands having the next smaller envelope definitions are quantized to six bits per coefficient, and the four next smaller subbands are quantized to four bits per coefficient. The coefficients of the remaining subbands are not transmitted; that is, they are quantized to zero bits per coefficient. At the receiver, the transmitted subbands are replicated to define coefficients of frequencies which are not transmitted.
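The envelope-driven bit allocation can be sketched as a simple ranking of subbands. The tier sizes and bit counts (four subbands each at seven, six and four bits) follow the description above; the function name, the tie-breaking rule and the sixteen-subband example are assumptions made only for illustration.

```python
def allocate_bits(envelopes, tiers=((4, 7), (4, 6), (4, 4))):
    """Return bits per coefficient for each subband.

    envelopes: one envelope value per subband.
    tiers: (number of subbands, bits per coefficient), largest envelopes first.
    Subbands that fall outside every tier get zero bits (not transmitted).
    Assumption: ties are broken in favor of the lower subband index.
    """
    # Subband indices ordered from largest to smallest envelope value.
    order = sorted(range(len(envelopes)),
                   key=lambda i: envelopes[i], reverse=True)
    bits = [0] * len(envelopes)
    pos = 0
    for count, nbits in tiers:
        for i in order[pos:pos + count]:
            bits[i] = nbits
        pos += count
    return bits


# Example with sixteen hypothetical subband envelope values.
example_env = [9.0, 0.3, 7.5, 0.1, 6.0, 5.5, 0.2, 4.0,
               3.5, 3.0, 0.05, 2.5, 2.0, 1.5, 1.0, 0.5]
example_bits = allocate_bits(example_env)
```

Subbands assigned zero bits are the ones the receiver would later fill in by replicating transmitted subbands.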