Speech coding systems function to provide codeword representations of speech signals for communication over a channel or network to one or more system receivers. Each system receiver reconstructs speech signals from received codewords. The amount of codeword information communicated by a system in a given time period defines the system bandwidth and affects the quality of the speech received by system receivers.
The objective for speech coding systems is to provide the best trade-off between speech quality and bandwidth, given side conditions such as the input signal quality, channel quality, bandwidth limitations, and cost. The speech signal is represented by a set of parameters which are quantized for transmission. Perhaps most important in the design of a speech coder is the search for a good set of parameters (including vectors) to describe the speech signal. A good set of parameters requires a low system bandwidth for the reconstruction of a perceptually accurate speech signal. The bandwidth required for each parameter is a function of the rate at which it changes, as well as the accuracy it requires for high-quality reconstructed speech.
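The dependence of bandwidth on update rate and accuracy can be illustrated with a small calculation. The sketch below assumes that each parameter costs a fixed number of bits per update, so that the total bit rate is the sum of bits per update times updates per second. The parameter names and numbers are hypothetical and are not taken from any particular coder.

```python
def bit_rate(parameters):
    """Total bit rate in bits per second.

    parameters: iterable of (bits_per_update, updates_per_second) pairs.
    """
    return sum(bits * rate for bits, rate in parameters)


# Hypothetical parameter set, for illustration only.
example = [
    (30, 40),  # spectral envelope: 30 bits, updated 40 times per second
    (7, 40),   # pitch period: 7 bits, updated 40 times per second
    (5, 80),   # gain: 5 bits, updated 80 times per second
]

total = bit_rate(example)  # 1200 + 280 + 400 = 1880 bits per second
```

Under this simple model, a parameter that changes slowly or tolerates coarse quantization contributes little to the total, which is why the choice of parameter set matters.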
The human auditory system is very sensitive to the level of periodicity of the reconstructed signal. The level of periodicity is a function of both time and frequency. Speech varies in the level of periodicity. Voiced speech is characterized by a high level of periodicity, and unvoiced speech has a low level of periodicity. Coders operating at lower bit rates generally do not reconstruct the level of periodicity in a perceptually transparent fashion.
From information-theoretic arguments, it can be shown that the signal bandwidth required to transmit the waveform of a noisy signal exactly is very high. However, for perceptually accurate signal reconstruction, only certain statistical quantities of the noise component of a signal require transmission (mainly a rough description of its magnitude spectrum). This makes the separation of the periodic and noisy components of the original signal unavoidable for efficient coding at low bit rates.
The first-generation linear-prediction-based vocoders generally used a simple 2-state periodicity description (periodic or nonperiodic), uniform over the entire signal frequency band and updated about once every 25 ms. See, e.g., Tremain, "The Government Standard Linear Predictive Coding Algorithm", Speech Technology, pp. 40-49 (April 1982). Some of the more recent coders use a frequency-dependent periodicity level (usually with 2 levels per band). Others use multiple coding modes, each of which can generally be associated with a particular mean level of periodicity. In general, it is difficult to assess the level of periodicity reliably with existing methods. In addition, the time resolution of the periodicity level is low.
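A 2-state, full-band periodicity description of the kind discussed above can be sketched as follows. This is a minimal illustration, not the algorithm of the cited standard: it assumes an 8 kHz sampling rate, 25 ms frames, a normalized-correlation measure searched over a hypothetical lag range, and an arbitrary threshold of 0.7. All function names and default values are assumptions made for the example.

```python
import math
import random


def normalized_correlation(frame, lag):
    """Correlation between the frame and a copy delayed by `lag`, in [-1, 1]."""
    a, b = frame[lag:], frame[:-lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def two_state_periodicity(signal, sample_rate=8000, frame_ms=25,
                          min_lag=20, max_lag=80, threshold=0.7):
    """Return one flag per frame (True = periodic, False = nonperiodic).

    The lag range and threshold are illustrative assumptions.
    """
    frame_len = sample_rate * frame_ms // 1000
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Take the strongest correlation over the candidate pitch lags.
        peak = max(normalized_correlation(frame, lag)
                   for lag in range(min_lag, max_lag + 1))
        flags.append(peak >= threshold)
    return flags


# Illustrative input: two frames of a sinusoid with a 50-sample period,
# followed by two frames of Gaussian noise.
rng = random.Random(0)
tone = [math.sin(2 * math.pi * n / 50) for n in range(400)]
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]
flags = two_state_periodicity(tone + noise)
```

A single flag per 25 ms frame cannot express intermediate periodicity levels or differences between frequency bands, which is the limitation noted in the text.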
In recent years, it has been shown that prototype-waveform interpolation (PWI) is an efficient method for the coding of voiced speech. The basic concept of PWI is to extract a representative pitch cycle (the prototype waveform) at fixed intervals, to transmit its description, and to reconstruct the speech signal by interpolating between the prototype waveforms. In most implementations the PWI method operates on the linear-prediction residual signal, and the prototype waveforms are described with a Fourier series. See W. B. Kleijn, "Encoding Speech Using Prototype Waveforms," IEEE Trans. Speech and Audio Processing, Vol. 1, No. 4, pp. 386-399 (1993).
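The extract, describe, and interpolate steps of PWI can be sketched under strong simplifying assumptions. The example below assumes a constant, integer pitch period, omits quantization and the linear-prediction stage, and interpolates the Fourier-series coefficients linearly between prototypes. The function names are hypothetical. A practical coder must also handle a time-varying pitch period and prototype alignment, which this sketch sidesteps by referencing every prototype to the phase of sample 0.

```python
import cmath
import math


def extract_prototype(signal, start, period):
    """Fourier-series coefficients (harmonics 0..period//2) of the pitch
    cycle beginning at `start`, phase-referenced to sample 0 so that
    successive prototypes are aligned."""
    return [
        sum(signal[start + n]
            * cmath.exp(-2j * math.pi * k * (start + n) / period)
            for n in range(period)) / period
        for k in range(period // 2 + 1)
    ]


def evaluate(coeffs, phase, period):
    """Evaluate the real-valued Fourier series at `phase` (radians)."""
    value = coeffs[0].real
    for k in range(1, len(coeffs)):
        # The Nyquist harmonic of an even-length cycle is counted once.
        weight = 1.0 if 2 * k == period else 2.0
        value += weight * (coeffs[k] * cmath.exp(1j * k * phase)).real
    return value


def pwi_reconstruct(prototypes, interval, period, length):
    """Reconstruct `length` samples by linearly interpolating between
    prototypes spaced `interval` samples apart (at least two required)."""
    out = []
    for n in range(length):
        i = min(n // interval, len(prototypes) - 2)
        # Past the last prototype, hold it (frac clamps to 1).
        frac = min((n - i * interval) / interval, 1.0)
        coeffs = [(1 - frac) * a + frac * b
                  for a, b in zip(prototypes[i], prototypes[i + 1])]
        out.append(evaluate(coeffs, 2 * math.pi * n / period, period))
    return out


# Illustrative input: a perfectly periodic signal with a 40-sample period.
period, interval, length = 40, 100, 400
signal = [math.sin(2 * math.pi * n / period)
          + 0.5 * math.cos(2 * math.pi * 3 * n / period)
          for n in range(length)]
prototypes = [extract_prototype(signal, s, period)
              for s in range(0, length - period + 1, interval)]
rebuilt = pwi_reconstruct(prototypes, interval, period, length)
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

For a perfectly periodic input such as this one, all prototypes coincide and the reconstruction error is at the level of floating-point rounding. For real speech the prototypes evolve over time, and the interpolation produces a smoothly changing, highly periodic signal.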
In existing implementations of the PWI coding method, the nonperiodic signal is coded by another method of speech coding, usually CELP. The switching between coders inherently lacks robustness. Usually, the CELP coder has no pitch predictor because of the low bit rates at which the system is operating. Thus, the level of periodicity can vary only within a small range in both the PWI and CELP modes. The performance of PWI coding can be improved by adding spectrally-shaped noise to the PWI-synthesized signal, or by increasing the update rate of the prototype waveforms (increasing the signal bandwidth). In practice, existing implementations of the PWI coding method suffer from artifacts introduced by incorrect representation of the periodicity levels.