Communication of speech information often involves transmitting electrical signals which represent speech over a channel or network ("channel"). A problem commonly encountered in speech communication is how to transmit speech through a channel of limited capacity or bandwidth (in modern digital communications systems, bandwidth is often expressed in terms of bit-rate). The problem of limited channel bandwidth is usually addressed by the application of a speech coding system, which compresses a speech signal to meet channel bandwidth requirements. Speech coding systems include an encoder, which converts speech signals into code words for transmission over a channel, and a decoder, which reconstructs speech from received code words.
As a general matter, a goal of most speech coding systems concomitant with that of signal compression is the faithful reproduction of original speech sounds, such as, e.g., voiced speech. Voiced speech is produced when a speaker's vocal cords are tensed and vibrating quasi-periodically. In the time domain, a voiced speech signal appears as a succession of similar but slowly evolving waveforms referred to as pitch-cycles. Each pitch-cycle has a duration referred to as a pitch-period. Like the pitch-cycle waveform itself, the pitch-period generally varies slowly from one pitch-cycle to the next.
Many speech coding systems which operate at bit-rates around 8 kilobits per second (kb/s) code original speech waveforms by exploiting knowledge of the speech generation process. Illustrative of these so-called waveform coders are the code-excited linear prediction (CELP) speech coding systems.
A CELP system codes a speech waveform by filtering it with a time-varying linear prediction (LP) filter to produce a residual speech signal. During voiced speech, the residual signal comprises a series of pitch-cycles, each of which includes a major transient referred to as a pitch-pulse and a series of lower amplitude vibrations surrounding it. The residual signal is represented by the CELP system as a concatenation of scaled fixed-length vectors from a codebook. To achieve a high coding efficiency of voiced speech, most implementations of CELP also include a long-term predictor (or adaptive codebook) to facilitate reconstruction of a communicated signal with appropriate periodicity. Despite improvements over time, many waveform coding systems operating at rates below 6 kb/s suffer from perceptually significant distortion, typically characterized as noise.
Coding systems which operate at rates of 2.4 kb/s are generally parametric in nature. That is, they operate by transmitting parameters describing pitch-period and the spectral envelope (or formants) of the speech signal at regular intervals. Illustrative of these so-called parametric coders is the LP vocoder system.
LP vocoders model a voiced speech signal with a single pulse per pitch period. This basic technique may be augmented to include transmission information about the spectral envelope, among other things. Although LP vocoders provide reasonable performance generally, they may introduce perceptually significant distortion, typically characterized as buzziness.
The types of distortion discussed above, and another--reverberation--common in sinusoidal coding systems, are generally the result of reconstructed speech which lacks (in whole or in significant part) the pitch-cycle dynamics found in original voiced speech. Naturally, these types of distortion are more pronounced at lower bit-rates, as the ability of speech coding systems to code information about speech dynamics decreases.