1. Field of the Invention
This invention relates generally to digital communications and, in particular, to parametric speech coding and decoding methods and apparatus.
2. Description of the Background Art
For the purpose of definition, it should be noted that the term “vocoder” is frequently used to describe voice coding methods wherein voice parameters are transmitted instead of digitized waveform samples. In the production of digitized waveform samples, an incoming waveform is periodically sampled and digitized into a stream of digitized waveform data which can be converted back to an analog waveform virtually identical to the original waveform. The encoding of a voice using voice parameters provides sufficient accuracy to allow subsequent synthesis of a voice which is substantially similar to the one encoded. Unlike digitized waveform samples, voice parameter encoding does not provide sufficient information to reproduce the voice waveform exactly; however, the voice can be encoded at a lower data rate than is required with waveform samples.
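The data rate difference between the two approaches can be illustrated with a short calculation. The following is a minimal sketch only: the 8 kHz sampling rate and 8 bits per sample are assumed values typical of telephone-band waveform coding and are not taken from this description, while 2.4 kbps is one of the parametric rates discussed in this section. All names are hypothetical.

```python
# Illustrative comparison of waveform-sample and parametric data rates.
# ASSUMPTION: 8 kHz sampling and 8 bits per sample (typical telephone-band
# values) are chosen only for illustration and do not come from the text.

def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate in bits per second of a stream of digitized waveform samples."""
    return sample_rate_hz * bits_per_sample

ASSUMED_SAMPLE_RATE_HZ = 8000   # assumption, not from the text
ASSUMED_BITS_PER_SAMPLE = 8     # assumption, not from the text
PARAMETRIC_RATE_BPS = 2400      # a parametric vocoder rate named in this section

waveform_rate_bps = pcm_bit_rate(ASSUMED_SAMPLE_RATE_HZ, ASSUMED_BITS_PER_SAMPLE)
ratio = waveform_rate_bps / PARAMETRIC_RATE_BPS

print(f"waveform samples: {waveform_rate_bps} bps")
print(f"voice parameters: {PARAMETRIC_RATE_BPS} bps")
print(f"ratio: {ratio:.1f}x")
```

Under these assumed values, the waveform stream requires roughly 27 times the data rate of the parametric stream; other sampling choices would give a different ratio.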
In the speech coding community, the term “coder” is often used to refer to a speech encoding and decoding system, although it also often refers to an encoder by itself. As used herein, the term encoder generally refers to the encoding operation of mapping a speech signal to a compressed data signal (the bitstream), and the term decoder generally refers to the decoding operation where the data signal is mapped into a reconstructed or synthesized speech signal.
Digital compression of speech (also called voice compression) is increasingly important for modern communication systems. Low bit rates in the range of 500 bps (bits per second) to 2 kbps (kilobits per second) are desirable for efficient and secure voice communication over high frequency (HF) and other radio channels, for satellite voice paging systems, for multi-player Internet games, and for numerous additional applications. Most compression methods (also called “coding methods”) for 2.4 kbps or below are based on parametric vocoders. The majority of contemporary vocoders of interest are based on variations of the classical linear predictive coding (LPC) vocoder and enhancements of that technique, or are based on sinusoidal coding methods such as harmonic coders and multiband excitation coders [1]. Recently, an enhanced version of the LPC vocoder called MELP (Mixed Excitation Linear Prediction) has been developed [2, 5, 6]. The present invention can provide similar voice quality levels at a lower bit rate than is required in the conventional encoding methods described above.
This invention is generally described in relation to its use with MELP, since MELP coding has advantages over other frame-based coding methods. However, the invention is applicable to a variety of coders, such as harmonic coders [15] or multiband excitation (MBE) type coders [14].
The MELP encoder observes the input speech and, for each 22.5 ms frame, generates data for transmission to a decoder. This data consists of bits representing line spectral frequencies (LSFs) (which are a form of linear prediction parameter), Fourier magnitudes (sometimes called “spectral magnitudes”), gains (2 per frame), pitch, and voicing, and additionally contains an aperiodic flag bit, error protection bits, and a synchronization (sync) bit. FIG. 1 shows the buffer structure used in a conventional 2.4 kbps MELP encoder. The encoders employed with other coding methods, such as harmonic or MBE coding, generate data representing many of the same or similar parameters (typically these are LSFs, spectral magnitudes, gain, pitch, and voicing). The MELP decoder receives these parameters for each frame and synthesizes a corresponding frame of speech that approximates the original frame.
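The frame duration and bit rate stated above together fix the number of bits available per frame. The following minimal sketch works through that arithmetic; the function and constant names are hypothetical, and the 1.2 kbps entry simply assumes the same 22.5 ms frame duration for comparison, which is an assumption rather than a statement about any particular coder.

```python
from fractions import Fraction

# Frame duration of the MELP encoder as stated in the text: 22.5 ms.
FRAME_MS = Fraction(45, 2)

def bits_per_frame(bit_rate_bps: int, frame_ms: Fraction) -> Fraction:
    """Total bits available per frame at a given bit rate (exact arithmetic)."""
    return bit_rate_bps * frame_ms / 1000

# 2400 bps is the conventional MELP rate named in the text.
# ASSUMPTION: the 1200 bps entry reuses the 22.5 ms frame for illustration only.
for rate_bps in (2400, 1200):
    print(f"{rate_bps} bps -> {bits_per_frame(rate_bps, FRAME_MS)} bits per frame")
```

At 2.4 kbps this gives 54 bits per 22.5 ms frame, which must cover all of the parameters listed above as well as the error protection and sync bits. Halving the bit rate at the same frame duration would leave 27 bits per frame.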
Different communication systems require speech coders with different bit rates. For example, a high frequency (HF) radio channel may have severely limited capacity and require extensive error correction, so that a bit rate of 1.2 kbps may be most suitable for representing the speech parameters, whereas a secure voice telephone communication system often requires a bit rate of 2.4 kbps. In some applications it is necessary to interconnect different communication systems so that a voice signal originally encoded for one system at one bit rate is subsequently converted into an encoded voice signal at the other bit rate for another system. This conversion is referred to as “transcoding”, and it can be performed by a “transcoder” typically located at a gateway between two communication systems.