Vocoders compress and decompress speech data. Their purpose is to reduce the number of bits required for transmission of intelligible digitized speech. Most vocoders include an encoder and a decoder. The encoder characterizes frames of input speech and produces a bitstream for transmission to the decoder. The decoder receives the bitstream and simulates speech from the characterized speech information contained in the bitstream. Simulated speech quality typically decreases as bit rates decrease because less information about the speech is transmitted.
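The relationship between bits per frame and transmitted bit rate can be illustrated with simple arithmetic. The following is only a sketch: the function name and every numeric value in it (sampling rate, sample width, frame duration, bits per frame) are hypothetical and are not taken from any particular vocoder.

```python
def bit_rate_bps(bits_per_frame: int, frame_duration_ms: float) -> float:
    """Bits per second needed to send one encoded frame every frame_duration_ms."""
    return bits_per_frame * 1000.0 / frame_duration_ms


# Hypothetical uncoded speech: 8000 samples per second at 16 bits per sample.
uncoded_bps = 8000 * 16

# Hypothetical vocoder output: 48 bits describing each 20 ms frame of speech.
coded_bps = bit_rate_bps(bits_per_frame=48, frame_duration_ms=20.0)

# Ratio of uncoded to coded bit rate under these assumed values.
compression_ratio = uncoded_bps / coded_bps
```

Halving the assumed bits per frame halves the bit rate, and it also halves the amount of information available to the decoder for each frame.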
With Code Excited Linear Prediction (CELP) vocoders, the encoder estimates a speaker's speech characteristics and calculates the approximate pitch. The encoder also characterizes the "residual" underlying the speech by comparing the residual in the speech frame with a table of pre-stored residual samples. An index to the closest-fitting residual sample, coefficients describing the speech characteristics, and the pitch are packed into a bitstream and sent to the decoder. The decoder extracts the index, coefficients, and pitch from the bitstream and simulates the frame of speech.
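The table lookup and bitstream packing described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of any specific CELP vocoder: the squared-error match criterion, the function names, the field widths, and the sample values are all hypothetical, and the coefficients and pitch are assumed to be already quantized to non-negative integers.

```python
from typing import List, Sequence, Tuple


def closest_codebook_index(residual: Sequence[float],
                           codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry with the smallest squared error.

    Assumption: "closest-fitting" is measured as plain squared error.
    """
    def squared_error(entry: Sequence[float]) -> float:
        return sum((r - e) ** 2 for r, e in zip(residual, entry))

    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


def pack_fields(fields: Sequence[Tuple[int, int]]) -> int:
    """Pack (value, bit_width) pairs into one integer, first field most significant."""
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_fields(word: int, widths: Sequence[int]) -> List[int]:
    """Recover the field values packed by pack_fields, in their original order."""
    values = []
    for width in reversed(widths):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return list(reversed(values))


# Hypothetical three-entry table of pre-stored residual samples.
codebook = [[0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0], [2.0, 2.0, -2.0, -2.0]]
residual = [0.9, -1.2, 1.1, -0.8]

# Encoder side: find the index, then pack index, two coefficients, and pitch.
index = closest_codebook_index(residual, codebook)
widths = [8, 4, 4, 7]  # hypothetical widths: index, coefficient, coefficient, pitch
frame_word = pack_fields(list(zip([index, 5, 9, 40], widths)))

# Decoder side: extract the same fields from the received word.
decoded = unpack_fields(frame_word, widths)
```

Only the index is transmitted, not the residual itself, so the decoder needs an identical copy of the table to reconstruct the frame.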
Computational methods employed by prior-art vocoders are typically user-independent. These vocoders employ a generic speech characteristic model that contains entries for an extremely broad set of possible speech characteristics. Accordingly, regardless of who the speaker is, the vocoder uses the same table and executes the same algorithm. In CELP-type vocoders, generic speech characteristic models can be optimized for a particular language but are not optimized for a particular speaker.
A need exists for a method and apparatus for low bit-rate vocoding that provides higher-quality speech. Particularly needed is a user-customized voice coding method and apparatus that allows low bit-rate speech characterization based upon a dynamic underlying speech characteristic model.