Past attempts to improve the transmission and storage of intelligence in electrical media have resulted in a number of well-recognized techniques for processing human speech. One such technique is speech compression. The term "speech compression," as used herein, refers to a modulation technique that exploits certain properties of human speech, such as redundancy, to permit an electronic analog of a speech waveform to be transmitted over a narrower frequency band than would be necessary if the unmodulated signal were transmitted.
Bit rate and processor cost are prime concerns in the development of any compression system. Lowering the bit rate allows greater efficiencies in the storage and transmission of speech waveforms. However, bit rate reduction characteristically entails more expensive and complex processing, higher development costs and diminished voice quality. One measure of voice quality is naturalness of output, which refers to how human or how synthetic the output utterances are subjectively judged to be. Some techniques, such as vocoders, are simply not capable of producing natural-sounding output because they require vocal tract analogs to reconstitute the original speech signal.
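The bit rate trade-off described above can be illustrated with simple arithmetic. The following is a minimal sketch only: the function name `bit_rate` and every figure in it are assumptions chosen for illustration (8,000 samples per second at 8 bits per sample for PCM, and a hypothetical parametric coder sending 16 band amplitudes of 3 bits each, 50 times per second). None of these values is taken from the systems discussed herein.

```python
def bit_rate(values_per_second, bits_per_value):
    """Bits per second needed to send a stream of fixed-size values."""
    return values_per_second * bits_per_value


# Assumed waveform coder: 8,000 samples/s, 8 bits per sample.
pcm_rate = bit_rate(8000, 8)

# Assumed parametric coder: 50 frames/s, each frame carrying
# 16 band amplitudes quantized to 3 bits (hypothetical figures).
parametric_rate = bit_rate(50, 16 * 3)

reduction = pcm_rate / parametric_rate
print(pcm_rate, parametric_rate, round(reduction, 1))
```

Under these assumed figures the parametric coder needs only a small fraction of the PCM bit rate, but it does so by discarding the waveform itself, which is where the loss of naturalness noted above originates.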
One prior art system employing speech compression is the formant vocoder. The vocoder breaks speech down into various parameters. The original incoming signal is discarded and only the parameters are used. At the receiving or output end of such a system, a complete vocal tract analog is needed to reconstitute the original speech signal. Such systems produce synthetic-sounding speech because of limitations in the parameters themselves and in the number of parameters used. Additionally, processing cost is very high.
A channel vocoder splits the speech spectrum into frequency bands and transmits the signal amplitude in each band separately, either over parallel transmission lines or on a single line by multiplexing. At the reconstruction or receiver end, these amplitude signals control the outputs of a bank of filters. The forcing function going into the filters is a voiced or unvoiced signal derived from the original speech. Sometimes the excitation function in the original voice is actually sent as a pulse code modulation (PCM) signal. In this case, the system is called a baseband channel vocoder. Typically, these systems have the least naturalness, although they are fairly intelligible. Again, processing costs are very high. Some speech compression techniques achieve good bit reduction but improve only memory cost, with no improvement in fidelity. Other techniques improve fidelity but are relatively incapable of effecting any meaningful bit reduction.
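The channel vocoder principle described above (one amplitude per frequency band at the transmitter, and an excitation signal shaped by those amplitudes at the receiver) can be sketched for a single frame as follows. This is an illustrative approximation under stated assumptions, not the circuitry of any prior art system: the filter bank is emulated by grouping discrete Fourier transform bins, the bands are of equal width, the DC and Nyquist bins are ignored, and all function names (`analyze`, `synthesize`, `pulse_excitation`, `noise_excitation`) are hypothetical. The voiced or unvoiced decision is assumed to be supplied by the caller.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) DFT; adequate for one short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(spec):
    """Inverse DFT, keeping the real part (spec is conjugate-symmetric here)."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def band_ranges(n, n_bands):
    """Partition bins 1 .. n/2 - 1 into n_bands contiguous, near-equal bands."""
    count = n // 2 - 1
    edges = [1 + (i * count) // n_bands for i in range(n_bands + 1)]
    return [range(edges[i], edges[i + 1]) for i in range(n_bands)]


def band_rms(spec, bins):
    """Root-mean-square spectral magnitude over one band."""
    return math.sqrt(sum(abs(spec[k]) ** 2 for k in bins) / len(bins))


def analyze(frame, n_bands):
    """Transmitter side: reduce one frame to a single amplitude per band."""
    spec = dft(frame)
    return [band_rms(spec, bins) for bins in band_ranges(len(frame), n_bands)]


def pulse_excitation(n, period):
    """Voiced excitation: unit pulses every `period` samples."""
    return [1.0 if t % period == 0 else 0.0 for t in range(n)]


def noise_excitation(n, seed=0):
    """Unvoiced excitation: seeded Gaussian noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def synthesize(amplitudes, excitation):
    """Receiver side: scale the excitation so each band has the sent amplitude."""
    n = len(excitation)
    exc = dft(excitation)
    out = [0j] * n  # DC and Nyquist bins are left at zero
    for amp, bins in zip(amplitudes, band_ranges(n, len(amplitudes))):
        level = band_rms(exc, bins)
        gain = amp / level if level > 0 else 0.0
        for k in bins:
            out[k] = gain * exc[k]
            out[n - k] = gain * exc[n - k]  # mirror bin keeps the output real
    return idft_real(out)


# One 64-sample test frame containing two sinusoids (bins 3 and 20).
frame = [math.sin(2 * math.pi * 3 * t / 64)
         + 0.5 * math.sin(2 * math.pi * 20 * t / 64) for t in range(64)]
amps = analyze(frame, 4)                              # what would be transmitted
rebuilt = synthesize(amps, pulse_excitation(64, 20))  # "voiced" reconstruction
```

In this sketch the rebuilt frame has the same per-band amplitudes as the input but not the same waveform, since the fine spectral detail within each band comes from the excitation rather than from the original speech. That is consistent with the observation above that such systems remain fairly intelligible while losing naturalness.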