Modern communication systems make extensive use of coding to transmit speech information under circumstances of limited bandwidth. Instead of sending the input speech itself, the speech is analyzed to determine its important parameters (e.g., pitch, spectrum, energy and voicing) and these parameters are transmitted. The receiver then uses these parameters to synthesize an intelligible replica of the input speech. With this procedure, intelligible speech can be transmitted even when the intervening channel bandwidth is less than would be required to transmit the speech itself. The word "vocoder" has been coined in the art to describe apparatus that performs such functions.
FIG. 1 illustrates vocoder communication system 10. Input speech 12 is provided to speech analyzer 14, wherein the important speech parameters are extracted. These parameters are forwarded to coder 16, where they are quantized and combined in a form suitable for transmission over communication channel 18, e.g., a telephone or radio link. Having passed through communication channel 18, the coded speech parameters arrive at decoder 20, where they are separated and passed to speech synthesizer 22. Speech synthesizer 22 uses the quantized speech parameters to synthesize a replica 24 of the input speech for delivery to the listener.
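The combining step performed by coder 16 and the separating step performed by decoder 20 can be illustrated with a minimal Python sketch. The field names, bit widths and the functions pack_frame and unpack_frame are hypothetical and chosen only for illustration; the text does not specify any particular frame layout.

```python
# Hypothetical frame layout: (parameter name, bit width).
# These widths are illustrative assumptions, not taken from any real vocoder.
FIELDS = [("pitch", 6), ("energy", 5), ("voicing", 1), ("spectrum", 10)]


def pack_frame(indices):
    """Combine quantized parameter indices into one integer code word."""
    word = 0
    for name, width in FIELDS:
        value = indices[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} index {value} does not fit in {width} bits")
        # Shift earlier fields left and append this field in the low bits.
        word = (word << width) | value
    return word


def unpack_frame(word):
    """Separate a code word back into its quantized parameter indices."""
    indices = {}
    # Fields were appended in order, so peel them off in reverse order.
    for name, width in reversed(FIELDS):
        indices[name] = word & ((1 << width) - 1)
        word >>= width
    return indices


frame = {"pitch": 37, "energy": 12, "voicing": 1, "spectrum": 513}
code_word = pack_frame(frame)        # what would be sent over the channel
recovered = unpack_frame(code_word)  # what the decoder would hand to the synthesizer
```

In this sketch an error-free channel is assumed, so the decoder recovers exactly the indices that the coder packed.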
Many different types of vocoders have been described in the prior art, for example in U.S. Pat. Nos. 4,220,819, 4,330,689, 4,536,886, 4,625,286, 4,630,300, 4,677,671, 4,791,670, 4,797,925, 4,815,134, 4,817,157, 4,852,179, 4,890,327, 4,896,361, 4,899,385, 4,910,781, 4,914,699, 4,922,539, 4,933,957, 4,965,789, 4,975,956 and 4,980,916, which are incorporated herein by reference.
As used in the art, "pitch" generally refers to the period or frequency of the buzzing of the vocal cords or glottis, "spectrum" generally refers to the frequency dependent properties of the vocal tract, "energy" generally refers to the magnitude, intensity or energy of the speech waveform, "voicing" refers to whether or not the vocal cords are active, and "quantizing" refers to choosing one of a finite number of discrete levels to characterize these ordinarily continuous speech parameters. The number of different quantized levels for a particular speech parameter is set by the number of bits assigned to code that speech parameter, i.e., B bits provide 2^B levels. The foregoing terms are well known in the art and commonly used in connection with vocoding.
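The relationship between assigned bits and quantized levels can be illustrated with a minimal Python sketch of a uniform scalar quantizer. Uniform spacing is an assumption made here for simplicity, and the functions quantize and dequantize, as well as the 50 to 400 Hz pitch range and the 6-bit allocation, are hypothetical and not taken from the text.

```python
def quantize(value, lo, hi, bits):
    """Return the index of the nearest of 2**bits uniform levels in [lo, hi]."""
    if bits < 1:
        raise ValueError("at least one bit is required")
    levels = 2 ** bits                   # number of levels set by the bit count
    step = (hi - lo) / (levels - 1)      # spacing between adjacent levels
    clamped = min(max(value, lo), hi)    # out-of-range inputs map to an end level
    return round((clamped - lo) / step)


def dequantize(index, lo, hi, bits):
    """Return the parameter value represented by a level index."""
    step = (hi - lo) / (2 ** bits - 1)
    return lo + index * step


# Illustrative assumption: pitch frequency in Hz coded with 6 bits (64 levels).
PITCH_LO, PITCH_HI, PITCH_BITS = 50.0, 400.0, 6

index = quantize(123.4, PITCH_LO, PITCH_HI, PITCH_BITS)
approx = dequantize(index, PITCH_LO, PITCH_HI, PITCH_BITS)
```

For in-range inputs, the reconstructed value differs from the original by at most half of one step. Assigning one more bit doubles the number of levels and roughly halves that worst-case error.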
Vocoders have been built which operate at 200, 400, 600, 800, 900, 1200, 2400, 4800 and 9600 bits per second, and at other rates, with varying results depending, among other things, on the bit rate. The narrower the transmission channel bandwidth, the smaller the allowable bit rate. The smaller the allowable bit rate, the more difficult it is to find a coding scheme which provides clear, intelligible, synthesized speech. In addition, practical communication systems must take into consideration the complexity of the coding scheme, since unduly complex coding schemes cannot be executed in substantially real time or using computer processors of reasonable size, speed, complexity and cost. Processor power consumption is also an important consideration since vocoders are frequently used in hand-held and portable apparatus.
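The difficulty at low bit rates can be made concrete with a short Python sketch that computes how many bits are available to describe each frame of speech. The 22.5 millisecond frame duration and the function bits_per_frame are assumptions introduced only for illustration; the text does not specify a frame duration.

```python
import math


def bits_per_frame(bit_rate_bps, frame_ms):
    """Whole number of bits available to code one analysis frame."""
    return math.floor(bit_rate_bps * frame_ms / 1000.0)


# Assumed frame duration in milliseconds (not specified in the text).
FRAME_MS = 22.5

for rate in (600, 1200, 2400, 4800):
    print(rate, "bits per second ->", bits_per_frame(rate, FRAME_MS), "bits per frame")
```

Under this assumption, a 2400 bits per second vocoder has 54 bits with which to code the pitch, spectrum, energy and voicing of each frame, while a 600 bits per second vocoder has only 13, so each parameter must be characterized with far fewer quantized levels.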
While prior art vocoders are used extensively, they suffer from a number of limitations well known in the art, especially when low bit rates are desired. Thus, there is a continuing need for improved vocoder methods and apparatus, especially for vocoders capable of providing highly intelligible speech at low or moderate bit rates.
As used herein, the word "coding" is intended to refer collectively to both coding and decoding, i.e., both creation of a set of quantized parameters describing the input speech and subsequent use of this set of quantized parameters to synthesize a replica of the input speech.
As used herein, the words "perceptual" and "perceptually" refer to how speech is perceived, i.e., recognized by a human listener. Thus, "perceptual weighting" and "perceptually weighted" refer, for example, to deliberately modifying the characteristic parameters (e.g., pitch, spectrum, energy, voicing) obtained from analysis of some input speech so as to increase the intelligibility of synthesized speech reconstructed using such (modified) parameters. Development of perceptual weighting schemes that are effective in improving the intelligibility of the synthesized speech has been the subject of much long-standing work in the art.