In modern communication systems, data transmission often occurs under conditions of narrow bandwidth. It is therefore highly advantageous to develop methods which allow for data transmission at low bit rates. Various data compression techniques have been developed to assist in accomplishing the goal of low bit rate data transfer. In lieu of sending the input speech itself, the speech is analyzed to determine its parameters, such as pitch, spectrum, energy and voicing, and these parameters are transmitted. The receiver then uses these parameters to synthesize an intelligible replica of the input speech.
Adding to the difficulty in low bit rate data transmission is the requirement that the data to be transmitted undergo a channel encoding process that protects the data from irregularities in the transmission channel. This significantly increases the bit rate required for the data transmission. As will be discussed below, the speech parameters vary significantly in their importance to speech replication. For example, if one of the energy parameters is altered in the transmission process, speech replication will not be significantly affected. However, if pitch information becomes altered, it will likely render the replicated speech unintelligible. As used in the art, "pitch" generally refers to the period or frequency of the buzzing of the vocal cords or glottis, "spectrum" generally refers to the frequency-dependent properties of the vocal tract, "energy" generally refers to the magnitude or intensity of the speech waveform, "voicing" refers to whether or not the vocal cords are active, and "quantizing" refers to choosing one of a finite number of discrete values to characterize these ordinarily continuous speech parameters. The number of different quantized values (or levels) for a particular speech parameter is set by the number of bits assigned to code that speech parameter. The foregoing terms are well known in the art and commonly used in connection with vocoding (voice-coding).
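By way of illustration only, the relationship between assigned bits and quantization levels can be sketched as follows. This is a minimal sketch of a uniform scalar quantizer, which is an assumption made for the example; the text above does not state which quantizer a given vocoder uses. The function names and the parameter range are hypothetical.

```python
def num_levels(bits):
    """Number of distinct quantized values available with the given bit count."""
    return 2 ** bits


def quantize(value, lo, hi, bits):
    """Map a continuous value to the index of the nearest of 2**bits levels.

    Assumes uniformly spaced levels over [lo, hi] (illustrative choice only).
    Values outside the range are clipped to the nearest end.
    """
    clipped = min(max(value, lo), hi)
    step = (hi - lo) / (num_levels(bits) - 1)
    return round((clipped - lo) / step)


def dequantize(index, lo, hi, bits):
    """Recover the representative value for a level index."""
    step = (hi - lo) / (num_levels(bits) - 1)
    return lo + index * step


if __name__ == "__main__":
    # Hypothetical energy parameter normalized to [0, 1] and coded with 5 bits.
    idx = quantize(0.4, 0.0, 1.0, 5)
    print(num_levels(5), idx, dequantize(idx, 0.0, 1.0, 5))
```

Each additional bit doubles the number of levels and, for a uniform quantizer, roughly halves the worst-case rounding error, which is why the bit assignment per parameter trades directly against the overall bit rate.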
Vocoders may be built which operate at rates ranging from 200 to 9600 bits per second, with varying results depending on the bit rate. One skilled in the art will readily note that the quality of reconstructed voice varies depending not only on the bit rate chosen, but also on the parameters previously discussed (e.g., pitch period, spectrum bandwidth, energy, voicing, etc.). Typically, as the transmission channel bandwidth narrows, the allowable bit rate falls accordingly. Consequently, as the allowable bit rate falls, it becomes more difficult to find a data compression scheme that provides clear, intelligible, synthesized speech. Low bit rates further aggravate the problem of digital voice transmission since error-free reception requires a channel encoding scheme that adequately protects the selected parameters from corruption. Accordingly, a scheme must be selected that adequately protects the coded voice data without adding significant overhead that would increase the bit rate requirements. In addition, practical communication systems must take into consideration the complexity of the coding scheme, since unduly complex coding schemes cannot be executed substantially in real time or using computer processors of reasonable size, speed, complexity and cost. Processor power consumption is also an important consideration since vocoders are frequently used in hand-held and portable apparatus.
As used herein, the term "data compression" is intended to refer to the creation of a set of quantized parameters describing the input speech, and "de-compression" is intended to refer to the subsequent use of this set of quantized parameters to synthesize a replica of the input speech. Also, as used herein, "channel encoding" is intended to refer to both the encoding and the decoding of the compressed speech parameter data for the protection of the data when passed through a transmission channel. The word "vocoder" (voice coder) has been coined in the art to describe an apparatus which performs the aforementioned functions.
While prior art vocoders are used extensively, they suffer from a number of limitations well known in the art, especially when low bit rates are desired. For example, Linear Prediction Coding ("LPC") has been widely used in low bit rate speech coding. In interactive communication systems, where latency is a critical performance measure, speech data is coded and transmitted on a frame-by-frame basis. In an LPC-based vocoder, each frame is represented by two sets of parameters. One is the spectral parameter set, while the other is a set of excitation parameters which includes pitch, voicing and energy information. The United States government standard vocoder, known as the LPC-10(e), illustrates such vocoders. The LPC-10(e) requires 5 and 6 bits, respectively, to code the (i) energy and (ii) pitch and voicing parameters, for a total of 11 bits of information. While the bit requirements of the LPC-10(e) coder evidence a marked improvement over prior systems, room for improvement remains.
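To put the 11 excitation bits in context, the following minimal sketch computes the share of a frame's bit budget consumed by the excitation parameters. The excitation bit counts are those stated above. The 2400 bits per second rate and 22.5 millisecond frame duration in the usage example are assumptions introduced for illustration and are not stated in the text above; the function names are likewise hypothetical.

```python
# Excitation bit counts as stated in the text above.
ENERGY_BITS = 5
PITCH_VOICING_BITS = 6
EXCITATION_BITS = ENERGY_BITS + PITCH_VOICING_BITS  # 11 bits per frame


def bits_per_frame(bit_rate_bps, frame_ms):
    """Total bits available in one frame at the given bit rate."""
    return bit_rate_bps * frame_ms / 1000.0


def excitation_share(bit_rate_bps, frame_ms):
    """Fraction of the frame's bit budget spent on excitation parameters."""
    return EXCITATION_BITS / bits_per_frame(bit_rate_bps, frame_ms)


if __name__ == "__main__":
    # Assumed operating point (not from the text): 2400 bps, 22.5 ms frames.
    rate_bps, frame_ms = 2400, 22.5
    print(bits_per_frame(rate_bps, frame_ms))
    print(round(excitation_share(rate_bps, frame_ms), 3))
```

Under these assumed figures the excitation parameters take roughly a fifth of each frame, and the fraction grows as the allowable bit rate falls, which is why further compression of the excitation parameters is of interest at low rates.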
Thus, there is a continuing need in the art for an improved compressed voice digital communication system that provides a method for further compressing selected excitation parameters.