1. Field of the Invention
The present invention relates generally to the field of speech coding in communication systems, and more particularly to detecting voice activity in a communications system.
2. Description of Related Art
Modern communication systems rely heavily on digital speech processing in general, and digital speech compression in particular, in order to provide efficient systems. Examples of such communication systems are digital telephony trunks, voice mail, voice annotation, answering machines, digital voice over data links, etc.
A speech communication system is typically comprised of an encoder, a communication channel and a decoder. At one end of a communications link, the speech encoder converts a speech signal which has been digitized into a bit-stream. The bit-stream is transmitted over the communication channel (which can be a storage medium), and is converted again into a digitized speech signal by the decoder at the other end of the communications link.
The ratio between the number of bits needed for the representation of the digitized speech signal and the number of bits in the bit-stream is the compression ratio. A compression ratio of 12 to 16 is presently achievable, while still maintaining a high quality reconstructed speech signal.
A significant portion of normal speech is comprised of silence, up to an average of 60% during a two-way conversation. During silence, the speech input device, such as a microphone, picks up the environment or background noise. The noise level and characteristics can vary considerably, from a quiet room to a noisy street or a fast moving car. However, most of the noise sources carry less information than the speech signal and hence a higher compression ratio is achievable during the silence periods. In the following description, speech will be denoted as "active-voice" and silence or background noise will be denoted as "non-active-voice".
The above discussion leads to the concept of dual-mode speech coding schemes, which are usually also variable-rate coding schemes. The active-voice and the non-active voice signals are coded differently in order to improve the system efficiency, thus providing two different modes of speech coding. The different modes of the input signal (active-voice or non-active-voice) are determined by a signal classifier, which can operate external to, or within, the speech encoder. The coding scheme employed for the non-active-voice signal uses less bits and results in an overall higher average compression ratio than the coding scheme employed for the active-voice signal. The classifier output is binary, and is commonly called a "voicing decision." The classifier is also commonly referred to as a Voice Activity Detector ("VAD").
A schematic representation of a speech communication system which employs a VAD for a higher compression rate is depicted in FIG. 1. The input to the speech encoder 110 is the digitized incoming speech signal 105. For each frame of a digitized incoming speech signal the VAD 125 provides the voicing decision 140, which is used as a switch 145 between the active-voice encoder 120 and the non-active-voice encoder 115. Either the active-voice bit-stream 135 or the non-active-voice bit-stream 130, together with the voicing decision 140 are transmitted through the communication channel 150. At the speech decoder 155 the voicing decision is used in the switch 160 to select the non-active-voice decoder 165 or the active-voice decoder 170. For each frame, the output of either decoders is used as the reconstructed speech 175.
An example of a method and apparatus which employs such a dual-mode system is disclosed in U.S. Pat. No. 5,774,849, commonly assigned to the present assignee and herein incorporated by reference. According to U.S. Pat. No. 5,774,849, four parameters are disclosed which may be used to make the voicing decision. Specifically, the full band energy, the frame low-band energy, a set of parameters called Line Spectral Frequencies ("LSF") and the frame zero crossing rate are compared to a long-term average of the noise signal. While this algorithm provides satisfactory results for many applications, the present inventors have determined that a modified decision algorithm can provide improved performance over the prior art voicing decision algorithms.