With respect to speech communication, background noise can include passing motorists, overhead aircraft, babble noise such as restaurant or café noise, music, and many other audible noises. Cellular telephone technology makes it easy to communicate anywhere a wireless signal can be received and transmitted. The downside of the so-called “cellular age,” however, is that phone conversations may no longer be private, and they may take place where communication is barely feasible. For example, if a cell phone rings and the user answers it, the conversation proceeds whether the user is in a quiet park or next to a noisy jackhammer. The effects of background noise are therefore a major concern for cellular phone users and providers.
Classification is an important tool in speech processing. Typically, the speech signal is classified into a number of different classes, among other reasons to place emphasis on perceptually important features of the signal during encoding. When the speech is clean (free from background noise), robust classification, i.e., a low probability of misclassifying frames of speech, is more readily achieved. As the level of background noise increases, however, classifying the speech efficiently and accurately becomes difficult.
In the telecommunication industry, speech is digitized and compressed according to ITU (International Telecommunication Union) standards or other standards, such as those for wireless GSM (Global System for Mobile Communications). Many standards exist, depending on the amount of compression and the needs of the application. It is advantageous to compress the signal heavily before transmission, because the higher the compression, the lower the bit rate. This allows more information to be transferred in the same bandwidth, thereby saving bandwidth, power, and memory. As the bit rate decreases, however, faithfully reproducing the speech becomes increasingly difficult. For example, for telephone applications (a speech signal with a bandwidth of about 3.3 kHz), the digital speech signal is typically 16-bit linear PCM (pulse code modulation) at 128 kbit/s. ITU-T standard G.711 operates at 64 kbit/s, half the rate of the linear PCM signal. The standards continue to decrease in bit rate as demands for bandwidth rise (e.g., G.726 at 32 kbit/s, G.728 at 16 kbit/s, and G.729 at 8 kbit/s). A standard currently under development will lower the bit rate even further, to 4 kbit/s.
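The bit rates above follow from simple arithmetic. The sketch below assumes an 8 kHz sampling rate, the usual rate for narrowband telephony, which the text does not state explicitly; the codec table only restates the nominal rates quoted above and computes each one's compression ratio relative to 16-bit linear PCM.

```python
# Assumption: 8 kHz sampling, the customary narrowband telephony rate
# (not stated in the text). 16 bits/sample is the linear PCM case above.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 16


def bit_rate_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Uncompressed bit rate in kbit/s: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample / 1000


LINEAR_PCM_KBPS = bit_rate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)

# Nominal rates quoted in the text, in kbit/s.
CODECS_KBPS = {
    "G.711": 64,
    "G.726": 32,
    "G.728": 16,
    "G.729": 8,
    "4 kbit/s (under development)": 4,
}


def compression_ratio(codec_kbps: float) -> float:
    """How many times smaller the codec's rate is than 16-bit linear PCM."""
    return LINEAR_PCM_KBPS / codec_kbps


if __name__ == "__main__":
    print(f"16-bit linear PCM: {LINEAR_PCM_KBPS:.0f} kbit/s")
    for name, rate in CODECS_KBPS.items():
        print(f"{name}: {rate} kbit/s, {compression_ratio(rate):.0f}:1")
```

Under this assumption, G.711's 64 kbit/s corresponds to 8 bits per sample, and each step in the list halves the rate again, ending at a 32:1 reduction for a 4 kbit/s coder.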
Typically, speech is classified based on a set of parameters, with a threshold level set for each parameter to determine the appropriate class. When background noise is present (i.e., speech and noise occur at the same time and add together), the noise is superimposed on the speech, so the parameters derived for classification are distorted. Present solutions estimate the level of background noise in a given environment and vary the thresholds depending on that level. One problem with these techniques is that controlling the thresholds adds another dimension to the classifier. This makes the thresholds more complex to adjust, and finding a setting that is optimal for all noise levels is generally impractical.
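To make the threshold-control problem concrete, here is a minimal, hypothetical sketch of a threshold classifier. The parameter set (pitch correlation and frame energy), the class names, and every numeric threshold and slope are illustrative assumptions, not values from any standard or from this document. The adaptive version turns each threshold into a function of the estimated noise level, which is the extra dimension described above.

```python
from dataclasses import dataclass


@dataclass
class FrameParams:
    pitch_corr: float  # normalized pitch correlation, 0..1
    energy_db: float   # frame energy on a hypothetical dB scale


# Hypothetical thresholds, imagined as tuned on clean speech.
VOICED_CORR_THRESHOLD = 0.7
SILENCE_ENERGY_DB = 30.0


def classify_fixed(p: FrameParams) -> str:
    """Fixed thresholds: the same decision rule regardless of noise."""
    if p.energy_db < SILENCE_ENERGY_DB:
        return "silence"
    if p.pitch_corr >= VOICED_CORR_THRESHOLD:
        return "voiced"
    return "unvoiced"


def classify_adaptive(p: FrameParams, noise_db: float) -> str:
    """Noise-dependent thresholds (hypothetical rule, for illustration only).

    The silence threshold tracks the noise floor plus a 6 dB margin, and the
    voicing threshold is relaxed by 0.01 per dB of noise above 30 dB, floored
    at 0.4. Each of these constants would need tuning across noise levels.
    """
    silence_db = max(SILENCE_ENERGY_DB, noise_db + 6.0)
    corr_threshold = max(
        0.4, VOICED_CORR_THRESHOLD - 0.01 * max(0.0, noise_db - 30.0)
    )
    if p.energy_db < silence_db:
        return "silence"
    if p.pitch_corr >= corr_threshold:
        return "voiced"
    return "unvoiced"


if __name__ == "__main__":
    noisy_vowel = FrameParams(pitch_corr=0.55, energy_db=65.0)
    print(classify_fixed(noisy_vowel))
    print(classify_adaptive(noisy_vowel, noise_db=55.0))
```

Here a noisy vowel whose pitch correlation has dropped to 0.55 is labeled unvoiced by the fixed rule but voiced by the adaptive rule at a 55 dB noise estimate. The fix works only because the slope, floor, and margin happen to suit this case, which illustrates why tuning such rules for all noise levels is hard.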
For instance, a commonly derived parameter is pitch correlation, which indicates how periodic the speech is. Even in highly voiced speech, such as the vowel sound “a”, the periodicity appears much weaker when background noise is present, because of the random character of the noise.
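Pitch correlation is often computed as the normalized correlation between a frame and a copy of itself delayed by the pitch lag. The sketch below uses that common definition (an assumption, since the text gives no formula), with a synthetic two-harmonic signal standing in for a voiced vowel and seeded Gaussian noise standing in for background noise. The helper names `pitch_correlation` and `make_frame` are hypothetical.

```python
import math
import random


def pitch_correlation(x: list[float], lag: int) -> float:
    """Normalized correlation between x[n] and x[n - lag] over the frame."""
    idx = range(lag, len(x))
    num = sum(x[n] * x[n - lag] for n in idx)
    e_cur = sum(x[n] ** 2 for n in idx)
    e_lag = sum(x[n - lag] ** 2 for n in idx)
    if e_cur == 0.0 or e_lag == 0.0:
        return 0.0
    return num / math.sqrt(e_cur * e_lag)


def make_frame(period: int, length: int, noise_std: float, seed: int = 0) -> list[float]:
    """Periodic two-harmonic signal plus white Gaussian noise."""
    rng = random.Random(seed)
    return [
        math.sin(2 * math.pi * n / period)
        + 0.5 * math.sin(4 * math.pi * n / period)
        + rng.gauss(0.0, noise_std)
        for n in range(length)
    ]


# Assumed values: 80-sample pitch period (100 Hz at 8 kHz), 320-sample frame.
PERIOD = 80
FRAME_LEN = 320

clean = pitch_correlation(make_frame(PERIOD, FRAME_LEN, noise_std=0.0), PERIOD)
noisy = pitch_correlation(make_frame(PERIOD, FRAME_LEN, noise_std=1.0), PERIOD)

if __name__ == "__main__":
    print(f"clean frame: {clean:.3f}")  # essentially 1.0: the signal repeats exactly
    print(f"noisy frame: {noisy:.3f}")
```

The clean frame repeats exactly every 80 samples, so its correlation is essentially 1. Adding noise whose power exceeds the signal's pulls the correlation down substantially, even though the underlying signal is just as periodic as before.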
Complex algorithms known in the art purport to estimate the parameters from a noise-reduced signal. In one such algorithm, for example, a complete noise suppression algorithm is run on the noise-contaminated signal, and the parameters are then estimated from the noise-reduced signal. However, these algorithms are very complex and consume considerable processing power and memory on the digital signal processor (DSP).
Accordingly, there is a need for a less complex method of speech classification that is useful at low bit rates. In particular, there is a need for an improved speech classification method whose parameters are not influenced by background noise.