Speech signals are usually transmitted with a limited bandwidth in telecommunication systems, such as a GSM (Global System for Mobile Communications) network. The traditional bandwidth for speech signals in such systems is less than 4 kHz (0.3-3.4 kHz), although speech contains frequency components up to 10 kHz. The limited bandwidth results in poor performance in both quality and intelligibility. Humans perceive better quality and intelligibility if the frequency band of the speech signal is wideband, i.e., up to 8 kHz.
The characteristics of noise can vary widely. Noise can be, for example, quiet office noise, loud car noise, street noise or babble noise (babble of voices, tinkle of dishes, etc.). In addition to having different characteristics, noise can be present either around the mobile phone user at the near end (tx-noise) or around the other party of the conversation at the far end (rx-noise). The rx-noise corrupts the speech signal and is therefore also expanded to the high band together with the speech. In situations with a high rx-noise level, this is a problem because the noise starts to sound annoying due to the artificially generated high frequency components. Tx-noise degrades intelligibility by masking the received speech signal.
Prior art artificial bandwidth expansion (ABE) solutions suffer from poor performance in noisy situations. One prior ABE solution is described in U.S. patent application Ser. No. 10/341,332 entitled “Method and Apparatus for Artificial Bandwidth Expansion in Speech Processing,” assigned to the same assignee as the present application and incorporated herein by reference in its entirety. An advantage of this earlier developed ABE algorithm is that it is considerably more robust with noisy and coded speech. However, there are problems with this algorithm, including the presence of artifacts that degrade the overall perceived naturalness. Sudden changes in the high band of the expanded speech can cause audible artifacts. Further, this prior algorithm includes a frequency bandwidth of 0-4 kHz.
The missing frequency components are especially important for speech sounds such as fricatives (for example, /s/ and /z/) because a considerable part of their frequency components is located above 4 kHz. The intelligibility of plosives (/t/, /p/, etc.) suffers from the lack of high frequencies as well, even though the main information of these sounds is at lower frequencies. For voiced sounds, the lack of high frequencies results mainly in degraded perceived naturalness. Because the importance of the high frequency components differs among the speech sounds, the generation of the high band of an expanded signal should be performed differently for each group of phonemes.
Thus, there is a need for a robust computational method for the classification of different phoneme groups. Further, there is a need for an improved method that prevents misclassifications and thereby the audible artifacts still present in previous algorithms. Even further, there is a need for an improved system and method for enhanced artificial bandwidth expansion for signal quality improvement.