Audio signals may contain both speech and non-speech components. The speech component contains speech content, while the non-speech component may contain, for example, audio content in the surround channels of a multichannel audio signal. Furthermore, when the audio signal is played back to users, an environmental noise signal external to the audio signal may be present at the same time. To improve the user experience, it would be desirable to enhance the intelligibility of the speech content contained in the speech component in the presence of interfering sound signals, such as the non-speech component in the audio signal and/or the environmental noise signal external to the audio signal.
As used herein, the term “intelligibility of speech content” refers to an indication of the degree of comprehensibility of the speech content. The term “loudness” refers to a perceptual magnitude corresponding to the physical strength of the audio signal. The term “partial loudness” refers to the perceived loudness of the audio signal in the presence of interfering sound signals, such as environmental noise signals. The term “environmental noise signal” refers to a noise signal in an ambient environment external to the audio signal. The term “speech component” refers to a component of the audio signal containing speech content, and the term “non-speech component” refers to a component of the audio signal containing non-speech content.
Some conventional approaches to enhancing the intelligibility of the speech content work on the basis of loudness domain processing. In such an approach, the intelligibility of the speech content may be enhanced by controlling the partial loudness of the speech component in the audio signal. More specifically, the partial loudness of the speech component is maintained at a reference level of loudness, without taking environmental noise into account. However, there is no mechanism for verifying whether the resulting intelligibility of the speech content is desirable or comfortable to individual users.
It is also known to enhance the intelligibility of the speech content based on excitation domain processing. In this approach, the intelligibility of the speech content is enhanced by adjusting the audio signal based on the ratio between the speech component and the interfering sound signals. Such an approach is applicable in scenarios where either an internal interfering sound signal (the non-speech component) or an external interfering sound signal (the environmental noise signal) is present. However, this approach does not work when both the non-speech component and the environmental noise signal are present.