The invention relates to the field of audio signal processing and in particular to the field of detecting and processing speech.
U.S. Patent Application 2002/0173950 discloses a circuit arrangement for improving the intelligibility of audio signals containing speech, in which frequency and/or amplitude components of the audio signal are altered according to certain parameters. The audio signal is amplified by a predetermined factor in a processing section and output through a high-pass filter, and a cutoff frequency of the high-pass filter may be regulated so that the amplitude of the audio signal after the processing section is equal or proportional to the amplitude of the audio signal before the processing section. This circuit arrangement attenuates the fundamental wave of the speech signal, which contributes relatively little to the intelligibility of the speech components yet possesses the greatest energy, while the remaining spectrum of the audio signal is correspondingly emphasized. Furthermore, the amplitude of a vowel, which has a large amplitude at low frequency, may be reduced in the transitional region to a consonant, which has a low amplitude at high frequency, in order to reduce so-called “backward masking.” For this purpose, the entire signal is emphasized by the factor. Finally, high-frequency components are emphasized and the low-frequency fundamental wave is reduced to the same degree, so that the amplitude or energy of the audio signal remains unchanged.
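The principle described above, high-pass filtering followed by a gain that restores the original level, may be illustrated by the following minimal sketch. It is not the circuit of the cited application: the first-order filter, the use of the RMS value as the measure of "amplitude", and the function names `high_pass`, `rms`, and `attenuate_fundamental` are assumptions chosen for illustration only.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order RC high-pass filter (an assumed stand-in for the filter in the text)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        # Standard discrete RC high-pass recurrence.
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def attenuate_fundamental(samples, cutoff_hz, sample_rate_hz):
    """High-pass the signal, then rescale it so its RMS level matches the input."""
    filtered = high_pass(samples, cutoff_hz, sample_rate_hz)
    filtered_rms = rms(filtered)
    if filtered_rms == 0.0:
        return filtered  # nothing left to rescale
    gain = rms(samples) / filtered_rms
    return [gain * s for s in filtered]
```

In this sketch the low-frequency fundamental is weakened relative to the higher-frequency components, while the overall level of the block stays the same as before processing.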
U.S. Pat. No. 5,553,151 describes “forward masking,” in which weak consonants are masked by the strong vowels that immediately precede them. To counter this, a relatively fast compressor with an “attack time” of approximately 10 msec and a “release time” of approximately 75 to 150 msec is proposed.
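A compressor with separate attack and release times may be sketched as follows. This is an illustrative sketch only, not the compressor of the cited patent: the one-pole envelope follower, the `threshold` and `ratio` values, and the function name `compress` are assumptions. The defaults use a 10 msec attack and a 100 msec release, the latter lying within the 75 to 150 msec range mentioned above.

```python
import math


def compress(samples, sample_rate_hz, attack_s=0.010, release_s=0.100,
             threshold=0.1, ratio=4.0):
    """Feed-forward compressor with a one-pole envelope follower.

    threshold and ratio are assumed values chosen for illustration.
    """
    attack_coeff = math.exp(-1.0 / (attack_s * sample_rate_hz))
    release_coeff = math.exp(-1.0 / (release_s * sample_rate_hz))
    envelope = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # Track rising levels quickly (attack) and falling levels slowly (release).
        coeff = attack_coeff if level > envelope else release_coeff
        envelope = coeff * envelope + (1.0 - coeff) * level
        if envelope > threshold:
            # Above the threshold the output level grows only 1/ratio as fast (in dB).
            target = threshold * (envelope / threshold) ** (1.0 / ratio)
            gain = target / envelope
        else:
            gain = 1.0
        out.append(gain * x)
    return out
```

Because the gain reduction persists for the duration of the release time, a weak sound that directly follows a strong one is still attenuated in this sketch; the short release time limits how long that lasts.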
U.S. Pat. No. 5,479,560 discloses dividing an audio signal into several frequency bands, amplifying relatively strongly those frequency bands with large energy, and reducing the others. This is proposed because speech consists of a succession of phonemes, and each phoneme includes a plurality of frequencies. These frequencies are amplified particularly in the region of the resonance frequencies of the mouth and throat. A frequency band with such a spectral peak value is known as a formant. Formants are especially important for the recognition of phonemes and, thus, of speech. One principle of improving the intelligibility of speech is to amplify the peak values or formants of the frequency spectrum of an audio signal and to attenuate the regions lying in between. For an adult man, the fundamental frequency of speech is approximately 60 to 250 Hz. The first four formants are assigned to 500 Hz, 1500 Hz, 2500 Hz, and 3500 Hz.
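The band-wise principle may be illustrated by the following sketch, which computes the energy in each frequency band of a short frame and assigns a larger gain to bands whose energy exceeds the mean. It is not the method of the cited patent: the naive DFT, the comparison against the mean band energy, the `boost` and `cut` values, and the function names `band_energies` and `band_gains` are assumptions made for illustration.

```python
import cmath
import math


def band_energies(frame, sample_rate_hz, band_edges_hz):
    """Sum the spectral energy of a frame within each band [edge_b, edge_b+1)."""
    n = len(frame)
    energies = [0.0] * (len(band_edges_hz) - 1)
    for k in range(n // 2 + 1):
        freq = k * sample_rate_hz / n
        # Naive DFT coefficient for bin k (adequate for a short frame).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        for b in range(len(energies)):
            if band_edges_hz[b] <= freq < band_edges_hz[b + 1]:
                energies[b] += abs(coeff) ** 2
                break
    return energies


def band_gains(energies, boost=2.0, cut=0.5):
    """Assign boost to bands above the mean energy and cut to all others (assumed rule)."""
    mean_energy = sum(energies) / len(energies)
    return [boost if e > mean_energy else cut for e in energies]
```

For example, with band edges of 0, 1000, 2000, 3000, and 4000 Hz, a frame dominated by a component near 500 Hz would receive the larger gain in the first band only.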
Such circuit arrangements and procedures make speech contained in an audio signal more understandable than other components contained in the audio signal. At the same time, however, signal components not containing speech are also altered or distorted. Another drawback of these methods and circuit arrangements is that they continuously process rigidly fixed speech components, frequency components, or the like. Thus, signal components not containing speech are altered or distorted even at times when the audio signal contains no speech or speech components.
Therefore, there is a need for a technique that processes speech within an audio signal while reducing the alteration and distortion of the audio signal components not containing speech.