Sound signals span a wide range of intensities, that is to say a large dynamic range of up to 120 dB. The background noise of a rural region at night corresponds to approximately 20 dB, whereas a gunshot has a sound level of approximately 140 dB close to its source.
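To make the 120 dB figure concrete, the following minimal sketch converts a level difference in decibels into the corresponding ratios of sound intensity and sound pressure. The function names are hypothetical; the 20 dB and 140 dB values are the two examples given above, and the conversions use the standard decibel definitions (10 dB per factor of ten in intensity, 20 dB per factor of ten in pressure).

```python
import math


def intensity_ratio(delta_db):
    """Ratio of sound intensities corresponding to a level difference in dB."""
    return 10 ** (delta_db / 10)


def pressure_ratio(delta_db):
    """Ratio of sound pressures corresponding to a level difference in dB."""
    return 10 ** (delta_db / 20)


quiet_db = 20    # rural background noise at night (value from the text)
loud_db = 140    # gunshot close to its source (value from the text)
span_db = loud_db - quiet_db

print(f"dynamic range: {span_db} dB")
print(f"intensity ratio: {intensity_ratio(span_db):.0e}")
print(f"pressure ratio: {pressure_ratio(span_db):.0e}")
```

Under these definitions, a span of 120 dB corresponds to a factor of about a trillion in intensity and about a million in sound pressure.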
On account of adaptation processes in the human inner ear, in which the so-called outer hair cells play an important part, normal hearing achieves both a high sensitivity at low sound levels and a high tolerance at high sound levels. The sound level is a physical variable which is a measure of the intensity of the sound. Hearing adapts its amplification to the current sound level and is therefore able to cover a large dynamic range of sound levels between sound perceived as quiet and sound perceived as loud. In effect, a large sound level range is compressed to a small perceptible range. This is referred to as dynamic range compression.
When speech is encoded into action potentials of the auditory nerves, the large dynamic range of the sound signals (up to 120 dB) is compressed to the limited dynamic range of the sensory cells or of a neural system (approximately 40 dB).
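Purely as an illustration of what compressing a 120 dB input range to approximately 40 dB means, the sketch below uses a static characteristic that is linear on the decibel scale. The function name, the 20 dB and 140 dB end points (borrowed from the rural-night and gunshot examples) and the linear-in-dB mapping are assumptions made for this example; the actual characteristic of the sensory cells is not specified here.

```python
def compress_level_db(level_db, in_min=20.0, in_max=140.0, out_range=40.0):
    """Map an input level in [in_min, in_max] dB onto [0, out_range] dB.

    Assumed characteristic: linear in dB, i.e. a constant compression
    ratio of (in_max - in_min) / out_range, here 3:1.
    """
    # Clamp to the supported input range before mapping.
    clamped = min(max(level_db, in_min), in_max)
    return (clamped - in_min) * out_range / (in_max - in_min)


for level in (20, 80, 140):
    print(level, "dB in ->", compress_level_db(level), "dB out")
```

With these assumed end points, every 3 dB change at the input produces a 1 dB change at the output.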
Speech recognition systems, hearing aids and audio data compression are fields of economic interest. Principles of automatic speech recognition can be gathered from Schukat-Talamazzini, E.G. (1995) “Automatische Spracherkennung” [“Automatic Speech Recognition”], Friedrich Vieweg & Sohn Verlagsgesellschaft, Braunschweig/Wiesbaden, ISBN 3-528-05492-1, Chapters 1 to 3, by way of example.
In a known speech recognition system, a fast Fourier transform (FFT) is used for spectral analysis of sound signals. The logarithm of the amplitude spectrum obtained is subsequently taken. In effect, this corresponds to a dynamic range compression with a logarithmic characteristic curve.
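The effect of taking the logarithm of the amplitude spectrum can be sketched as follows. For self-containment the example uses a plain discrete Fourier transform in place of an FFT (both yield the same spectrum); the function names, the frame length of 64 samples and the two test tones are assumptions chosen for illustration and do not describe any particular recognition system.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def log_amplitude_spectrum_db(frame, floor=1e-12):
    """Amplitude spectrum in dB for the non-negative frequency bins."""
    spectrum = dft(frame)
    half = len(frame) // 2 + 1
    # The floor avoids log10(0) for bins that contain no energy.
    return [20 * math.log10(max(abs(c), floor)) for c in spectrum[:half]]


N = 64
# Two tones whose amplitudes differ by a factor of 1000, centred on bins 4 and 10.
frame = [
    math.sin(2 * math.pi * 4 * t / N) + 0.001 * math.sin(2 * math.pi * 10 * t / N)
    for t in range(N)
]
levels = log_amplitude_spectrum_db(frame)
print(f"bin 4: {levels[4]:.1f} dB, bin 10: {levels[10]:.1f} dB")
```

The amplitude ratio of 1000:1 between the two tones appears as a level difference of about 60 dB after the logarithm, which is the sense in which the logarithm acts as a compressive characteristic curve.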
Such a fast Fourier transform typically uses a time window having a predefined length, which restricts both the frequency resolution and the temporal resolution. If, as is customary in speech recognition, only the absolute value spectrum is used, the temporal resolution is limited by the length of the time window used. A further problem with a time window of fixed, predefined size is that, if the spectrum is altered, an error arising from the finite length of the time window is obtained after the inverse transformation.
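The trade-off between frequency resolution and temporal resolution for a fixed window can be made concrete with a short calculation. The sketch below assumes a sampling rate of 16 kHz and window lengths of 128 and 512 samples; these values and the function names are illustrative and not taken from the text. It relies only on the standard relations that the bin spacing is the sampling rate divided by the window length and that the window duration is the window length divided by the sampling rate.

```python
import math


def frequency_resolution_hz(sample_rate_hz, window_length):
    """Spacing between adjacent frequency bins of the transform."""
    return sample_rate_hz / window_length


def window_duration_s(sample_rate_hz, window_length):
    """Time span covered by one analysis window."""
    return window_length / sample_rate_hz


sample_rate = 16000  # assumed sampling rate in Hz
for n in (128, 512):
    df = frequency_resolution_hz(sample_rate, n)
    dt = window_duration_s(sample_rate, n)
    print(f"N={n}: bin spacing {df} Hz, window duration {dt * 1000} ms")
```

Because the product of bin spacing and window duration is always 1, a longer window sharpens the frequency resolution only at the cost of a coarser temporal resolution, and vice versa.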
U.S. Pat. No. 3,808,540 discloses a device for reducing the apparent loudness of an output signal in a radio broadcasting system, which device has a frequency-selective gain reducing network.
DE 24 01 816 C2 discloses a circuit arrangement for compressing the dynamic range of an input signal.