Reverberation is a major problem for hearing-impaired persons. In addition to the loss of spectral cues for speech intelligibility caused by the broadening of the auditory filters (i.e. the reduced spectral discrimination ability of the impaired ear, in which defective outer hair cells result in less sharply tuned auditory filters), the temporal cues are also degraded by the reverberation. Onsets, speech pauses etc. are no longer perceivable. Thus, severe reductions in intelligibility as well as decreases in comfort occur.
From a technical point of view, reverberation is a filtering (convolution) of the clean signal, for example a speech signal, with the room impulse response (RIR) from the speaker to the hearing-impaired person. These room impulse responses tend to be very long, on the order of several hundred milliseconds and up to several seconds for large cathedrals or main train stations. The long RIR thus smears the signal across the speech pauses.
The immediate technical solution is therefore so-called ‘de-convolution’, i.e. the estimation and inversion of the RIR, with which the reverberated signal arriving at the Hearing Instrument (HI) can be filtered and thus, ideally, restored to the original clean or ‘dry’ signal. From a mathematical point of view, deconvolution or inversion of a filter response is a well-known process. The problems lie in the following points:
a.) The inversion of a real RIR generates an acausal filter, i.e. one which needs information from the future. This can in principle only be eliminated by introducing an appropriate delay into the system, which would therefore have to be at least several hundred milliseconds long.
b.) Estimation of the correct RIR (or directly of its inverse).
Concerning point a.), even when only the first part of the RIR (the one with the highest energies) is corrected for, the required delay would be far too long for hearing instrument (HI) purposes.
Even more important, though, is the correct estimation of the RIR (point b.), which is considered a hard problem in the field, and for which no completely satisfying and useful solution exists.
For these reasons, instead of deconvolution other approaches are used for dereverberation. One known solution uses multiple microphones or a beamformer to dereverberate the signal. This, however, is of limited use in large rooms, where the sound field is very diffuse.
Another known solution tries to dereverberate by first transforming the signal into the cepstral domain, where the (estimated) RIR can simply be subtracted, before transforming back into the linear time domain. These solutions are also computationally expensive and require a significant group delay. In addition, they are not very robust.
A novel solution was presented in K. Lebart et al., acta acustica vol. 87 (2001), p. 359-366. The solution is a method based on spectral subtraction. The principle is that the RIR is modeled as zero-mean Gaussian noise which decays exponentially:
$$h(t) = b(t)\,e^{-\Delta t} \ \text{for } t \ge 0, \qquad h(t) = 0 \ \text{for } t < 0 \qquad (1)$$
In the above equation, b(t) denotes a zero-mean Gaussian function and
$$\Delta = \frac{3\,\ln(10)}{T_r},$$
$T_r$ being the reverberation time, i.e. the time after which the reverberation energy has decayed by 60 dB.
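The model of Eq. (1) can be illustrated with a short sketch. It is not taken from the cited paper; the function names `synth_rir` and `envelope_decay_db`, the sampling rate and the seed are assumptions made for illustration only. The sketch draws Gaussian noise, applies the exponential envelope, and checks that the energy envelope has dropped by 60 dB at t = Tr.

```python
import numpy as np

def synth_rir(tr, fs, length_s=None, seed=0):
    """Synthetic RIR following Eq. (1): h(t) = b(t) * exp(-delta * t), t >= 0.

    tr       : reverberation time Tr in seconds
    fs       : sampling rate in Hz (assumed value, not from the source)
    length_s : RIR length in seconds (defaults to Tr)
    """
    delta = 3.0 * np.log(10.0) / tr            # decay constant from Tr
    n = int(round((tr if length_s is None else length_s) * fs))
    t = np.arange(n) / fs
    b = np.random.default_rng(seed).standard_normal(n)  # zero-mean Gaussian noise
    return b * np.exp(-delta * t), delta

def envelope_decay_db(delta, t):
    """Decay of the model's energy envelope exp(-2 * delta * t), in dB."""
    return 10.0 * np.log10(np.exp(-2.0 * delta * t))

h, delta = synth_rir(tr=0.5, fs=16000)
print(envelope_decay_db(delta, 0.5))           # close to -60 dB at t = Tr

# Reverberation as convolution of a dry signal with the RIR
dry = np.zeros(100)
dry[0] = 1.0                                   # unit impulse as a stand-in for speech
wet = np.convolve(dry, h)
```

The factor 3·ln(10) follows from requiring the energy envelope e^(−2Δt) to equal 10^(−6) at t = Tr.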
The reverberation energy at any time t can thus be estimated by
$$P_{rr}(t,f) = e^{-2\Delta T}\,P_{xx}(t-T,f) \qquad (2)$$
where $P_{xx}(t,f)$ is the power spectral density of a signal x(n). T is an (arbitrary) delay.
In other words, the reverberation power at any time t is equal to the signal power of the speaker at an earlier time t−T, attenuated by the exponential term $e^{-2\Delta T}$.
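As a minimal sketch of Eq. (2), assume the power spectral density is available as a frame-by-bin array of short-time estimates and that the delay T is a whole number of frames. Both are assumptions for illustration, as are the names `reverb_psd_estimate`, `delay_frames` and `hop_s`.

```python
import numpy as np

def reverb_psd_estimate(pxx, tr, delay_frames, hop_s):
    """Estimate Prr(t, f) = exp(-2 * delta * T) * Pxx(t - T, f), Eq. (2).

    pxx          : array of shape (frames, bins) with short-time PSD values
    tr           : reverberation time Tr in seconds
    delay_frames : delay T expressed in frames (must be >= 1)
    hop_s        : time between successive frames in seconds
    """
    if delay_frames < 1:
        raise ValueError("delay_frames must be at least 1")
    pxx = np.asarray(pxx, dtype=float)
    delta = 3.0 * np.log(10.0) / tr
    atten = np.exp(-2.0 * delta * delay_frames * hop_s)
    prr = np.zeros_like(pxx)                   # no estimate for the first T frames
    prr[delay_frames:] = atten * pxx[:-delay_frames]
    return prr

# Constant input power, Tr = 1 s, T = 50 ms: attenuation is 10**(-0.3), about -3 dB
prr = reverb_psd_estimate(np.ones((10, 4)), tr=1.0, delay_frames=5, hop_s=0.01)
```

Because e^(−2ΔT) equals 10^(−6T/Tr), the attenuation is simply 60·T/Tr dB.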
One can now consider the ratio between the current received signal power and the estimated reverberation signal power as a ‘Signal-to-reverberation-Noise Ratio (SNR)’ and form from it a gain function similar to a spectral subtraction filter. However, musical noise artifacts may be produced and have to be avoided by additional means such as averaging or setting a spectral floor.
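The text does not give the exact form of the gain function. The following sketch assumes a plain power-subtraction rule, G = max(1 − Prr/Pxx, floor), with a fixed spectral floor to limit musical noise; the rule, the name `subtraction_gain` and the default floor value are illustrative assumptions, not the cited method.

```python
import numpy as np

def subtraction_gain(pxx, prr, floor=0.1, eps=1e-12):
    """Spectral-subtraction-like gain with a spectral floor (assumed form).

    pxx   : received signal power per time-frequency bin
    prr   : estimated reverberation power per bin
    floor : lower bound on the gain, limiting musical noise
    eps   : guard against division by zero
    """
    pxx = np.asarray(pxx, dtype=float)
    prr = np.asarray(prr, dtype=float)
    inv_snr = prr / np.maximum(pxx, eps)       # 1 / signal-to-reverberation ratio
    return np.maximum(1.0 - inv_snr, floor)

gain = subtraction_gain([1.0, 1.0, 1.0], [0.0, 0.5, 2.0])
```

Bins where the estimated reverberation power exceeds the received power are held at the floor instead of being driven to zero or negative values.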
An algorithm based on these findings is of lower complexity than the above-mentioned direct dereverberation or cepstral methods, but is still computationally expensive. In particular, the reverberation time Tr, which is required in order to generate the exponential term in Eq. (2) for the reverberation power estimation, is hard to calculate. First, speech pauses are detected (which is rather difficult in a highly reverberated signal). During speech pauses, the exponential decay corresponds to a linear negative slope on a logarithmic scale. Then, within these signal segments, the slope of the smoothed signal power envelope on a dB scale is extracted by linear regression, another quite expensive operation. Further averaging of the slopes found is used to arrive at an improved estimate. From the slope estimate and the known sample time, Tr can be extracted.
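The regression step can be sketched as follows, assuming that a speech pause has already been detected and that its smoothed power envelope is available in dB at a known frame spacing. Pause detection and smoothing are not shown, and the name `estimate_tr` is hypothetical. Since the energy envelope e^(−2Δt) corresponds to a slope of −60/Tr dB per second, Tr follows directly from the fitted slope.

```python
import numpy as np

def estimate_tr(power_db, frame_s):
    """Estimate Tr from the decay slope of a power envelope within a speech pause.

    power_db : smoothed power envelope in dB over one decay segment
    frame_s  : time between successive envelope values in seconds
    """
    power_db = np.asarray(power_db, dtype=float)
    t = np.arange(len(power_db)) * frame_s
    slope, _intercept = np.polyfit(t, power_db, 1)   # least-squares line, dB per second
    if slope >= 0.0:
        raise ValueError("segment does not decay; no Tr estimate possible")
    return -60.0 / slope

# Ideal decay for Tr = 0.8 s sampled every 10 ms
t = np.arange(30) * 0.01
tr_hat = estimate_tr(-60.0 / 0.8 * t, frame_s=0.01)
```

In practice the slopes from several pauses would be averaged before converting to Tr, as described above.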
Besides being computationally expensive, the above-described method also lacks robustness. This is due, among other reasons, to uncertainties in detecting speech pauses.