Hearing impairment is one of the most prevalent chronic health conditions, affecting approximately 500 million people world-wide. Although the most common type of hearing impairment is conductive hearing loss, resulting in an increased frequency-selective hearing threshold, many hearing impaired persons additionally suffer from sensorineural hearing loss, which is associated with damage of hair cells in the cochlea. Due to the loss of temporal and spectral resolution in the processing of the impaired auditory system, this type of hearing loss leads to a reduction of speech intelligibility in noisy acoustic environments.
In the so-called “cocktail party” environment, where a target sound is mixed with a number of acoustic interferences, a normal hearing person has the remarkable ability to selectively separate the sound source of interest from the composite signal received at the ears, even when the interferences are competing speech sounds or a variety of non-stationary noise sources (see e.g. Cherry, “Some experiments on the recognition of speech, with one and with two ears”, J. Acoust. Soc. Amer., vol. 25, no. 5, pp. 975-979, September 1953; Haykin & Chen, “The Cocktail Party Problem”, Neural Computation, vol. 17, no. 9, pp. 1875-1902, September 2005).
One way of explaining auditory sound segregation in the “cocktail party” environment is to consider the acoustic environment as a complex scene containing multiple objects and to hypothesize that the normal auditory system is capable of grouping these objects into separate perceptual streams based on distinctive perceptual cues. This process is often referred to as auditory scene analysis (see e.g. Bregman, “Auditory Scene Analysis”, MIT Press, 1990).
According to Bregman, sound segregation consists of a two-stage process: feature selection/calculation and feature grouping. Feature selection essentially involves processing the auditory inputs to provide a collection of favorable features (e.g. frequency-selective, pitch-related, temporal-spectral like features). The grouping process, on the other hand, is responsible for combining the similar elements according to certain principles into one or more coherent streams, where each stream corresponds to one informative sound source. Grouping processes may be data-driven (primitive) or schema-driven (knowledge-based). Examples of primitive grouping cues that may be used for sound segregation include common onsets/offsets across frequency bands, pitch (fundamental frequency) and harmonically, same location in space, temporal and spectral modulation, pitch and energy continuity and smoothness.
In noisy acoustic environments, sensorineural hearing impaired persons typically require a signal-to-noise ratio (SNR) up to 10-15 dB higher than a normal hearing person to experience the same speech intelligibility (see e.g. Moore, “Speech processing for the hearing-impaired: successes, failures, and implications for speech mechanisms”, Speech Communication, vol. 41, no. 1, pp. 81-91, August 2003). Hence, the problems caused by sensorineural hearing loss can only be solved by either restoring the complete hearing functionality, i.e. completely modeling and compensating the sensorineural hearing loss using advanced non-linear auditory models (see e.g. Bondy, Becker, Bruce, Trainor & Haykin,“A novel signal-processing strategy for hearing-aid design: neurocompensation”, Signal Processing, vol. 84, no. 7, pp. 1239-1253, July 2004; US2005/069162, “Binaural adaptive hearing aid”), and/or by using signal processing algorithms that selectively enhance the useful signal and suppress the undesired background noise sources.
Many hearing instruments currently have more than one microphone, enabling the use of multi-microphone speech enhancement algorithms. In comparison with single-microphone algorithms, which can only use spectral and temporal information, multi-microphone algorithms can additionally exploit the spatial information of the speech and the noise sources. This generally results in a higher performance, especially when the speech and the noise sources are spatially separated. The typical microphone array in a (monaural) multi-microphone hearing instrument consists of closely spaced microphones in an endfire configuration. Considerable noise reduction can be achieved with such arrays, at the expense however of increased sensitivity to errors in the assumed signal model, such as microphone mismatch, look direction error and reverberation.
Many hearing impaired persons have a hearing loss in both ears, such that they need to be fitted with a hearing instrument at each ear (i.e. a so-called bilateral or binaural system). In many bilateral systems, a monaural system is merely duplicated and no cooperation between the two hearing instruments takes place. This independent processing and the lack of synchronization between the two monaural systems typically destroys the binaural auditory cues. When these binaural cues are not preserved, the localization and noise reduction capabilities of a hearing impaired person are reduced.