Many people's ability to understand speech is dramatically reduced in noisy environments such as restaurants and meeting rooms. This is especially true of the over 50s, and it is one of the first signs of age-related hearing loss, which severely curtails people's ability to interact in normal social situations. This can lead to a sense of isolation that has a profound influence on general lifestyle, and many studies suggest that it may contribute to dementia.
With this type of hearing loss, listeners do not necessarily suffer any degradation to their threshold of hearing; they could understand the speech perfectly well in the absence of the interfering noise. Consequently, many sufferers may be unaware that they have a hearing problem and conventional hearing aids are not very effective. Also, sufferers become more dependent on lip reading, so any technical solution should preferably have a low latency to keep lip sync.
Techniques are known for blind source separation, using independent component analysis (ICA) in combination with a microphone array. In broad terms such techniques effectively act as a beamformer, steering nulls towards unwanted sources. In more detail there in an assumption that the signal sources are statistically independent, and signal outputs are generated which are as independent as possible. The input signals are split into frequency bands and each frequency band is treated independently, and then the results at different frequencies are aligned. This may be done by assuming that when a source is producing power at one frequency it is probably also active at other frequencies. However this approach can suffer from problems if power at one frequency may be correlated with the absence of power at another frequency, for example the voiced and unvoiced parts of speech. This can lead to frequency bands from different sources being swapped in the output channels.
It is also known to employ non-negative matrix factorisation (NMF) to distinguish between speakers. In broad terms this works by learning a dictionary of spectra for each speaker. However whilst this can distinguish between characteristically different voices, such as male and female speakers, it has difficulty with finer distinctions. In addition this approach can introduce substantial latency into the processed signal, making it unsuitable for real time use and thus unsuitable, for example, to assist in listening to a conversation.
Accordingly there is a need for improved techniques for blind source separation. There is a further need for better techniques for assisting listeners with hearing loss.