Reverberation severely degrades speech intelligibility for cochlear implant (CI) users. The ideal reverberant mask (IRM), a binary mask for reverberation suppression computed from the signal-to-reverberant ratio, has been found to yield substantial intelligibility gains for CI users even in highly reverberant environments (e.g., T60 = 1.0 s). Motivated by the intelligibility improvements obtained with the IRM, a monaural, blind channel-selection criterion for reverberation suppression is proposed. The strategy is blind in that it requires prior knowledge of neither the room impulse response (RIR) nor the anechoic signal. A residual signal is obtained from linear prediction (LP) analysis of the reverberant signal, and the residual-to-reverberant ratio (RRR) of each frequency channel is used as the channel-selection criterion: in each frame, channels with an RRR below an adaptive threshold are retained, and the rest are zeroed out. The proposed strategy was evaluated in intelligibility listening tests with CI users in simulated rooms with reverberation times of 0.6 and 0.8 s. The results indicate significant intelligibility improvements in both reverberant conditions (over 30 and 40 percentage points for T60 = 0.6 and 0.8 s, respectively), comparable to those obtained with the IRM strategy.
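The per-frame RRR criterion summarized above can be sketched in Python/NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. An equal-width split of an FFT power spectrum stands in for the CI filterbank. The LP order of 16 is arbitrary. RRR is taken as the per-channel energy ratio in dB. The adaptive threshold is a hypothetical choice, namely the mean RRR across channels in the frame, because the text does not say how the threshold adapts. The helper names (`lpc_coefficients`, `lp_residual`, `channel_energies`, `rrr_channel_mask`) are illustrative. Only the direction of the comparison, keeping channels whose RRR falls below the threshold, follows the text.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method LP coefficients a[0..order-1], where x[n] is
    predicted as sum_k a[k-1] * x[n-k] for k = 1..order."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]  # lags 0..order
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * max(r[0], 1e-12) * np.eye(order)  # small ridge for numerical stability
    return np.linalg.solve(R, r[1 : order + 1])


def lp_residual(frame, a):
    """Prediction error e[n] = x[n] - sum_k a[k-1] * x[n-k] (zero initial state)."""
    prediction = np.zeros_like(frame)
    for k in range(1, len(a) + 1):
        prediction[k:] += a[k - 1] * frame[:-k]
    return frame - prediction


def channel_energies(frame, n_channels):
    """ASSUMPTION: equal-width FFT bands stand in for a CI filterbank."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    return np.array([band.sum() for band in np.array_split(spectrum, n_channels)])


def rrr_channel_mask(frame, order=16, n_channels=16):
    """Return (mask, rrr_db): mask[c] = 1 keeps channel c, 0 zeroes it out."""
    residual = lp_residual(frame, lpc_coefficients(frame, order))
    eps = 1e-12
    rrr_db = 10 * np.log10(
        (channel_energies(residual, n_channels) + eps)
        / (channel_energies(frame, n_channels) + eps)
    )
    threshold = rrr_db.mean()  # HYPOTHETICAL adaptive threshold (per-frame mean)
    return (rrr_db < threshold).astype(int), rrr_db


# Usage on a synthetic frame (random data, illustration only)
rng = np.random.default_rng(0)
mask, rrr_db = rrr_channel_mask(rng.standard_normal(256))
```

A per-frame mean was chosen only because it adapts to the overall level of each frame without extra parameters. Any other adaptive rule can replace that single line.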
Several speech dereverberation algorithms have been proposed to improve the quality or intelligibility of reverberant speech (e.g., see Huang et al., 2007; Naylor and Gaubitch, 2010). However, little is known about how effectively such algorithms improve speech intelligibility for CI users. In addition, many existing dereverberation algorithms are computationally expensive, which makes their integration into CI processors a formidable task.
Regardless of the speech coding strategy used in CI devices, most CI users achieve open-set speech recognition scores of 80% or higher in quiet, anechoic conditions. However, current CI speech coding strategies perform poorly in the presence of noise or reverberation. For example, the advanced combination encoder (ACE), one of the most commonly used speech coding strategies in CI processors, selects only a subset of channels (8-12) for stimulation in each analysis window. It operates on the principle that the peaks of the short-term speech spectrum are sufficient for speech identification. Consequently, during unvoiced segments of a reverberant utterance (e.g., stops), where overlap-masking by reverberant energy from preceding phonemes dominates, ACE mistakenly selects the channels containing reverberant energy, because those channels have the highest energy.
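The maxima (n-of-m) selection that ACE relies on can be sketched in a few lines of Python/NumPy, which shows the failure mode concretely. This is a simplified illustration, not a model of any commercial processor. Envelope extraction, the filterbank, and the stimulation stages are omitted. The function name `n_of_m_select` is hypothetical. The channel energies in the example are made-up values that mimic a stop closure filled with reverberant energy. The example uses n = 2 for brevity, whereas the text cites 8-12 channels in practice. Because channels are ranked purely by energy, the selection picks exactly the channels that carry the reverberant tail.

```python
import numpy as np


def n_of_m_select(channel_energy, n):
    """Mark the n highest-energy channels in each frame (ACE-style maxima selection).

    channel_energy: array of shape (frames, channels); returns a 0/1 mask of the same shape.
    """
    mask = np.zeros(channel_energy.shape, dtype=int)
    top_n = np.argsort(channel_energy, axis=1)[:, -n:]  # indices of the n largest per row
    np.put_along_axis(mask, top_n, 1, axis=1)
    return mask


# HYPOTHETICAL energies for one frame inside a stop closure: channels 1 and 3
# hold reverberant tail energy from the preceding vowel; the rest are near silent.
frame_energy = np.array([[0.1, 3.0, 0.2, 2.5, 0.05, 0.3]])
selected = n_of_m_select(frame_energy, n=2)
# selected -> [[0, 1, 0, 1, 0, 0]]: the reverberant channels are stimulated
```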
Binary masking refers to algorithms that decompose the signal into time-frequency (T-F) units and retain the units that satisfy a given criterion (e.g., SNR > 0 dB for noise suppression) while discarding the rest. This is done by applying a binary mask to the decomposed signal: the mask value for a T-F unit is set to 1 if the unit satisfies the criterion and to 0 otherwise. Binary masks have been widely used in speech enhancement and sound separation applications, yielding gains in the intelligibility and quality of processed noisy speech. Using binary masks for dereverberation is attractive because it does not rely on inverting the RIR. Given the limited evidence for existing dereverberation algorithms in CIs and their computational cost, there remains a need for a method that can improve the intelligibility of reverberant speech for CI users.
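The masking operation described above reduces to a per-unit comparison followed by element-wise multiplication. The Python/NumPy sketch below assumes that the target and interference powers of each T-F unit are known, which is the "ideal" (oracle) setting. The function names are hypothetical, and the 0 dB default mirrors the SNR example in the text. Substituting a signal-to-reverberant ratio for the SNR gives a mask with the same structure as the IRM. The exact definitions of the signal and reverberant components are not given in this section.

```python
import numpy as np


def binary_mask(target_power, interference_power, threshold_db=0.0):
    """Return 1 for T-F units whose local target-to-interference ratio exceeds
    threshold_db and 0 otherwise (oracle setting: both powers are assumed known)."""
    eps = 1e-12  # avoids log of zero / division by zero in silent units
    local_ratio_db = 10 * np.log10((target_power + eps) / (interference_power + eps))
    return (local_ratio_db > threshold_db).astype(int)


def apply_mask(mixture_tf, mask):
    """Retain masked-in T-F units and zero out the rest (also works for complex STFTs)."""
    return mixture_tf * mask


# Toy 2 x 2 grid of T-F unit powers (illustrative values, not measured data)
target = np.array([[4.0, 1.0], [0.5, 2.0]])
noise = np.ones((2, 2))
mask = binary_mask(target, noise)  # SNR > 0 dB criterion
enhanced = apply_mask(target + noise, mask)
# mask -> [[1, 0], [0, 1]], enhanced -> [[5.0, 0.0], [0.0, 3.0]]
```

A unit at exactly 0 dB is discarded because the criterion is a strict inequality.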