More than 100,000 patients worldwide with profound hearing loss have received cochlear implants as a clinical treatment to regain partial hearing. In current cochlear implants, most speech coding strategies extract and deliver a small number of temporal envelope cues via pulsatile electrical stimulation.
Unfortunately, cochlear implants are limited in that the patient can perceive only the relatively low frequency signals induced by the cochlear implant. Natural speech and music include both relatively high frequency and relatively low frequency components, and existing cochlear implant signal processing techniques do not extract useful information from the relatively high frequency portions of acoustical inputs. As a result, cochlear implants perform relatively poorly in noisy environments and with regard to the perception of music.
In the widely used continuous interleaved sampling (CIS) coding scheme, sounds are split into a few sub-bands, and the slowly varying envelopes are extracted with a half-wave or full-wave rectifier followed by a low-pass filter in each sub-band. This technique provides a signal that can be used to successfully control cochlear implants to enable users to perceive speech in relatively quiet environments. However, the CIS encoding scheme does not extract much useful information from the relatively high frequency portions of acoustical inputs. Other prior art signal processing techniques for cochlear implants have calculated envelopes from the magnitude of the Fast Fourier Transform (FFT) or the Hilbert transform. Again, such techniques do not extract much useful information from the relatively high frequency portions of acoustical inputs.
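By way of illustration only, the envelope extraction stage described above (sub-band splitting, rectification, and low-pass filtering) may be sketched as follows. The brick-wall FFT band-pass, the first-order low-pass filter, the band edges, the 400 Hz envelope cutoff, and all function names are assumptions chosen for brevity; they are not taken from any particular CIS implementation.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Ideal band-pass by FFT masking (assumed stand-in for a real filter bank)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def one_pole_lowpass(x, fs, cutoff):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff / fs)
    y = np.empty_like(x)
    acc = 0.0
    for n, value in enumerate(x):
        acc += alpha * (value - acc)
        y[n] = acc
    return y

def cis_envelopes(x, fs, band_edges, env_cutoff=400.0):
    """Return one slowly varying envelope per sub-band."""
    envelopes = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sub_band = bandpass_fft(x, fs, lo, hi)
        rectified = np.abs(sub_band)  # full-wave rectification
        envelopes.append(one_pole_lowpass(rectified, fs, env_cutoff))
    return np.array(envelopes)

# Hypothetical test input: a 1 kHz tone with 4 Hz amplitude modulation.
fs = 16000
t = np.arange(fs) / fs
x = (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
edges = [200.0, 500.0, 1500.0, 4000.0]  # assumed band edges in Hz
envs = cis_envelopes(x, fs, edges)
```

In this sketch only the envelope of each sub-band survives; the 1 kHz oscillation inside the middle band is smoothed away by the low-pass filter, which is the loss of high frequency information discussed above.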
This issue can be better understood by examining the following sum-of-product model for any given sound signal x(t), as shown in Eq. (1):
$$x(t) = \sum_{k=1}^{N} x_k(t) = \sum_{k=1}^{N} a_k(t) \cdot c_k(t) \tag{1}$$
where k is a sub-band index, xk(t) is the output for each of N sub-bands, ak(t) is a slowly varying envelope, and ck(t) is a higher-frequency carrier. Some type of detection rule is used to determine the product decomposition of each sub-band output (xk(t)=ak(t)·ck(t)) into slowly varying amplitude and higher frequency carrier signals, respectively.
The envelope signal ak(t) can be derived from the amplitude of the Fourier transform or by incoherent demodulation, e.g., half-wave rectification, full-wave rectification, or the Hilbert transform. In current cochlear implants, only the positive envelope signal ak(t) is coded in each sub-band, resulting in a significant loss of the information contained in the carrier signal, or temporal fine structure, ck(t).
For example, a detection rule used in existing cochlear implants decomposes each sub-band signal xk(t) into a Hilbert envelope and an associated carrier. This approach begins with the determination of the analytic signal, as shown in Eq. (2):
$$\tilde{x}_k(t) = x_k(t) + j\,H\{x_k(t)\} \tag{2}$$
where H{xk(t)} is the Hilbert transform of xk(t). The amplitude portion of the signal is the non-negative, real magnitude of the analytic signal, as shown in Eq. (3):
$$a_k(t) = \left|\tilde{x}_k(t)\right| \tag{3}$$
The result from Eq. (3) is commonly referred to as the “Hilbert envelope.” The carrier portion of the signal is the remaining uni-modular phase of the analytic signal, as shown in Eq. (4):
$$c_k(t) = \cos\left\{\tan^{-1}\frac{\operatorname{Im}\,\tilde{x}_k(t)}{\operatorname{Re}\,\tilde{x}_k(t)}\right\} = \cos\varphi_k(t). \tag{4}$$
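By way of illustration only, the decomposition of Eqs. (2) through (4) may be sketched numerically as follows. The analytic signal is formed here by zeroing the negative-frequency half of the discrete Fourier transform, which is one common construction; the function names and the test signal are assumptions. The sketch also assumes that the inverse tangent in Eq. (4) is evaluated as a four-quadrant phase (`arctan2`), since a two-quadrant arctangent would discard the sign of the carrier.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of Eq. (2), built by suppressing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist once
        weights[1:n // 2] = 2.0           # double positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def hilbert_decompose(xk):
    """Split a sub-band signal into Hilbert envelope a_k and carrier c_k."""
    z = analytic_signal(xk)
    a = np.abs(z)                         # Eq. (3): non-negative envelope
    phi = np.arctan2(z.imag, z.real)      # four-quadrant phase (assumption)
    c = np.cos(phi)                       # Eq. (4): unit-magnitude carrier
    return a, c

# Hypothetical sub-band signal: a 1 kHz carrier with a 4 Hz envelope.
fs = 16000
t = np.arange(fs) / fs
xk = (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
a, c = hilbert_decompose(xk)

# The product of envelope and carrier recovers the sub-band signal.
reconstruction_error = np.max(np.abs(a * c - xk))
```

For this test signal the envelope `a` follows the 4 Hz modulation while `c` retains the 1 kHz oscillation, so discarding `c` removes exactly the temporal fine structure.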
Thus, in current cochlear implants, only the non-negative, real envelope ak(t) is delivered to the selected stimulating electrode at a fixed stimulation rate. The conventional envelope extraction process eliminates the temporal fine structure cues (cos φk(t)) in each sub-band, yielding a coarse spectral and temporal representation of speech and music sounds. Psychoacoustic experiments have shown that, with a limited number of envelopes, most patients are still able to understand speech relatively well, and they can even converse over the phone. However, for the majority of cochlear implant users, the lack of temporal fine structure has led to poor speech recognition in noisy environments, near-chance melody recognition, poor Mandarin tone recognition and production, and an inability to use interaural time difference (ITD) cues to localize sounds.
The encoding of temporal fine structure in cochlear implants is ultimately restricted by the limits of temporal pitch perception under electrical stimulation. Studies have shown that cochlear implant patients can perceive stimulation rate variations only up to about 1000 Hz. However, the frequency content of the temporal fine structure (cos φk(t)) in speech and music can extend up to 10,000 Hz in the higher spectral sub-bands, and the temporal fine structure is not a band-limited signal.
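By way of illustration only, the mismatch between the carrier frequency and the approximately 1000 Hz perceptual limit noted above may be sketched by estimating the instantaneous frequency of the phase φk(t). The two pure-tone test signals, the sampling rate, and the function names are assumptions; real sub-band carriers are not pure tones.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal built by suppressing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def instantaneous_frequency(xk, fs):
    """Derivative of the unwrapped carrier phase, in Hz."""
    phase = np.unwrap(np.angle(analytic_signal(xk)))
    return np.diff(phase) * fs / (2.0 * np.pi)

RATE_LIMIT_HZ = 1000.0  # approximate perceptual limit stated in the text

fs = 32000
t = np.arange(fs) / fs
low_band = np.sin(2 * np.pi * 300 * t)    # hypothetical low sub-band carrier
high_band = np.sin(2 * np.pi * 5000 * t)  # hypothetical high sub-band carrier

f_low = np.median(instantaneous_frequency(low_band, fs))
f_high = np.median(instantaneous_frequency(high_band, fs))

low_is_perceivable = f_low <= RATE_LIMIT_HZ
high_is_perceivable = f_high <= RATE_LIMIT_HZ
```

Under these assumptions the low sub-band carrier falls within the limit while the high sub-band carrier does not, so its fine structure cannot be conveyed simply by matching the stimulation rate to the carrier.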
It would therefore be desirable to provide an acoustical signal processing technique that extracts useful information from the frequency content of the temporal fine structure, to provide enhanced implant performance to users of cochlear implants.