A normal ear transmits sounds as shown in FIG. 1 through the outer ear 101 to the tympanic membrane 102, which moves the bones of the middle ear 103 (malleus, incus, and stapes) that vibrate the oval window and round window openings of the cochlea 104. The cochlea 104 is a long narrow duct wound spirally about its axis for approximately two and a half turns. It includes an upper channel known as the scala vestibuli and a lower channel known as the scala tympani, which are connected by the cochlear duct. The cochlea 104 forms an upright spiraling cone with a center called the modiolus where the spiral ganglion cells of the acoustic nerve 113 reside. In response to received sounds transmitted by the middle ear 103, the fluid-filled cochlea 104 functions as a transducer to generate electric pulses which are transmitted to the cochlear nerve 113, and ultimately to the brain.
Hearing is impaired when there are problems in the ability to transduce external sounds into meaningful action potentials along the neural substrate of the cochlea 104. To improve impaired hearing, hearing prostheses have been developed. For example, when the impairment is related to operation of the middle ear 103, a conventional hearing aid may be used to provide mechanical stimulation to the auditory system in the form of amplified sound. Or when the impairment is associated with the cochlea 104, a cochlear implant with an implanted stimulation electrode can electrically stimulate auditory nerve tissue with small currents delivered by multiple electrode contacts distributed along the electrode.
FIG. 1 also shows some components of a typical cochlear implant system, including an external microphone that provides an audio signal input to an external signal processor 111 where various signal processing schemes can be implemented. The processed signal is then converted into a digital data format, such as a sequence of data frames, for transmission into the implant 108. Besides receiving the processed audio information, the implant 108 also performs additional signal processing such as error correction, pulse formation, etc., and produces a stimulation pattern (based on the extracted audio information) that is sent through an electrode lead 109 to an implanted electrode array 110.
Typically, the electrode array 110 includes multiple electrode contacts 112 on its surface that provide selective stimulation of the cochlea 104. Depending on context, the electrode contacts 112 are also referred to as electrode channels. In cochlear implants today, a relatively small number of electrode channels are each associated with relatively broad frequency bands, with each electrode contact 112 addressing a group of neurons with an electric stimulation pulse having a charge that is derived from the instantaneous amplitude of the signal envelope within that frequency band.
It is well-known in the field that electric stimulation at different locations within the cochlea produces different frequency percepts. The underlying mechanism in normal acoustic hearing is referred to as the tonotopic principle. In cochlear implant users, the tonotopic organization of the cochlea has been extensively investigated; for example, see Vermeire et al., Neural tonotopy in cochlear implants: An evaluation in unilateral cochlear implant patients with unilateral deafness and tinnitus, Hear Res, 245(1-2), 2008 Sep. 12 p. 98-106; and Schatzer et al., Electric-acoustic pitch comparisons in single-sided-deaf cochlear implant users: Frequency-place functions and rate pitch, Hear Res, 309, 2014 March, p. 26-35 (both of which are incorporated herein by reference in their entireties).
In some stimulation signal coding strategies, stimulation pulses are applied at a constant rate across all electrode channels, whereas in other coding strategies, stimulation pulses are applied at a channel-specific rate. Various specific signal processing schemes can be implemented to produce the electrical stimulation signals. Signal processing approaches that are well-known in the field of cochlear implants include continuous interleaved sampling (CIS), channel specific sampling sequences (CSSS) (as described in U.S. Pat. No. 6,348,070, incorporated herein by reference), spectral peak (SPEAK), and compressed analog (CA) processing.
In the CIS strategy, the signal processor only uses the band pass signal envelopes for further processing, i.e., they contain the entire stimulation information. For each electrode channel, the signal envelope is represented as a sequence of biphasic pulses at a constant repetition rate. A characteristic feature of CIS is that the stimulation rate is equal for all electrode channels and there is no relation to the center frequencies of the individual channels. It is intended that the pulse repetition rate is not a temporal cue for the patient (i.e., it should be sufficiently high so that the patient does not perceive tones with a frequency equal to the pulse repetition rate). The pulse repetition rate is usually chosen at greater than twice the bandwidth of the envelope signals (based on the Nyquist theorem).
In a CIS system, the stimulation pulses are applied in a strictly non-overlapping sequence. Thus, as a typical CIS-feature, only one electrode channel is active at a time and the overall stimulation rate is comparatively high. For example, assuming an overall stimulation rate of 18 kpps and a 12 channel filter bank, the stimulation rate per channel is 1.5 kpps. Such a stimulation rate per channel usually is sufficient for adequate temporal representation of the envelope signal. The maximum overall stimulation rate is limited by the minimum phase duration per pulse. The phase duration cannot be arbitrarily short because, the shorter the pulses, the higher the current amplitudes have to be to elicit action potentials in neurons, and current amplitudes are limited for various practical reasons. For an overall stimulation rate of 18 kpps, the phase duration is 27 μs, which is near the lower limit.
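The arithmetic in the preceding example can be reproduced with a short sketch. It assumes biphasic pulses delivered back to back with no inter-phase or inter-pulse gap, which the text does not state explicitly; the function names are illustrative only.

```python
def per_channel_rate_pps(overall_rate_pps, n_channels):
    """Stimulation rate per channel when pulses are strictly interleaved."""
    return overall_rate_pps / n_channels


def max_phase_duration_us(overall_rate_pps, phases_per_pulse=2):
    """Upper bound on the phase duration in microseconds.

    Assumption: each pulse fills its whole time slot (no gaps), and a
    biphasic pulse consists of two phases of equal duration.
    """
    slot_us = 1e6 / overall_rate_pps  # time slot available per pulse
    return slot_us / phases_per_pulse


print(per_channel_rate_pps(18000, 12))         # 1500.0
print(round(max_phase_duration_us(18000), 1))  # 27.8
```

Under these assumptions, each pulse has a slot of about 55.6 μs, so each phase can last at most about 27.8 μs, which is consistent with the 27 μs quoted above.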
The Fine Structure Processing (FSP) strategy by Med-El uses CIS in higher frequency channels, and uses fine structure information present in the band pass signals in the lower frequency, more apical electrode channels. In the FSP electrode channels, the zero crossings of the band pass filtered time signals are tracked, and at each negative to positive zero crossing, a Channel Specific Sampling Sequence (CSSS) is started. Typically CSSS sequences are applied on up to 3 of the most apical electrode channels, covering the frequency range up to 200 or 330 Hz. The FSP arrangement is described further in Hochmair I, Nopp P, Jolly C, Schmidt M, Schößer H, Garnham C, Anderson I, MED-EL Cochlear Implants: State of the Art and a Glimpse into the Future, Trends in Amplification, vol. 10, 201-219, 2006, which is incorporated herein by reference. The FS4 coding strategy differs from FSP in that up to 4 apical channels can have their fine structure information used. In FS4-p, stimulation pulse sequences can be delivered in parallel on any 2 of the 4 FSP electrode channels. With the FSP and FS4 coding strategies, the fine structure information is the instantaneous frequency information of a given electrode channel, which may provide users with an improved hearing sensation, better speech understanding and enhanced perceptual audio quality. See, e.g., U.S. Pat. No. 7,561,709; Lorens et al. “Fine structure processing improves speech perception as well as objective and subjective benefits in pediatric MED-EL COMBI 40+ users.” International journal of pediatric otorhinolaryngology 74.12 (2010): 1372-1378; and Vermeire et al., “Better speech recognition in noise with the fine structure processing coding strategy.” ORL 72.6 (2010): 305-311; all of which are incorporated herein by reference in their entireties.
Many cochlear implant coding strategies use what is referred to as an n-of-m approach, where only the n electrode channels with the greatest amplitude (out of m available channels) are stimulated in a given sampling time frame. If, for a given time frame, the amplitude of a specific electrode channel remains higher than the amplitudes of other channels, then that channel will be selected for the whole time frame. Consequently, the number of electrode channels that are available for coding information is reduced by one, which results in a clustering of stimulation pulses. Thus, fewer electrode channels are available for coding important temporal and spectral properties of the sound signal such as speech onset.
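The n-of-m selection and the resulting clustering can be illustrated with a minimal sketch. The envelope values and the function name select_n_of_m are hypothetical and chosen only for illustration; actual strategies may apply further selection rules.

```python
def select_n_of_m(envelopes, n):
    """Return the sorted indices of the n channels with the largest amplitude."""
    ranked = sorted(range(len(envelopes)),
                    key=lambda ch: envelopes[ch], reverse=True)
    return sorted(ranked[:n])


# Hypothetical envelope amplitudes for m = 4 channels over three time frames.
frames = [
    [0.9, 0.2, 0.4, 0.1],
    [0.8, 0.5, 0.3, 0.1],
    [0.9, 0.1, 0.2, 0.6],
]
for frame in frames:
    print(select_n_of_m(frame, n=2))
# [0, 2]
# [0, 1]
# [0, 3]
```

In this example channel 0 dominates every frame and is therefore always selected, so only one of the two stimulation slots remains available for the other three channels.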
In addition to the specific processing and coding approaches discussed above, different specific pulse stimulation modes are possible to deliver the stimulation pulses with specific electrodes, i.e., mono-polar, bi-polar, tri-polar, multi-polar, and phased-array stimulation. And there also are different stimulation pulse shapes, i.e., biphasic, symmetric triphasic, asymmetric triphasic pulses, or asymmetric pulse shapes. These various pulse stimulation modes and pulse shapes each provide different benefits; for example, higher tonotopic selectivity, smaller electrical thresholds, higher electric dynamic range, fewer unwanted side-effects such as facial nerve stimulation, etc.
Fine structure coding strategies such as FSP and FS4 use the zero-crossings of the band-pass signals to start channel-specific sampling sequence (CSSS) pulse sequences for delivery to the corresponding electrode contact. Zero-crossings reflect the dominant instantaneous frequency quite robustly in the absence of other spectral components. But in the presence of higher harmonics and noise, problems can arise. See, e.g., WO 2010/085477 and Gerhard, David, Pitch extraction and fundamental frequency: History and current techniques. Regina: Department of Computer Science, University of Regina, 2003; both incorporated herein by reference in their entireties.
FIG. 2 shows a sample spectrogram for a sample of clean speech including estimated instantaneous frequencies for Channels 1 and 3 as reflected by evaluating the signal zero-crossings. The horizontal black dashed lines show the channel frequency boundaries at 100, 198, 325, 491 and 710 Hz, so that Channels 1, 2, 3 and 4 span 100-198 Hz, 198-325 Hz, 325-491 Hz and 491-710 Hz, respectively. It can be seen in FIG. 2 that during periods of a single dominant harmonic in a given frequency channel, the estimate of the instantaneous frequency is smooth and robust; for example, in Channel 1 from 1.6 to 1.9 seconds, or in Channel 3 from 3.4 to 3.5 seconds. When additional frequency harmonics are present in a given channel, or when the channel signal intensity is low, the instantaneous frequency estimation becomes inaccurate, and, in particular, the estimated instantaneous frequency may even leave the frequency range of the channel.
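The behavior described for FIG. 2 can be reproduced with a simple sketch of a zero-crossing frequency estimator. The test signals are synthetic and are assumptions made for illustration: a single 440 Hz component, and the sum of equal-amplitude 330 Hz and 440 Hz components (two harmonics of a hypothetical 110 Hz fundamental that both fall into the 325-491 Hz range of Channel 3). The sampling rate and function names are likewise illustrative.

```python
import math


def upward_zero_crossings(x, fs):
    """Times (s) of negative-to-positive crossings, linearly interpolated."""
    times = []
    for i in range(1, len(x)):
        if x[i - 1] < 0.0 <= x[i]:
            frac = -x[i - 1] / (x[i] - x[i - 1])  # position between samples
            times.append((i - 1 + frac) / fs)
    return times


def instantaneous_frequencies(x, fs):
    """One frequency estimate (Hz) per interval between upward crossings."""
    t = upward_zero_crossings(x, fs)
    return [1.0 / (b - a) for a, b in zip(t, t[1:])]


FS = 16000  # assumed sampling rate in Hz
n = range(int(0.1 * FS))
single = [math.sin(2 * math.pi * 440 * k / FS) for k in n]
double = [math.sin(2 * math.pi * 330 * k / FS)
          + math.sin(2 * math.pi * 440 * k / FS) for k in n]

est_single = instantaneous_frequencies(single, FS)
est_double = instantaneous_frequencies(double, FS)
print(min(est_single), max(est_single))  # both close to 440 Hz
print(min(est_double), max(est_double))  # maximum lies above 491 Hz
```

With a single component the estimates stay close to 440 Hz. With two components in the same channel the intervals between zero-crossings become irregular, and some estimates exceed the 491 Hz upper boundary of the channel.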
Gerhard 2003 cited above gives an overview of algorithms that can be used to estimate the fundamental frequency. These algorithms include time-domain methods, frequency-domain methods and statistical frequency-domain methods. Most of them are computationally too expensive to be usable in real life and/or cannot guarantee robustness. Vandali et al. “Pitch ranking ability of cochlear implant recipients: A comparison of sound-processing strategies.” The Journal of the Acoustical Society of America 117.5 (2005): 3126-3138 (incorporated herein by reference in its entirety) uses positive peaks instead of the zero-crossings to preserve the fine structure information. But peak detection has the same problems as the zero-crossings technique when more than one harmonic and/or noise occurs in a given frequency channel.
In WO 2010/085477, the filter bank resolution is enhanced to resolve the low frequency harmonics. As a result, the estimation of the instantaneous frequency is robust when using the zero-crossing approach. A signal-dependent algorithm also is used to select channels of the high-resolution bands, which are then sent to the implant.
In [I82_2013], the dominant frequency in a channel is estimated by skipping zero-crossings that occur too fast. The time differences between the remaining zero-crossings are inverted and smoothed to obtain an estimate of the dominant frequency.
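One possible reading of this approach is sketched below. The source gives no details, so the following are assumptions made only for illustration: a zero-crossing is treated as too fast when it follows the last accepted crossing by less than the period of the upper channel boundary, and the smoothing is a first-order exponential average with a hypothetical coefficient alpha.

```python
def dominant_frequency(crossing_times, max_freq_hz, alpha=0.5):
    """Estimate the dominant frequency (Hz) from zero-crossing times (s).

    Assumptions (not taken from the cited reference): crossings closer
    than 1 / max_freq_hz to the last accepted crossing are skipped, and
    the inverted intervals are smoothed with an exponential average.
    """
    min_interval = 1.0 / max_freq_hz
    kept = []
    for t in crossing_times:
        if not kept or t - kept[-1] >= min_interval:
            kept.append(t)  # accept; too-fast crossings are skipped
    raw = [1.0 / (b - a) for a, b in zip(kept, kept[1:])]
    smoothed = []
    for f in raw:
        if smoothed:
            f = alpha * f + (1.0 - alpha) * smoothed[-1]
        smoothed.append(f)
    return smoothed


# Crossings of a 150 Hz component plus one spurious crossing 1 ms after the second.
times = [0.0, 1 / 150, 1 / 150 + 0.001, 2 / 150, 3 / 150]
print(dominant_frequency(times, max_freq_hz=198))  # three values close to 150 Hz
```

In this example the spurious crossing is skipped, so the estimate stays near 150 Hz instead of jumping to a value far outside the channel.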