A normal human ear transmits sounds as shown in FIG. 1 through the outer ear 101 to the tympanic membrane 102, which moves the bones of the middle ear 103 that vibrate the oval window and round window openings of the cochlea 104. The cochlea 104 is a long narrow duct wound spirally about its axis for approximately two and a half turns. It includes an upper channel known as the scala vestibuli and a lower channel known as the scala tympani, which are connected by the cochlear duct. The cochlea 104 forms an upright spiraling cone with a center called the modiolus where the spiral ganglion cells of the acoustic nerve 113 reside. In response to received sounds transmitted by the middle ear 103, the fluid-filled cochlea 104 functions as a transducer to generate electric pulses which are transmitted to the cochlear nerve 113, and ultimately to the brain.
Hearing is impaired when there are problems in the ability to transduce external sounds into meaningful action potentials along the neural substrate of the cochlea 104. To improve impaired hearing, hearing prostheses have been developed. For example, when the impairment is related to operation of the middle ear 103, a conventional hearing aid may be used to provide acoustic-mechanical stimulation to the auditory system in the form of amplified sound. Or when the impairment is associated with the cochlea 104, a cochlear implant with an implanted electrode can electrically stimulate auditory nerve tissue with small currents delivered by multiple electrode contacts distributed along the electrode. Although the following discussion is specific to cochlear implants, some hearing impaired persons are better served when the stimulation electrode is implanted in other anatomical structures. Thus hearing implant systems include brainstem implants, middle brain implants, etc. each stimulating a specific auditory target in the auditory system.
FIG. 1 also shows some components of a typical cochlear implant system where an external microphone provides an audio signal input to an external implant processor 111 in which various signal processing schemes can be implemented. For example, it is well-known in the field that electrical stimulation at different locations within the cochlea 104 produces different frequency percepts. The underlying mechanism in normal acoustic hearing is referred to as the tonotopic principle. In cochlear implant users, the tonotopic organization of the cochlea has been extensively investigated; for example, see Vermeire et al., Neural tonotopy in cochlear implants: An evaluation in unilateral cochlear implant patients with unilateral deafness and tinnitus, Hear Res, 245(1-2), 2008 Sep. 12 p. 98-106; and Schatzer et al., Electric-acoustic pitch comparisons in single-sided-deaf cochlear implant users: Frequency-place functions and rate pitch, Hear Res, 309, 2014 March, p. 26-35 (both of which are incorporated herein by reference in their entireties). Examples of current signal processing approaches in the field of cochlear implants include continuous interleaved sampling (CIS) digital signal processing, channel specific sampling sequences (CSSS) digital signal processing (as described in U.S. Pat. No. 6,348,070, incorporated herein by reference), advanced combinational encoder (ACE) processing, spectral peak (SPEAK) digital signal processing, fine structure processing (FSP) and compressed analog (CA) signal processing.
Accordingly, the processed audio signal in the external implant processor 111 is converted into a digital data format for transmission by external transmitter coil 107 into an implant stimulator 108. Besides receiving the processed audio information, the implant stimulator 108 also performs additional signal processing such as error correction, pulse formation, etc., and produces stimulation signals (based on the extracted audio information) that are sent through an electrode lead 109 to an implanted electrode array 110. Typically, this electrode array 110 includes multiple electrode contacts 112 on its surface that provide selective stimulation of the cochlea 104.
In existing cochlear implant systems, the electrode contacts 112 are stimulated in a repeating time sequence of stimulation frames. If each stimulation frame uses all the electrode contacts 112, then the stimulation rate needs to be relatively low to accommodate the pulse lengths required to achieve a patient-specific sufficient loudness perception. Another drawback of stimulating all the electrode contacts 112 in a given stimulation frame is the interference between different channels due to overlapping electrical fields, residual charges at the neuron membranes, and higher order processes. There are several different approaches to reducing these negative effects which use a reduced subset of the electrode contacts 112. Channel selection then is performed frame-wise based on instantaneous signal properties such as band pass signal amplitude.
For normal hearing subjects, both the envelope and the fine time structure are important for localization and speech understanding in noise and reverberant conditions (Zeng, Fan-Gang, et al. “Auditory perception with slowly-varying amplitude and frequency modulations.” Auditory Signal Processing. Springer New York, 2005. 282-290; Drennan, Ward R., et al. “Effects of temporal fine structure on the lateralization of speech and on speech understanding in noise.” Journal of the Association for Research in Otolaryngology 8.3 (2007): 373-383; and Hopkins, Kathryn, and Brian Moore. “The contribution of temporal fine structure information to the intelligibility of speech in noise.” The Journal of the Acoustical Society of America 123.5 (2008): 3710-3710; all of which are hereby incorporated herein by reference in their entireties).
Older speech coding strategies mainly encode the slowly varying signal envelope information and do not transmit the fine time structure of a signal. One widespread scheme uses what is referred to as an n-of-m approach where only some number n electrode channels with the greatest amplitude are stimulated in a given stimulation frame. This approach is used, for instance, in the ACE and SPEAK strategies by Cochlear Corporation. If, for a given time frame, the amplitude of a specific electrode channel remains higher than the amplitudes of other channels, then that channel will be selected for the whole time frame. Subsequently, the number of electrode channels that are available for coding information is reduced by one, which results in a clustering of stimulation pulses.
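For illustration only, the frame-wise n-of-m channel selection described above can be sketched as follows. This is a minimal sketch and not the ACE or SPEAK implementation; the function name select_n_of_m and the example amplitude values are hypothetical.

```python
def select_n_of_m(amplitudes, n):
    """Return the indices of the n largest-amplitude channels for one frame.

    amplitudes: per-channel band pass amplitudes for the frame (m values).
    The indices are returned in ascending channel order.
    """
    # Rank channel indices by amplitude, largest first (hypothetical tie rule:
    # on equal amplitudes the lower channel index wins, since sorted() is stable).
    ranked = sorted(range(len(amplitudes)), key=lambda ch: amplitudes[ch], reverse=True)
    return sorted(ranked[:n])


# Hypothetical amplitudes for one stimulation frame of an 8-channel system.
frame = [0.12, 0.80, 0.35, 0.05, 0.61, 0.27, 0.44, 0.09]
selected = select_n_of_m(frame, 3)
print(selected)
```

If one channel keeps the greatest amplitude over successive frames, it is selected in each of them, which is the clustering effect noted above.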
In the CIS signal processing strategy, the signal processor only uses the band pass signal envelopes for further processing, i.e., they contain the entire stimulation information. For each electrode channel, the signal envelope is represented as a sequence of biphasic pulses at a constant repetition rate. A characteristic feature of CIS is that the stimulation rate is equal for all electrode channels and there is no relation to the center frequencies of the individual channels. It is intended that the pulse repetition rate is not a temporal cue for the patient (i.e., it should be sufficiently high so that the patient does not perceive tones with a frequency equal to the pulse repetition rate). The pulse repetition rate is usually chosen at greater than twice the bandwidth of the envelope signals (based on the Nyquist theorem). The stimulation pulses are applied in a strictly non-overlapping sequence. Thus, as a typical CIS-feature, only one electrode channel is active at a time and the overall stimulation rate is comparatively high. For example, assuming an overall stimulation rate of 18 kpps and a 12 channel filter bank, the stimulation rate per channel is 1.5 kpps. Such a stimulation rate per channel usually is sufficient for adequate temporal representation of the envelope signal. The maximum overall stimulation rate is limited by the minimum phase duration per pulse. The phase duration cannot be arbitrarily short because, the shorter the pulses, the higher the current amplitudes have to be to elicit action potentials in neurons, and current amplitudes are limited for various practical reasons. For an overall stimulation rate of 18 kpps, the phase duration is 27 μs, which is near the lower limit.
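The rate figures in the CIS example above can be checked with a short calculation. The sketch below assumes that each biphasic pulse occupies its whole time slot with two equal phases and no interphase or interpulse gap; that is a simplifying assumption, and the function names are hypothetical.

```python
def per_channel_rate_pps(overall_rate_pps, num_channels):
    """Per-channel pulse rate when channels are stimulated strictly in sequence."""
    return overall_rate_pps / num_channels


def max_phase_duration_us(overall_rate_pps, phases_per_pulse=2):
    """Longest phase duration (microseconds) that fits into one pulse slot.

    Assumption: no interphase gap and no gap between consecutive pulses.
    """
    slot_us = 1e6 / overall_rate_pps
    return slot_us / phases_per_pulse


rate = per_channel_rate_pps(18_000, 12)   # 18 kpps overall, 12 channels
phase = max_phase_duration_us(18_000)     # biphasic pulses
print(rate, phase)
```

Under these assumptions the per-channel rate is 1.5 kpps and the available phase duration is just under 28 μs, consistent with the 27 μs figure given above.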
The Fine Structure Processing (FSP) strategy by Med-El uses CIS in higher frequency channels, and uses fine structure information present in the band pass signals in the lower frequency, more apical electrode channels. In FSP, the fine time structure of low frequency channels is transmitted through Channel Specific Sampling Sequences (CSSS) that start at negative to positive zero crossings of the respective band pass filter output (see U.S. Pat. No. 6,594,525, which is incorporated herein by reference). The basic idea of FSP is to apply a stimulation pattern, where a particular relationship to the center frequencies of the filter channels is preserved, i.e., the center frequencies are represented in the temporal waveforms of the stimulation patterns, and are not fully removed, as is done in CIS. Each stimulation channel is associated with a particular CSSS, which is a sequence of ultra-high-rate biphasic pulses (typically 5-10 kpps). Each CSSS has a distinct length (number of pulses) and distinct amplitude distribution. The length of a CSSS may be derived, for example, from the center frequency of the associated band pass filter. A CSSS associated with a lower filter channel is longer than a CSSS associated with a higher filter channel. For example, it may be one half of the period of the center frequency. The amplitude distribution may be adjusted to patient specific requirements. Typically CSSS sequences are applied on up to 3 of the most apical electrode channels, covering the frequency range up to 200 or 330 Hz. The FSP arrangement is described further in Hochmair I, Nopp P, Jolly C, Schmidt M, Schößer H, Garnham C, Anderson I, MED-EL Cochlear Implants: State of the Art and a Glimpse into the Future, Trends in Amplification, vol. 10, 201-219, 2006, which is incorporated herein by reference.
For illustration, FIGS. 2A-2B show two examples of CSSS for a 6-channel system. In FIG. 2A, the CSSS's are derived by sampling one half of a period of a sinusoid whose frequency is equal to the center frequency of the band pass filter (center frequencies at 440 Hz, 696 Hz, 1103 Hz, 1745 Hz, 2762 Hz, and 4372 Hz). Sampling is achieved by means of biphasic pulses at a rate of 10 kpps and a phase duration of 25 μs. For Channels 5 and 6, one half of a period of the center frequencies is too short to give space for more than one stimulation pulse, i.e., the “sequences” consist of only one pulse, respectively. Other amplitude distributions may be utilized. For example, in FIG. 2B, the sequences are derived by sampling one quarter of a sinusoid whose frequency is half the center frequency of the band pass filters. These CSSS's have about the same durations as the CSSS's in FIG. 2A, respectively, but the amplitude distribution is monotonically increasing. Such monotonic distributions might be advantageous, because each pulse of the sequence can theoretically stimulate neurons at sites which cannot be reached by its predecessors.
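The derivation of the sequence lengths and amplitude distributions described above can be sketched as follows. The center frequencies and the 10 kpps rate are taken from the text; the exact sampling instants are not stated there, so the sketch assumes that the pulses are spread evenly over the sampled portion of the sinusoid. All function names are hypothetical.

```python
import math

CENTER_FREQS_HZ = [440, 696, 1103, 1745, 2762, 4372]  # from the text
PULSE_RATE_PPS = 10_000                                # from the text


def pulses_in_half_period(center_freq_hz, pulse_rate_pps=PULSE_RATE_PPS):
    """Number of pulses that fit into half a period of the center frequency."""
    half_period_s = 1.0 / (2.0 * center_freq_hz)
    # At least one pulse is always emitted, even if the half period is shorter
    # than two pulse intervals.
    return max(1, math.floor(half_period_s * pulse_rate_pps))


def csss_half_sine(center_freq_hz):
    """Half-sine amplitude distribution (in the manner of FIG. 2A).

    Assumption: pulse k samples the half period at its k-th slot midpoint.
    """
    n = pulses_in_half_period(center_freq_hz)
    return [math.sin(math.pi * (k + 0.5) / n) for k in range(n)]


def csss_quarter_sine(center_freq_hz):
    """Monotonically increasing quarter-sine distribution (in the manner of FIG. 2B).

    A quarter period at half the center frequency lasts as long as a half
    period at the center frequency, so the pulse count is the same.
    """
    n = pulses_in_half_period(center_freq_hz)
    return [math.sin(0.5 * math.pi * (k + 1) / n) for k in range(n)]


for channel, fc in enumerate(CENTER_FREQS_HZ, start=1):
    print(channel, fc, pulses_in_half_period(fc))
```

With these assumptions, Channels 5 and 6 each yield a single pulse, in agreement with the description above.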
FIG. 3 illustrates a typical signal processing implementation of the FSP coding strategy. A Preprocessor Filter Bank 301 processes an input sound signal to generate band pass signals that each represent a band pass channel defined by an associated band of audio frequencies. The output of the Preprocessor Filter Bank 301 goes to an Envelope Detector 302 and to a Stimulation Timing Module 303. The Envelope Detector 302 extracts band pass envelope signals reflecting the time varying amplitude of the band pass signals, which include unresolved harmonics and are modulated with the difference tones of the harmonics, mainly the fundamental frequency F0. The Stimulation Timing Module 303 generates stimulation timing signals reflecting the temporal fine structure features of the band pass signals. For FSP, the Stimulation Timing Module 303 detects the negative to positive zero crossings of each band pass signal and in response starts a CSSS as a stimulation timing signal. A Pulse Generator 304 uses the band pass envelope signals and the stimulation timing signals to produce the electrode stimulation signals for the electrode contacts in the implant 305.
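The zero crossing detection attributed above to the Stimulation Timing Module 303 can be illustrated with a minimal sketch. It operates on a sampled band pass signal and is not the module's actual implementation; the function name and the sample values are hypothetical.

```python
def negative_to_positive_crossings(samples):
    """Return the sample indices at which the signal crosses zero upward.

    Index i is reported when sample i-1 is negative and sample i is zero or
    positive. Each reported index marks a point where a CSSS would be started.
    """
    return [
        i
        for i in range(1, len(samples))
        if samples[i - 1] < 0.0 <= samples[i]
    ]


# Hypothetical band pass signal samples covering roughly one and a half periods.
band_pass = [-1.0, -0.5, 0.5, 1.0, 0.5, -0.5, -1.0, -0.5, 0.5, 1.0]
print(negative_to_positive_crossings(band_pass))
```

Dividing each index by the sampling rate gives the start time of the corresponding sequence.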
The FS4 coding strategy differs from FSP in that up to 4 apical channels can have their fine structure information used. In FS4-p, stimulation pulse sequences can be delivered in parallel on any 2 of the 4 FSP electrode channels. With the FSP and FS4 coding strategies, the fine structure information is the instantaneous frequency information of a given electrode channel, which may provide users with an improved hearing sensation, better speech understanding and enhanced perceptual audio quality. See, e.g., U.S. Pat. No. 7,561,709; Lorens et al. “Fine structure processing improves speech perception as well as objective and subjective benefits in pediatric MED-EL COMBI 40+ users.” International journal of pediatric otorhinolaryngology 74.12 (2010): 1372-1378; and Vermeire et al., “Better speech recognition in noise with the fine structure processing coding strategy.” ORL 72.6 (2010): 305-311; all of which are incorporated herein by reference in their entireties.
FSP and FS4 are the sole commercially available coding strategies that code the temporal fine structure information. Although they have been shown to perform significantly better than, e.g., CIS in many hearing situations, there are some other hearing situations in which no significant benefit has been found so far over CIS-like envelope-only coding strategies, in particular with regard to localization and speech understanding in noisy and reverberant conditions.
Temporal fine structure might be more affected by noise than the envelope is. It might be beneficial to use fine structure stimulation depending, for example, on the signal to noise ratio or on the dynamic reverberation ratio. In existing coding strategies, the use of the temporal fine structure is adapted in a post-surgical fitting session and is not adaptive to the signal to noise ratio.
In addition to the specific processing and coding approaches discussed above, different specific pulse stimulation modes are possible to deliver the stimulation pulses with specific electrodes, e.g., mono-polar, bi-polar, tri-polar, multi-polar, and phased-array stimulation. There also are different stimulation pulse shapes, e.g., biphasic, symmetric triphasic, asymmetric triphasic pulses, or asymmetric pulse shapes. These various pulse stimulation modes and pulse shapes each provide different benefits; for example, higher tonotopic selectivity, smaller electrical thresholds, higher electric dynamic range, fewer unwanted side-effects such as facial nerve stimulation, etc.
Binaural stimulation has long been used in hearing aids, but it has only recently become common in hearing implants such as cochlear implants (CI). For cochlear implants, binaural stimulation requires a bilateral implant system with two implanted electrode arrays, one in each ear. The incoming left and right side acoustic signals are similar to those in hearing aids and may simply be the output signals of microphones located in the vicinity of the left and right ear, respectively.
Bilateral cochlear implants provide the benefits of two-sided hearing which can allow a listener to localize sources of sound in the horizontal plane. That requires information from both ears such as interaural level differences (ILDs) and interaural time differences (ITDs). This is discussed further, for example, in Macpherson, E. A., and Middlebrooks, J. C., Listener Weighting Of Cues For Lateral Angle: The Duplex Theory Of Sound Localization Revisited, J. Acoust. Soc. Am. 111, 2219-2236, 2002, which is incorporated herein by reference. An ITD is a relative time shift between signals arriving at the left and right ear which is caused by different times for the signal to reach each ear when the source of sound is not within the median plane. An ILD is a similar difference in sound levels of signals entering the ears. Two-sided hearing also is known to make speech easier to understand in noise, and again the perception of ITD plays a pivotal role therein. This is explained more fully, for example, in Bronkhorst, A. W., and Plomp, R., The Effect Of Head-Induced Interaural Time And Level Differences On Speech Intelligibility In Noise, J. Acoust. Soc. Am. 83, 1508-1516, 1988, which is incorporated herein by reference.
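For illustration, an ITD and an ILD as defined above can be estimated from sampled left and right signals as follows. This is a minimal sketch that uses the cross-correlation peak for the ITD and the ratio of RMS levels for the ILD; these estimators, the sign conventions, and the function names are assumptions made for the example and are not taken from the cited references.

```python
import math


def estimate_itd_samples(left, right, max_lag):
    """Lag (in samples) that maximizes the left/right cross-correlation.

    Assumed convention: a positive lag means the right signal is delayed
    relative to the left signal.
    """
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        value = sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


def ild_db(left, right):
    """Level difference in dB (positive when the left signal is louder)."""
    def rms(x):
        return math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(rms(left) / rms(right))


# Hypothetical signals: the right ear receives the same pulse 3 samples later
# and at half the amplitude.
left = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 0.5, 1, 0.5, 0, 0, 0, 0]
print(estimate_itd_samples(left, right, max_lag=5), ild_db(left, right))
```

Dividing the lag by the sampling rate converts the ITD estimate to seconds.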
Complex room sound situations (e.g. echoes) impede sound localization performance in bilateral cochlear implant systems. The room acoustic signals that arrive at a listener's two ears are characterized by a change in interaural coherence (e.g., Faller et al., “Source localization in complex listening situations: Selection of binaural cues based on interaural coherence,” The Journal of the Acoustical Society of America 116.5 (2004): 3075-3089; incorporated herein by reference in its entirety). The onset of a sound emitted from a nearby sound source may have a high interaural correlation, whereas later sound components may be overlaid by echoes from different directions and may show little or no interaural correlation.
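The notion of interaural coherence used above can be illustrated with a simplified sketch that takes the maximum of the normalized cross-correlation between the left and right signals over a range of lags. This is an illustrative assumption and not necessarily the exact measure used in the cited reference; the function name and the test signals are hypothetical.

```python
import math


def interaural_coherence(left, right, max_lag):
    """Maximum normalized cross-correlation magnitude over lags -max_lag..max_lag.

    Returns a value between 0 (no correlation within the lag range) and
    1 (one signal is a scaled, delayed copy of the other).
    """
    norm = math.sqrt(sum(v * v for v in left) * sum(v * v for v in right))
    if norm == 0.0:
        return 0.0  # at least one signal is silent
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        value = sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
        best = max(best, abs(value) / norm)
    return best


# Hypothetical onset: the right signal is a pure delayed copy of the left one.
onset_left = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0]
onset_right = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 0]
print(interaural_coherence(onset_left, onset_right, max_lag=5))
```

A delayed copy, as at a sound onset from a single nearby source, gives a value near 1, whereas signals overlaid by echoes from different directions would give lower values.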
Basic psychoacoustic experiments (Monaghan et al., “Factors affecting the use of envelope interaural time differences in reverberation,” The Journal of the Acoustical Society of America 133.4 (2013): 2288-2300; incorporated herein by reference in its entirety) have shown that access to signal components with high interaural correlation may be beneficial to stream segregation in normal-hearing listeners. But existing bilateral cochlear implant systems do not implement methods to enhance sound localization performance.
U.S. Patent Publication 20080319509 describes a method to improve ITD perception which reduces periodic characteristics of the signal. Single coding strategy concepts such as the FS4 strategy are able to code ITDs if the latter are present in the corresponding band-pass signals (see U.S. Pat. Nos. 8,798,758 and 7,283,876, both of which are incorporated herein by reference in their entireties). Other stimulation concepts also have been shown to transmit ITDs, for example using peak-derived timing as described in U.S. Pat. No. 7,310,558, which is incorporated herein by reference in its entirety. Nevertheless, none of the known described implementations considers that ITDs might be smeared by the presence of echo or other disturbing secondary sound sources, and therefore they code both valid and invalid ITDs with equal weight.