A normal ear transmits sounds as shown in FIG. 1 through the outer ear 101 to the tympanic membrane (eardrum) 102, which moves the bones of the middle ear 103, which in turn vibrate the oval window and round window openings of the cochlea 104. The cochlea 104 is a long narrow duct wound spirally about its axis for approximately two and a half turns. The cochlea 104 includes an upper channel known as the scala vestibuli and a lower channel known as the scala tympani, which are connected by the cochlear duct. The scala tympani forms an upright spiraling cone with a center called the modiolus where the spiral ganglion cells of the acoustic nerve 113 reside. In response to received sounds transmitted by the middle ear 103, the fluid-filled cochlea 104 functions as a transducer to generate electric pulses that are transmitted to the cochlear nerve 113, and ultimately to the brain.
Hearing is impaired when there are problems in the ability to transduce external sounds into meaningful action potentials along the neural substrate of the cochlea 104. In some cases, hearing impairment can be addressed by an auditory prosthesis system such as a cochlear implant that electrically stimulates auditory nerve tissue with small currents delivered by multiple electrode contacts distributed along an implant electrode. FIG. 1 shows some components of a typical cochlear implant system where an external microphone provides an audio signal input to an external signal processing stage 111 which implements one of various known signal processing schemes. The processed signal is converted by the external signal processing stage 111 into a digital data format, such as a sequence of data frames, for transmission into a receiver processor in an implant housing 108. Besides extracting the audio information, the receiver processor in the implant housing 108 may perform additional signal processing, and produces a stimulation pattern (based on the extracted audio information) that is sent through an electrode lead 109 to an implanted electrode array 112 which penetrates into the cochlea 104 through a surgical opening called a cochleostomy. Typically, this electrode array 112 includes multiple electrode contacts 110 on its surface that deliver the stimulation signals to adjacent neural tissue of the cochlea 104 which the brain of the patient interprets as sound. The individual electrode contacts 110 may be activated sequentially, or simultaneously in one or more contact groups.
Perception of music and of prosodic speech cues remains challenging for cochlear implant users. An audio signal, such as speech or music, can be decomposed into the signal envelope and the fine time structure. The envelope fluctuates in amplitude over time and may therefore be considered the amplitude modulation (AM) of the signal. The fine time structure fluctuates in frequency over time and may be considered equivalent to the frequency-modulated (FM) carrier wave of the signal.
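By way of illustration only, the decomposition described above can be sketched numerically with the analytic signal, which is one common way of separating an envelope from a fine time structure and is an assumption here rather than a method stated in this description. The function names, the 8 kHz sampling rate, and the 500 Hz carrier with 20 Hz amplitude modulation are hypothetical choices made so that the example is exact and self-contained.

```python
import cmath
import math

FS_HZ = 8000   # hypothetical sampling rate
N = 400        # 50 ms window; holds whole periods of both test tones


def analytic_signal(x):
    """Return the analytic signal of x using a direct DFT.

    O(N^2) on purpose: no external libraries, adequate for a short demo.
    """
    n = len(x)
    spectrum = [
        sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and discard the negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    for m in range(1, (n + 1) // 2):
        weights[m] = 2.0
    return [
        sum(weights[m] * spectrum[m] * cmath.exp(2j * math.pi * m * k / n)
            for m in range(n)) / n
        for k in range(n)
    ]


def decompose(x):
    """Split x into (envelope, fine_structure) so that x ~ envelope * fine."""
    z = analytic_signal(x)
    envelope = [abs(v) for v in z]                          # AM part
    fine_structure = [math.cos(cmath.phase(v)) for v in z]  # carrier part
    return envelope, fine_structure


def true_envelope(k):
    """Envelope used to build the hypothetical test signal."""
    return 1.0 + 0.5 * math.cos(2 * math.pi * 20 * k / FS_HZ)


def demo_signal():
    """500 Hz carrier, amplitude modulated at 20 Hz."""
    return [true_envelope(k) * math.cos(2 * math.pi * 500 * k / FS_HZ)
            for k in range(N)]
```

Calling `decompose(demo_signal())` returns an envelope that follows the slow 20 Hz modulation and a unit-amplitude fine structure that carries the 500 Hz oscillation; multiplying the two sample by sample reproduces the input.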
Speech coding strategies for cochlear implants encode acoustic signals into electrical pulses that stimulate the acoustic nerve. Acoustic frequency can be encoded into a varying pulse rate, because different pulse rates are perceived as different in pitch by cochlear implant users.
One common speech coding strategy is the so-called “continuous-interleaved-sampling strategy” (CIS), as described by Wilson B. S., Finley C. C., Lawson D. T., Wolford R. D., Eddington D. K., Rabinowitz W. M., “Better speech recognition with cochlear implants,” Nature, vol. 352, 236-238 (July 1991), which is hereby incorporated herein by reference. The CIS speech coding strategy samples the signal envelope amplitude modulation (AM) at predetermined time intervals, providing a remarkable level of speech understanding merely by coding the AM of the speech signal. This can be explained, in part, by the fact that auditory neurons phase lock to amplitude modulated (AM) electrical pulse trains (see, for example, Middlebrooks, J. C., “Auditory Cortex Phase Locking to Amplitude-modulated Cochlear Implant Pulse Trains,” J Neurophysiol, 100(1), p. 76-91, 2008 July, which is hereby incorporated herein by reference). However, both cues, FM and AM, are important for normal hearing subjects (see, for example, Zeng F., Nie K., Stickney G., Kong Y., “Auditory Perception with Slowly-varying Amplitude and Frequency Modulations,” In: D. Pressnitzer, A. de Cheveigné, S. McAdams, and L. Collet, “Auditory Signal Processing: Physiology, Psychoacoustics, and Models,” Springer Verlag, New York, pp. 237-243, 2004, which is hereby incorporated herein by reference). The perception of frequency modulation cues can be disturbed by a simultaneous AM (see, for example: Moore B. C., Skrodzka E., “Detection of Frequency Modulation by Hearing-impaired Listeners: Effects of Carrier Frequency, Modulation Rate, and Added Amplitude Modulation,” J Acoust Soc Am, 111(1 Pt 1), p. 327-335, 2002 January, which is hereby incorporated herein by reference).
FM Detection Thresholds (FMDTs) significantly worsen in the presence of simultaneous AM in cochlear implant users (see Luo X., Fu Q., “Frequency Modulation Detection with Simultaneous Amplitude Modulation by Cochlear Implant Users,” J Acoust Soc Am, 122(2), p. 1046-1054, 2007, which is hereby incorporated herein by reference), and the fine time structure cues may thus be masked from the cochlear implant user by simultaneous temporal envelope modulation.
Current speech coding strategies code mainly slowly varying signal envelope information and do not transmit the fine time structure of a signal. As these strategies code mainly envelope information, they generally do not suffer from the domination of AM over FM.
In contrast, when strategies do code fine time structure, amplitude modulations resulting from unresolved harmonics can interfere with, and partially mask, the fine time structure information.
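As a hedged illustration of how unresolved harmonics give rise to such amplitude modulation, consider two equal-amplitude neighboring harmonics n·F0 and (n+1)·F0 that fall into the same band. By the sum-to-product identity, their sum equals 2·cos(π·F0·t)·cos(2π·(n+½)·F0·t), so the envelope |2·cos(π·F0·t)| repeats with period 1/F0. The equal amplitudes, the value F0 = 100 Hz, and the harmonic number n = 9 are assumptions chosen for the example, as are the function names.

```python
import math

F0_HZ = 100.0   # hypothetical fundamental frequency
HARMONIC = 9    # hypothetical: harmonics 9 and 10 (900 Hz and 1000 Hz)


def two_harmonic_sum(t, f0=F0_HZ, n=HARMONIC):
    """Sum of two equal-amplitude neighboring harmonics at time t (seconds)."""
    return (math.cos(2 * math.pi * n * f0 * t)
            + math.cos(2 * math.pi * (n + 1) * f0 * t))


def product_form(t, f0=F0_HZ, n=HARMONIC):
    """Same signal written as slow envelope term times fast carrier term."""
    return (2 * math.cos(math.pi * f0 * t)
            * math.cos(2 * math.pi * (n + 0.5) * f0 * t))


def beat_envelope(t, f0=F0_HZ):
    """Envelope of the two-harmonic sum; periodic with period 1 / f0."""
    return abs(2 * math.cos(math.pi * f0 * t))
```

Evaluating `beat_envelope` at t = 1/(2·F0) gives a null, and the pattern repeats every 1/F0 seconds, which is the F0-rate modulation that can compete with the fine time structure cue.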
A clinically available coding strategy that transmits fine structure cues is Fine Structure Processing (FSP). In FSP, the fine time structure of low frequency channels is transmitted through Channel Specific Sampling Sequences (CSSS) that start at negative to positive zero crossings of the respective band pass filter output (see U.S. Pat. No. 6,594,525, Zierhofer 2003, which is hereby incorporated by reference herein). The basic idea is to apply a stimulation pattern in which a particular relationship to the center frequencies of the filter channels is preserved, i.e., the center frequencies are represented in the temporal waveforms of the stimulation patterns, and are not fully removed, as in CIS. Each stimulation channel is associated with a particular CSSS, which is a sequence of ultra-high-rate biphasic pulses (typically 5-10 kpps). Each CSSS has a distinct length (number of pulses) and distinct amplitude distribution. The length of a CSSS may be derived, for example, from the center frequency of the associated band pass filter, so that a CSSS associated with a lower filter channel is longer than a CSSS associated with a higher filter channel. For example, the length may be one half of the period of the center frequency. The amplitude distribution may be adjusted to patient specific requirements.
For illustration, two examples for a 6-channel system are shown. In FIG. 2(a), the CSSS's are derived by sampling one half of a period of a sinusoid whose frequency is equal to the center frequency of the band pass filter (center frequencies at 440 Hz, 696 Hz, 1103 Hz, 1745 Hz, 2762 Hz, and 4372 Hz). Sampling is achieved by means of biphasic pulses at a rate of 10 kpps and a phase duration of 25 μs. For channels #5 and #6, one half of a period of the center frequency is too short to give space for more than one stimulation pulse, i.e., each of these “sequences” consists of only one pulse. Other amplitude distributions may be utilized. For example, in FIG. 2(b), the sequences are derived by sampling one quarter of a period of a sinusoid whose frequency is half the center frequency of the band pass filters. These CSSS's have about the same durations as the corresponding CSSS's in FIG. 2(a), but the amplitude distribution is monotonically increasing. Such monotonic distributions might be advantageous, because each pulse of the sequence can theoretically stimulate neurons at sites which cannot be reached by its predecessors.
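The arithmetic behind the two examples can be sketched as follows. The center frequencies and the 10 kpps rate are taken from the text above; the rule of keeping only whole pulse slots that fit within the sequence duration, the placement of each sample at the middle of its pulse slot, and the function names are assumptions made for this sketch and are not stated in the description.

```python
import math

PULSE_RATE_PPS = 10_000                                # from the text
CENTER_FREQS_HZ = [440, 696, 1103, 1745, 2762, 4372]   # from the text


def pulses_per_sequence(fc_hz, rate_pps=PULSE_RATE_PPS):
    """Whole pulse slots fitting into half a period of fc (at least one)."""
    duration_s = 0.5 / fc_hz
    return max(1, int(duration_s * rate_pps))


def csss_half_sine(fc_hz, rate_pps=PULSE_RATE_PPS):
    """FIG. 2(a)-style amplitudes: half a period of a sinusoid at fc."""
    n = pulses_per_sequence(fc_hz, rate_pps)
    # Assumption: sample at the middle of each pulse slot.
    return [math.sin(2 * math.pi * fc_hz * (k + 0.5) / rate_pps)
            for k in range(n)]


def csss_quarter_sine(fc_hz, rate_pps=PULSE_RATE_PPS):
    """FIG. 2(b)-style amplitudes: a quarter period of a sinusoid at fc / 2.

    A quarter period at fc / 2 lasts 1 / (2 * fc), the same duration as
    half a period at fc, so the pulse count is unchanged.
    """
    n = pulses_per_sequence(fc_hz, rate_pps)
    return [math.sin(2 * math.pi * (fc_hz / 2) * (k + 0.5) / rate_pps)
            for k in range(n)]


def sequence_lengths():
    """Number of pulses per channel for the 6-channel example."""
    return [pulses_per_sequence(fc) for fc in CENTER_FREQS_HZ]
```

Under these assumptions, `sequence_lengths()` yields 11, 7, 4, 2, 1, and 1 pulses for channels #1 to #6, consistent with the statement that channels #5 and #6 contain a single pulse each, and `csss_quarter_sine` produces strictly increasing amplitudes within each sequence.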
FIG. 3 (prior art) illustrates an exemplary signal processing scheme of the FSP strategy. The audio signal is first split up into spectral bands by means of a filter bank of band pass filters 301. Each of these spectral bands is then further processed by a zero crossing detector 303 that detects the negative to positive zero crossings of each spectral band. The CSSS 305 are inserted at the start of the negative to positive zero crossings of their respective band pass filter output. An envelope detector 307 provides the envelopes of the band pass time signals, which include unresolved harmonics and are modulated with the difference tones of the harmonics, mainly the fundamental frequency F0. When the CSSS stimulation pulses are weighted 309 with these envelopes, the resulting pulses are undesirably amplitude modulated mainly with F0. This also applies to the frequency bands that are designed to transmit fine time structure, in addition to amplitude cues.
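A minimal sketch of the zero crossing detection, CSSS insertion, and envelope weighting steps for a single channel is given below. It is not the implementation of the referenced patent. It assumes that the band pass filter output and its envelope are already available as lists sampled on the pulse grid (here 10 kHz), and all function names and the test tone are hypothetical.

```python
import math

GRID_RATE_HZ = 10_000  # assumption: one sample per pulse slot


def upward_zero_crossings(band):
    """Indices i where band goes from negative at i - 1 to >= 0 at i."""
    return [i for i in range(1, len(band)) if band[i - 1] < 0.0 <= band[i]]


def insert_weighted_csss(band, envelope, csss):
    """Place one CSSS at each upward zero crossing, scaled by the envelope.

    Returns a list of pulse amplitudes on the same grid as band; slots
    without a pulse are 0.0. Sequences are truncated at the end of the
    buffer.
    """
    pulses = [0.0] * len(band)
    for start in upward_zero_crossings(band):
        weight = envelope[start]  # envelope sampled at the crossing
        for k, amplitude in enumerate(csss):
            if start + k < len(pulses):
                pulses[start + k] = weight * amplitude
    return pulses


def demo_band(fc_hz=440.0, n=100):
    """Hypothetical band pass output: a pure tone at the center frequency."""
    return [math.sin(2 * math.pi * fc_hz * i / GRID_RATE_HZ)
            for i in range(n)]
```

For the 440 Hz test tone over 10 ms, `upward_zero_crossings(demo_band())` returns the indices 23, 46, 69, and 91, so one sequence is started per period of the band signal. If the envelope passed to `insert_weighted_csss` fluctuates at F0, the pulse amplitudes inherit that fluctuation, which is the undesired modulation described above.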