A normal human ear transmits sounds as shown in FIG. 1 through the outer ear 101 to the tympanic membrane 102, which moves the bones of the middle ear 103 that vibrate the oval window and round window openings of the cochlea 104. The cochlea 104 is a long narrow duct wound spirally about its axis for approximately two and a half turns. It includes an upper channel known as the scala vestibuli and a lower channel known as the scala tympani, which are connected by the cochlear duct. The cochlea 104 forms an upright spiraling cone with a center called the modiolus where the spiral ganglion cells of the cochlear nerve 113 reside. In response to received sounds transmitted by the middle ear 103, the fluid-filled cochlea 104 functions as a transducer to generate electric pulses which are transmitted to the cochlear nerve 113, and ultimately to the brain.
Hearing is impaired when there are problems in the ability to transduce external sounds into meaningful action potentials along the neural substrate of the cochlea 104. To improve impaired hearing, auditory prostheses have been developed. For example, when the impairment is related to operation of the middle ear 103, a conventional hearing aid may be used to provide acoustic-mechanical stimulation to the auditory system in the form of amplified sound. When the impairment is associated with the cochlea 104, a cochlear implant with an implanted electrode can electrically stimulate auditory nerve tissue with small currents delivered by multiple electrode contacts distributed along the electrode. Although the following discussion is specific to cochlear implants, some hearing-impaired persons are better served when the stimulation electrode is implanted in other anatomical structures. Thus auditory implant systems include brainstem implants, midbrain implants, etc., each stimulating a specific auditory target in the hearing system.
FIG. 1 also shows some components of a typical cochlear implant system where an external microphone provides an audio signal input to an external signal processor 111 in which various signal processing schemes can be implemented. For example, signal processing approaches that are well-known in the field of cochlear implants include continuous interleaved sampling (CIS) digital signal processing, channel specific sampling sequences (CSSS) digital signal processing (as described in U.S. Pat. No. 6,348,070, incorporated herein by reference), spectral peak (SPEAK) digital signal processing, fine structure processing (FSP) and compressed analog (CA) signal processing.
The processed signal is then converted into a digital data format for transmission by external transmitter coil 107 into the implant 108. Besides receiving the processed audio information, the implant 108 also performs additional signal processing such as error correction, pulse formation, etc., and produces a stimulation pattern (based on the extracted audio information) that is sent through an electrode lead 109 to an implanted electrode array 110. Typically, this electrode array 110 includes multiple electrode contacts 112 on its surface that provide selective stimulation of the cochlea 104.
FIG. 2 shows various functional blocks in a typical cochlear implant (CI) signal processing system using the CIS stimulation strategy. A sound pre-processor 201 includes a pre-emphasis filter 203 that receives an audio signal from a microphone and attenuates strong frequency components in the audio signal below about 1.2 kHz. The sound pre-processor 201 also includes multiple band-pass filters (BPFs) 204 that decompose the audio signal from the pre-emphasis filter 203 into multiple spectral bands. A sound processor 202 includes envelope detectors 205 that extract the slowly-varying envelopes of the spectral band signals, for example, by full-wave rectification and low pass filtering. The sound processor 202 also includes a non-linear (e.g., logarithmic) mapping module 206 that compresses the envelopes to fit the patient's perceptual characteristics. The compressed envelope signals are then multiplied with carrier waveforms by modulators 207 to produce electric stimulation signals in the specific form of non-overlapping biphasic output pulses for each of the stimulation electrodes (EL-1 to EL-n) implanted in the cochlea. These stimulation signals reflect the tonotopic neural response of the cochlea 104 along the length of the implanted electrode array 110.
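By way of illustration only, the envelope detection and compression stages described above might be sketched for a single spectral band as follows. The one-pole low-pass filter, the 200 Hz cutoff, the compression constant `c`, and all function names are assumptions chosen for the example and are not taken from FIG. 2; the band-pass filter bank and the pulse modulators are omitted.

```python
import math

def envelope(band_signal, fs, cutoff_hz=200.0):
    """Full-wave rectify, then smooth with a one-pole low-pass filter.

    The filter type and cutoff are illustrative assumptions.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env, y = [], 0.0
    for x in band_signal:
        y += alpha * (abs(x) - y)  # abs() is the full-wave rectifier
        env.append(y)
    return env

def log_compress(env, c=500.0):
    """Map envelope values clipped to [0, 1] onto [0, 1] logarithmically."""
    norm = math.log(1.0 + c)
    return [math.log(1.0 + c * min(max(e, 0.0), 1.0)) / norm for e in env]

fs = 16000  # sampling rate in Hz (assumed)
half = fs // 10
# Test band signal: a 1 kHz tone whose amplitude steps from 0.1 to 0.8.
band = [(0.1 if n < half else 0.8) * math.sin(2.0 * math.pi * 1000.0 * n / fs)
        for n in range(2 * half)]

env = envelope(band, fs)
compressed = log_compress(env)
```

In this sketch the envelope follows the amplitude step of the tone, while the logarithmic map reduces the ratio between the loud and soft segments, which is the purpose of the compression stage.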
CIS stimulation imposes a fixed stimulation rate on the delivered electrical pulses and therefore cannot represent periodicity components of the sensed audio signal. On the other hand, FSP stimulation (and its variants) does represent the inherent periodicity of sensed audio signals. FSP generates stimulation pulse trains responsive to detection of specific pre-defined signal characteristics such as zero crossing events. But FSP pulse trains after zero crossing events can only be presented in a pre-defined pattern. That means that the time period between the actual zero crossing and the initial pulse of the pulse trains may be different for each zero crossing event, thereby introducing unwanted jitter. In contrast to the case of unwanted signal jitter, U.S. Pat. No. 7,920,923 describes intentionally introducing a random artificial phase jitter component to binaural stimulation signals. This is done to reduce the periodic characteristics of the fine structure component while preserving interaural time difference (ITD) information.
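The origin of the unwanted jitter can be illustrated with a short sketch. It assumes, purely for the example, that pulses may start only on a fixed grid of time slots (here 1/1500 s apart) and that the band signal is a 210 Hz sinusoid; neither value comes from any actual FSP implementation, and the function names are hypothetical.

```python
import math

def upward_zero_crossings(signal, fs):
    """Return times (s) of negative-to-positive zero crossings,
    refined by linear interpolation between adjacent samples."""
    times = []
    for n in range(1, len(signal)):
        a, b = signal[n - 1], signal[n]
        if a < 0.0 <= b:
            frac = -a / (b - a)
            times.append((n - 1 + frac) / fs)
    return times

def next_slot(t, slot_period):
    """Earliest grid slot at or after time t (assumed fixed pulse grid)."""
    return math.ceil(t / slot_period) * slot_period

fs = 16000            # sampling rate in Hz (assumed)
f0 = 210.0            # test tone frequency in Hz (assumed)
slot = 1.0 / 1500.0   # hypothetical spacing of allowed pulse start times

sig = [math.sin(2.0 * math.pi * f0 * n / fs - 0.3) for n in range(fs // 10)]
crossings = upward_zero_crossings(sig, fs)

# Delay between each actual zero crossing and the first pulse that the
# fixed grid allows; the spread of these delays is the unwanted jitter.
delays = [next_slot(t, slot) - t for t in crossings]
jitter = max(delays) - min(delays)
```

Because the signal period is not an integer multiple of the slot spacing, the delay differs from one zero crossing to the next even though the input is perfectly periodic.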
In the specific case of speech in a tonal language, auditory implant stimulation schemes have additional considerations. Tonal languages are characterized in that a given spoken syllable will have a different meaning depending on its pitch characteristics. For a simplified example, the pitch contours of the four tones of Mandarin Chinese speech are shown in FIG. 3. Tone 1 (T1) has a nearly constant pitch, tone 2 (T2) has pitch that is mostly rising, tone 3 (T3) has pitch that falls and rises, and tone 4 (T4) has pitch that is mostly falling. If pronounced as [ma:], T1 means ‘mother’, T2 means ‘hemp’, T3 means ‘horse’ and T4 means ‘to grumble’. Depending on whether that syllable is spoken by a male, female, or a child, the distance between the horizontal lines on FIG. 3 will typically be 1.2, 0.8 or 0.4 milliseconds.
Pitch is encoded predominantly in the temporal structure of the signal, that is, in the fundamental frequency F0 and its higher harmonics. FIG. 4 shows narrowband spectrograms and F0 contours of the four tone patterns of “shi” spoken by a female subject, where the grayscale indicates energy associated with time (x-axis) and frequency (y-axis), and the thick black lines represent the F0 contours extracted by autocorrelation.
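As a minimal sketch of F0 extraction by autocorrelation, the following example picks the lag with the largest autocorrelation value inside an assumed pitch search range. The 80 to 500 Hz range, the 40 ms frame length, the synthetic two-harmonic test tones, and the function names are all assumptions for illustration; the procedure actually used to produce FIG. 4 is not specified here.

```python
import math

def autocorr_f0(frame, fs, f_min=80.0, f_max=500.0):
    """Estimate F0 (Hz) as fs divided by the lag that maximizes the
    unnormalized autocorrelation within the assumed search range."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag

def harmonic_tone(f0, fs, n_samples):
    """Synthetic test tone: fundamental plus a weaker second harmonic."""
    return [math.sin(2.0 * math.pi * f0 * n / fs)
            + 0.5 * math.sin(2.0 * math.pi * 2.0 * f0 * n / fs)
            for n in range(n_samples)]

fs = 8000        # sampling rate in Hz (assumed)
frame_len = 320  # 40 ms analysis frame (assumed)

# Two successive frames with rising pitch, loosely imitating a rising tone.
contour = [autocorr_f0(harmonic_tone(f0, fs, frame_len), fs)
           for f0 in (200.0, 250.0)]
```

Applying the estimator frame by frame yields a coarse F0 contour; for these two synthetic frames the estimates lie close to 200 Hz and 250 Hz, so the rising pitch is recovered.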