A normal ear transmits sounds as shown in FIG. 1 through the outer ear 101 to the tympanic membrane (eardrum) 102, which moves the bones of the middle ear 103 (malleus, incus, and stapes) that vibrate the oval window and round window openings of the cochlea 104. The cochlea 104 is a long narrow duct wound spirally about its axis for approximately two and a half turns. It includes an upper channel known as the scala vestibuli and a lower channel known as the scala tympani, which are connected by the cochlear duct. The cochlea 104 forms an upright spiraling cone with a center called the modiolus where the spiral ganglion cells of the cochlear nerve 113 reside. In response to received sounds transmitted by the middle ear 103, the fluid-filled cochlea 104 functions as a transducer to generate electric pulses which are transmitted to the cochlear nerve 113, and ultimately to the brain.
Hearing is impaired when there are problems in the ability to transduce external sounds into meaningful action potentials along the neural substrate of the cochlea 104. To improve impaired hearing, auditory prostheses have been developed. For example, when the impairment is related to operation of the middle ear 103, a conventional hearing aid may be used to provide acoustic-mechanical stimulation to the auditory system in the form of amplified sound. Alternatively, when the impairment is associated with the cochlea 104, a cochlear implant with an implanted stimulation electrode can electrically stimulate auditory nerve tissue with small currents delivered by multiple electrode contacts distributed along the electrode.
FIG. 1 also shows some components of a typical cochlear implant system which includes an external microphone that provides an audio signal input to an external signal processor 111 where various signal processing schemes can be implemented. The processed signal is then converted into a digital data format, such as a sequence of data frames, for transmission into the implant 108. Besides receiving the processed audio information, the implant 108 also performs additional signal processing such as error correction, pulse formation, etc., and produces a stimulation pattern (based on the extracted audio information) that is sent through an electrode lead 109 to an implanted electrode array 110. Typically, this electrode array 110 includes multiple stimulation contacts 112 on its surface that provide selective stimulation of the cochlea 104.
Current cochlear implant coding strategies are mostly straightforward sound processing schemes which map the different sound frequency channels onto different locations along the biological frequency map within the cochlea. FIG. 2 shows one example of the processing of a signal using the continuous interleaved sampling (CIS) stimulation strategy. The top of FIG. 2 shows the sound pressure characteristics of a spoken “A” (/ay/) at a sound level of 67.2 dB. The middle waveform in FIG. 2 shows a normal healthy auditory system response. The bottom waveform in FIG. 2 shows a neural response of the auditory nerve fibers under CIS stimulation.
Contemporary coding strategies were developed to code the spectral structure of sounds, which provides sufficient cues for speech understanding. However, the complex time-place patterns observed in the intact ear cannot yet be replicated. This is due in part to technical limitations, such as crosstalk between electrode channels, which strongly constrain the electrically evoked neuronal excitation patterns.
The evaluation of sound quality and speech intelligibility for the purposes of a hearing prosthesis is a complex task that is connected to many perceptual factors. The processing of the auditory system from the outer ear to the auditory nerve fibers can be represented in one or more neurograms such as the ones shown in FIG. 2 where the x-axis represents time and the y-axis logarithmically represents center frequency of the auditory nerve fiber. Neurograms can be used to efficiently predict the intelligibility aspects that relate to the first parts of the auditory pathway.
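As a minimal illustrative sketch of the neurogram layout described above (time along the x-axis, logarithmically spaced fiber center frequency along the y-axis), the following Python code bins per-fiber spike times into a two-dimensional array of spike counts. The function names, the frequency range, and the bin count are hypothetical choices made only for illustration and are not taken from any cited reference.

```python
import math


def log_spaced_center_frequencies(f_min_hz, f_max_hz, n_fibers):
    """Return n_fibers center frequencies spaced evenly on a log axis (assumed layout)."""
    if n_fibers < 2 or f_min_hz <= 0 or f_max_hz <= f_min_hz:
        raise ValueError("need n_fibers >= 2 and 0 < f_min_hz < f_max_hz")
    step = math.log(f_max_hz / f_min_hz) / (n_fibers - 1)
    return [f_min_hz * math.exp(i * step) for i in range(n_fibers)]


def build_neurogram(spike_times_per_fiber, duration_s, n_bins):
    """Count spikes per time bin; one row per fiber, one column per time bin."""
    if duration_s <= 0 or n_bins < 1:
        raise ValueError("duration_s must be positive and n_bins at least 1")
    neurogram = []
    for spike_times in spike_times_per_fiber:
        row = [0] * n_bins
        for t in spike_times:
            # Spikes outside the analysis window are ignored.
            if 0.0 <= t < duration_s:
                index = min(int(t / duration_s * n_bins), n_bins - 1)
                row[index] += 1
        neurogram.append(row)
    return neurogram
```

For example, build_neurogram([[0.1, 0.3, 0.3, 0.9]], 1.0, 4) places the four spikes of a single fiber into four time bins of 0.25 s each.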
The literature in the field has proposed various speech evaluation tools. In 1947, French and Steinberg (Factors Governing the Intelligibility of Speech Sounds, Journal of the Acoustical Society of America, vol. 19, no. 1, pp. 90-119, incorporated herein by reference) proposed an articulation index (AI) to evaluate speech intelligibility of an audio signal purely as a function of the signal-to-noise ratio (SNR) dependent on a specific threshold of hearing in twenty frequency bands. In each band the chosen SNR is used to model the overall sound quality, which can be adapted to specific hearing losses.
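The band-wise character of such an index can be sketched in Python as follows. This is a simplified illustration rather than the exact procedure of French and Steinberg: the function name, the clipping range of -12 dB to +18 dB, and the default equal band weights are assumptions made for the example.

```python
def articulation_index(band_snr_db, snr_floor_db=-12.0, snr_ceil_db=18.0, weights=None):
    """Weighted sum of per-band SNRs, each clipped and scaled to the range 0..1.

    The clipping range and the equal default weights are illustrative assumptions.
    """
    n_bands = len(band_snr_db)
    if n_bands == 0:
        raise ValueError("at least one frequency band is required")
    if weights is None:
        weights = [1.0 / n_bands] * n_bands
    if len(weights) != n_bands:
        raise ValueError("one weight per frequency band is required")
    span = snr_ceil_db - snr_floor_db
    index = 0.0
    for snr_db, weight in zip(band_snr_db, weights):
        # A band contributes nothing below the floor and fully above the ceiling.
        clipped = min(max(snr_db, snr_floor_db), snr_ceil_db)
        index += weight * (clipped - snr_floor_db) / span
    return index
```

With twenty bands all at the assumed ceiling the sketch returns a value of about 1, and with all bands at the assumed floor it returns 0. Adapting the index to a specific hearing loss would correspond to changing the per-band thresholds or weights.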
Bondy et al., Predicting Speech Intelligibility from a Population of Neurons, Advances in Neural Information Processing Systems, vol. 16, 2003 (incorporated herein by reference) described a Neural Articulation Index (NAI) as a variation of the AI based on a weighted sum of the SNR of the firing rates in seven frequency bands of a neurogram.
Elhilali et al., A Spectro-Temporal Modulation Index (STMI) for Assessment of Speech Intelligibility, Speech Communication, vol. 41, no. 2, pp. 331-348, 2003 (incorporated herein by reference) described using a Spectro-Temporal Modulation Index to evaluate the response of an auditory model to spectro-temporal modulations under different distortions such as noise and reverberation, and attempted to predict speech intelligibility under the influence of these distortions using simple averaging.
Hines and Harte, Speech Intelligibility from Image Processing, Speech Communication, vol. 52, no. 9, pp. 736-752, 2010 (incorporated herein by reference) proposed regarding neurograms as images and assessing the similarity between them using an image processing technique known as the Structural Similarity Index Measure (SSIM, later referred to as NSIM, the neurogram similarity index measure). The SSIM was developed by Wang et al., Image Quality Assessment: From Error Visibility to Structural Similarity, IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, 2004 (incorporated herein by reference).
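To illustrate how an image similarity measure can be applied to neurograms, the following Python sketch evaluates the SSIM expression of Wang et al. once over two complete neurograms of equal size. This is a simplification: the published SSIM and NSIM are computed over local windows and then averaged, and NSIM uses a modified set of terms. The function name and the default dynamic range are assumptions for the example.

```python
def global_ssim(reference, degraded, dynamic_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized neurograms (lists of rows)."""
    x = [value for row in reference for value in row]
    y = [value for row in degraded for value in row]
    if not x or len(x) != len(y):
        raise ValueError("neurograms must be non-empty and of equal size")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) * (a - mean_x) for a in x) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Small stabilizing constants scaled by the assumed dynamic range.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x * mean_x + mean_y * mean_y + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Comparing a neurogram with itself yields a value of 1, and a degraded neurogram compared against a normal-hearing reference yields a lower value.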
Current comparison methods for neurograms (or related representations of auditory perception) such as NI, NIT, STMI, SSIM and NSIM focus on predicting speech intelligibility in the presence of noise and other signal distortions. They try to estimate the overall quality of the neural representation of a given sound. The quality indexes NI, NIT, and STMI are based on average properties of neurograms, which are too coarse to be effective in capturing perceptual aspects. They also do not allow for an adequate comparison between different neurograms, which is important when designing stimulation strategies. The NSIM of Hines and Harte regards neurograms as images and attempts to predict intelligibility by comparing a degraded neurogram with a reference neurogram obtained under normal hearing conditions. None of these approaches exploits all of the relevant information coded in the temporal sequence of auditory neuronal spike trains, and all are inspired by engineering applications which do not necessarily fit the complex framework of human sound perception.