A normal ear transmits sounds as shown in FIG. 1 through the outer ear 101 to the tympanic membrane (eardrum) 102, which moves the bones of the middle ear 103 (malleus, incus, and stapes) that vibrate the oval window and round window openings of the cochlea 104. The cochlea 104 is a long narrow duct wound spirally about its axis for approximately two and a half turns. It includes an upper channel known as the scala vestibuli and a lower channel known as the scala tympani, which are connected by the cochlear duct. The cochlea 104 forms an upright spiraling cone with a center called the modiolus where the spiral ganglion cells of the acoustic nerve 113 reside. In response to received sounds transmitted by the middle ear 103, the fluid-filled cochlea 104 functions as a transducer to generate electric pulses which are transmitted to the cochlear nerve 113, and ultimately to the brain.
Hearing is impaired when there are problems in the ability to transduce external sounds into meaningful action potentials along the neural substrate of the cochlea 104. To improve impaired hearing, auditory prostheses have been developed. For example, when the impairment is related to operation of the middle ear 103, a conventional hearing aid may be used to provide acoustic-mechanical stimulation to the auditory system in the form of amplified sound. Or when the impairment is associated with the cochlea 104, a cochlear implant with an implanted stimulation electrode can electrically stimulate auditory nerve tissue with small currents delivered by multiple electrode contacts distributed along the electrode.
FIG. 1 also shows some components of a typical cochlear implant system which includes an external microphone that provides an audio signal input to an external signal processor 111 where various signal processing schemes can be implemented. The processed signal is then converted into a digital data format, such as a sequence of data frames, for transmission via coil 107 into the implant 108. Besides receiving the processed audio information, the implant 108 also performs additional signal processing such as error correction, pulse formation, etc., and produces a stimulation pattern (based on the extracted audio information) that is sent through an electrode lead 109 to an implanted electrode array 110. Typically, this electrode array 110 includes multiple electrodes 112 on its surface that provide selective stimulation of the cochlea 104.
In cochlear implants today, a relatively small number of electrodes are each associated with relatively broad frequency bands, with each electrode addressing a group of neurons through a stimulation pulse the charge of which is derived from the instantaneous amplitude of the envelope within that frequency band. In some coding strategies, stimulation pulses are applied at constant rate across all electrodes, whereas in other coding strategies, stimulation pulses are applied at an electrode-specific rate.
Various signal processing schemes can be implemented to produce the electrical stimulation signals. Signal processing approaches that are well-known in the field of cochlear implants include continuous interleaved sampling (CIS) digital signal processing, channel specific sampling sequences (CSSS) digital signal processing (as described in U.S. Pat. No. 6,348,070, incorporated herein by reference), spectral peak (SPEAK) digital signal processing, and compressed analog (CA) signal processing. For example, in the CIS approach, signal processing for the speech processor involves the following steps:
(1) splitting up of the audio frequency range into spectral bands by means of a filter bank,
(2) envelope detection of each filter output signal,
(3) instantaneous nonlinear compression of the envelope signal (map law).
According to the tonotopic organization of the cochlea, each stimulation electrode in the scala tympani is associated with a band pass filter of the external filter bank. For stimulation, symmetrical biphasic current pulses are applied. The amplitudes of the stimulation pulses are directly obtained from the compressed envelope signals. These signals are sampled sequentially, and the stimulation pulses are applied in a strictly non-overlapping sequence. Thus, as a typical CIS-feature, only one stimulation channel is active at one time and the overall stimulation rate is comparatively high. For example, assuming an overall stimulation rate of 18 kpps and a 12 channel filter bank, the stimulation rate per channel is 1.5 kpps. Such a stimulation rate per channel usually is sufficient for adequate temporal representation of the envelope signal. The maximum overall stimulation rate is limited by the minimum phase duration per pulse. 
The phase duration cannot be chosen arbitrarily short, because the shorter the pulses, the higher the current amplitudes have to be to elicit action potentials in neurons, and current amplitudes are limited for various practical reasons. For an overall stimulation rate of 18 kpps, the phase duration is 27 μs, which is near the lower limit. Each output of the CIS band pass filters can roughly be regarded as a sinusoid at the center frequency of the band pass filter which is modulated by the envelope signal. This is due to the quality factor (Q≈3) of the filters. In case of a voiced speech segment, this envelope is approximately periodic, and the repetition rate is equal to the pitch frequency.        
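The rate figures in this example can be checked with a short calculation. The following is only a sketch under stated assumptions: the function names are hypothetical, pulses are taken to be biphasic (two phases), and no inter-phase or inter-pulse gap is assumed, so the result is an upper bound on the phase duration rather than a device specification.

```python
def per_channel_rate_pps(overall_rate_pps: float, n_channels: int) -> float:
    """Per-channel pulse rate when channels are stimulated strictly one at a time."""
    return overall_rate_pps / n_channels


def max_phase_duration_us(overall_rate_pps: float, phases_per_pulse: int = 2) -> float:
    """Longest phase duration (in microseconds) that still fits non-overlapping pulses.

    Assumption: each pulse occupies its whole time slot, with no gaps.
    """
    pulse_slot_s = 1.0 / overall_rate_pps
    return pulse_slot_s / phases_per_pulse * 1e6


rate = per_channel_rate_pps(18_000, 12)   # 18 kpps shared by 12 channels
phase = max_phase_duration_us(18_000)     # biphasic pulse in a 1/18000 s slot
print(rate, round(phase, 1))
```

Under these assumptions the per-channel rate is 1.5 kpps and the phase duration is bounded by roughly 27.8 μs, which is consistent with the 27 μs stated above.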
In the existing CIS-strategy, only the envelope signals are used for further processing, i.e., they contain the entire stimulation information. For each channel, the envelope is represented as a sequence of biphasic pulses at a constant repetition rate. A characteristic feature of CIS is that this repetition rate (typically 1.5 kpps) is equal for all channels and there is no relation to the center frequencies of the individual channels. It is intended that the repetition rate is not a temporal cue for the patient, i.e., it should be sufficiently high, so that the patient does not perceive tones with a frequency equal to the repetition rate. The repetition rate is usually chosen at greater than twice the bandwidth of the envelope signals (Nyquist theorem).
Another cochlear implant stimulation strategy that transmits fine time structure information is the Fine Structure Processing (FSP) strategy by Med-El. Zero crossings of the band pass filtered time signals are tracked, and at each negative to positive zero crossing a Channel Specific Sampling Sequence (CSSS) is started. Typically CSSS sequences are only applied on the first one or two most apical channels, covering the frequency range up to 200 or 330 Hz. The FSP arrangement is described further in Hochmair I, Nopp P, Jolly C, Schmidt M, Schößer H, Garnham C, Anderson I, MED-EL Cochlear Implants: State of the Art and a Glimpse into the Future, Trends in Amplification, vol. 10, 201-219, 2006, which is incorporated herein by reference.
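The zero-crossing tracking described for FSP can be illustrated as follows. This is a minimal sketch, not the Med-El implementation: the function name is hypothetical, the input is assumed to be a sampled band pass signal, and treating a sample equal to zero as the positive side of the crossing is an assumption.

```python
def upward_zero_crossings(samples):
    """Return indices i where the signal is negative at i-1 and non-negative at i.

    Each returned index marks a point where a sampling sequence could be started.
    """
    return [
        i
        for i in range(1, len(samples))
        if samples[i - 1] < 0 <= samples[i]
    ]


band_signal = [-1.0, -0.5, 0.5, 1.0, 0.5, -0.5, -1.0, 0.2]
crossings = upward_zero_crossings(band_signal)
print(crossings)
```

Positive-to-negative crossings (here between indices 4 and 5) are deliberately ignored, since only negative-to-positive crossings trigger a sequence.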
FIG. 2 shows major functional blocks in the signal processing arrangement typical of existing cochlear implant (CI) systems wherein band pass signals containing stimulation timing and amplitude information are assigned to stimulation electrodes. Preprocessor Filter Bank 201 pre-processes an initial acoustic audio signal, e.g., by automatic gain control, noise reduction, etc. Each band pass filter in the Preprocessor Filter Bank 201 is associated with a specific band of audio frequencies so that the acoustic audio signal is filtered into some N band pass signals, B1 to BN, where each signal corresponds to the band of frequencies for one of the band pass filters.
The band pass signals B1 to BN are input to a Stimulation Pulse Generator 202 which extracts signal specific stimulation information—e.g., envelope information, phase information, timing of requested stimulation events, etc.—into a set of N stimulation event signals S1 to SN, which represent electrode specific requested stimulation events. For example, channel specific sampling sequences (CSSS) may be used as described in U.S. Pat. No. 6,594,525, which is incorporated herein by reference.
Pulse Mapping Module 203 applies a non-linear mapping function (typically logarithmic) to the amplitude of each band-pass envelope. This mapping function typically is adapted to the needs of the individual CI user during fitting of the implant in order to achieve natural loudness growth. This may be in the specific form of functions that are applied to each requested stimulation event signal S1 to SN that reflect patient-specific perceptual characteristics to produce a set of electrode stimulation signals A1 to AM that provide an optimal electric representation of the acoustic signal.
The Pulse Mapping Module 203 controls loudness mapping functions. The amplitudes of the electrical pulses are derived from the envelopes of the assigned band pass filter outputs. A logarithmic function with a form-factor C typically may be applied to stimulation event signals S1 to SN as a loudness mapping function, which generally is identical across all the band pass analysis channels. In different systems, different specific loudness mapping functions other than a logarithmic function may be used, though still just one identical function is applied to all channels to produce the electrode stimulation signals A1 to AM that are output from the Pulse Mapping Module 203.
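One possible shape of a logarithmic loudness mapping function with a form-factor C is sketched below. The specific formula log(1 + C·x) / log(1 + C), the normalization of the envelope to the range 0 to 1, the function name, and the default value of C are all assumptions made for illustration; the text above does not specify them.

```python
import math


def log_map(envelope: float, c: float = 500.0) -> float:
    """Compress a normalized envelope value in [0, 1] onto [0, 1].

    Assumed form: log(1 + c * x) / log(1 + c), where c plays the role of
    the form-factor. Larger c gives stronger compression of high levels.
    """
    x = min(max(envelope, 0.0), 1.0)  # clip to the assumed input range
    return math.log(1.0 + c * x) / math.log(1.0 + c)


# The same function is applied to every channel, as described above.
channel_envelopes = [0.0, 0.01, 0.1, 1.0]
mapped = [log_map(e) for e in channel_envelopes]
print([round(m, 3) for m in mapped])
```

With this form, low-level inputs are expanded relative to a linear map while the endpoints 0 and 1 are preserved.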
Patient specific stimulation is achieved by individual amplitude mapping and pulse shape definition in Pulse Shaper 204 which develops the set of electrode stimulation signals A1 to AM into a set of output electrode pulses E1 to EM to the electrodes in the implanted electrode array which stimulate the adjacent nerve tissue.
Background noise reduces speech intelligibility of hearing aid and cochlear implant users. According to Hernandez et al., An Assessment Of Everyday Noises And Their Annoyance, Hearing Review, 2006, 13(7), 16-20 (incorporated herein by reference), 33% of sensate background noise is formed by transient sounds such as computer key strokes, slamming doors, dish clattering, etc., all of which are unpleasant and reduce listening comfort (See also, German Patent DE 102005043314, incorporated herein by reference). The transient noise reduction algorithms in existing hearing aids such as the AntiShock from Unitron Connect and the SoundSmoothing from Siemens have been found to yield an improvement in the listening experience. See DiGiovanni et al., Effects of Transient-Noise Reduction Algorithms on Speech Intelligibility and Ratings of Hearing Aid Users, American Journal of Audiology, first published on Sep. 22, 2011 as doi:10.1044/1059-0889(2011/10-0007), incorporated herein by reference. Transient noise reduction is also sought in other applications. For example, sound quality for car passengers may be improved by reducing the transient road noise created when tires strike an obstruction. See U.S. Pat. No. 7,725,315, incorporated herein by reference.
On the other hand, enhancement of short-duration transient speech features, like consonants or on/offsets of speech, may improve speech perception in certain listening conditions, particularly with regard to low intensities. See: Vandali A. E., Emphasis of Short-duration Acoustic Speech Cues for Cochlear Implant Users, The Journal of the Acoustical Society of America, 2001, 109(5), 2049-2061, doi:10.1121/1.1358300; and Holden L. K., Vandali A. E., Skinner M. W., Fourakis M. S., Holden T. A., Speech Recognition With the Advanced Combination Encoder and Transient Emphasis Spectral Maxima Strategies in Nucleus 24 Recipients, Journal of Speech, Language, and Hearing Research, 2005, 48, 681-701, each of which is incorporated by reference in its entirety. This may also enhance the onset of certain speech features, ultimately yielding increased intelligibility. See Koning R., Wouters J., The Potential of Onset Enhancement for Increased Speech Intelligibility in Auditory Prostheses, J. Acoust. Soc. Am. 132(4), October 2012, 2569-2581; and Jing Chen and Brian C. J. Moore, Effect of Individually Tailored Spectral Change Enhancement on Speech Intelligibility and Quality for Hearing-Impaired Listeners, Proceedings of ICASSP 2013, Vancouver, Canada, May 2013, each of which is incorporated herein by reference.
Likewise, in high-end audio equipment that renders audio data, the potential to modify transient features like drumsticks hitting a drum is desired to meet different individual preferences in music listening. See U.S. Pat. No. 7,353,169, incorporated herein by reference. In U.S. Pat. No. 7,353,169, the spectral flux is used to determine frequency-specific indicators of transient features in high end audio equipment. According to these indicators, a modification of the corresponding transient features is applied to improve the impression of music. It is up to the user to decide on the amount, the frequency ranges, and the kind of modification (suppression or enhancement) he prefers.
Some methods aiming for separate reduction and enhancement of transients are provided below.
Transient Noise Reduction
In U.S. patent application Ser. No. 13/975,487, entitled “Reduction of Transient Sounds in Hearing Implants”, from Frühauf, filed Aug. 26, 2013 (incorporated herein by reference), the sound signal is transformed into K sub-signals, each of which corresponds to a certain frequency range. The envelopes of these sub-signals are considered and referred to as subband envelopes. One characteristic of a transient noise signal is that its envelopes have high values in each channel over a wide frequency range, where the lower frequency bound is above approx. 1 kHz. Channel specific indicators of a transient noise feature are calculated using the power of the input signal and the envelopes in the subbands. These indicators have high values if all the corresponding subband envelopes have high values relative to the power of the whole signal. High values of all indicators in the frequency range above approx. 1 kHz characterize a transient noise feature, while consonants or fricatives only have some indicators with high values. Thus the indicators of the frequency ranges above approx. 1 kHz are multiplied together to get a single indicator that has a large value for a transient noise feature.
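The idea of multiplying channel specific indicators can be sketched as follows. This is not the method of the cited application, whose exact indicator formula is not given here: the ratio of squared envelope to signal power, the function names, and the example band edges are assumptions chosen only to show why a product separates broadband transients from narrowband speech features.

```python
import math


def channel_indicators(envelopes, signal_power, eps=1e-12):
    """Assumed indicator: squared subband envelope relative to total signal power."""
    return [e * e / (signal_power + eps) for e in envelopes]


def transient_indicator(envelopes, lower_edges_hz, signal_power, min_hz=1000.0):
    """Multiply the indicators of all channels whose lower band edge is >= min_hz.

    Note: if no channel qualifies, math.prod returns 1.0 for the empty product.
    """
    indicators = channel_indicators(envelopes, signal_power)
    return math.prod(
        value for value, edge in zip(indicators, lower_edges_hz) if edge >= min_hz
    )


edges = [500.0, 1000.0, 2000.0, 4000.0]   # hypothetical lower band edges
click = [1.0, 1.0, 1.0, 1.0]              # high envelope in every band
fricative = [0.05, 0.05, 0.05, 1.0]       # high envelope in one band only

click_score = transient_indicator(click, edges, signal_power=1.0)
fricative_score = transient_indicator(fricative, edges, signal_power=1.0)
print(click_score, fricative_score)
```

Because a single low factor drives the product toward zero, only a signal with high indicators in every band above the cutoff yields a large combined value.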
Another characteristic of transient signals is a fast and steeply rising envelope of the sound signal. Thus during the occurrence of a transient, the envelope has much larger values for a short time interval. In German Patent DE 102005043314, the steepness and/or the amplitude of the envelope of the sound signal are considered. If one or both of these values exceed certain thresholds, the sound signal is attenuated.
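A threshold test on envelope steepness and amplitude can be sketched as below. This is an illustration of the general idea only, not the procedure of DE 102005043314: the sample-to-sample difference as the steepness measure, the fixed attenuation gain, and all names and threshold values are assumptions.

```python
def attenuate_transients(envelope, signal, slope_thr, amp_thr, gain=0.25):
    """Attenuate samples where the envelope rises too steeply or is too large.

    envelope and signal are equal-length sequences; the steepness is taken
    as the difference between consecutive envelope samples (an assumption).
    """
    if not envelope:
        return []
    out = []
    prev = envelope[0]
    for env, x in zip(envelope, signal):
        slope = env - prev
        is_transient = slope > slope_thr or env > amp_thr
        out.append(x * gain if is_transient else x)
        prev = env
    return out


env = [0.1, 0.1, 0.9, 0.9, 0.1]
sig = [1.0, 1.0, 1.0, 1.0, 1.0]
by_slope = attenuate_transients(env, sig, slope_thr=0.5, amp_thr=2.0)
by_both = attenuate_transients(env, sig, slope_thr=0.5, amp_thr=0.8)
print(by_slope, by_both)
```

With only the slope threshold active, just the rising edge is attenuated; adding the amplitude threshold also attenuates the sample where the envelope stays high.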
In European Patent EP 1371263 (incorporated herein by reference), the sound signal is transformed into K sub-signals in the frequency domain. Then, for each sub-signal, two or three sub-indices are calculated which are used to classify the present sound signal into the categories “stationary noise”, “quasi stationary noise”, “desired speech and music” and “transient noise”. These sub-indices refer to intensity changes during a given time interval, the modulation frequency, and the duration of very similar intensities of the signal, respectively. According to the classified category, a gain function is calculated that is used to suppress transient sounds or to enhance the SNR in case of the classified categories “stationary noise” or “quasi stationary noise”.
In WO 99/53615 (incorporated herein by reference), a transient detector divides the input signal into at least two frequency bands. In each of these bands, the derivative and/or the amplitude of the envelope are compared to at least one threshold function to indicate a transient in the respective band. If a transient is detected in at least one band, the coefficients of an adaptive filter are changed in such a way that the transients in the input signal are reduced by filtering the delayed input signal with this determined adaptive filter. After the detector no longer detects a transient, the filter coefficients return to the values they had before the transient appeared.
In U.S. Pat. No. 7,353,169, the spectral flux is used to determine frequency-specific indicators of transient features in high end audio equipment. According to these indicators, a modification of the corresponding transient features is applied to improve the impression of music. It is up to the user to decide on the amount, the frequency ranges, and the kind of modification (suppression or enhancement) he prefers.
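Spectral flux is commonly computed as the positive change in spectral magnitude between two consecutive analysis frames. The sketch below shows that common definition, per frequency bin and summed; it is not taken from U.S. Pat. No. 7,353,169, and the function names and the use of half-wave rectification are assumptions.

```python
def spectral_flux_per_bin(prev_mag, cur_mag):
    """Positive magnitude change in each frequency bin between two frames."""
    return [max(0.0, cur - prev) for prev, cur in zip(prev_mag, cur_mag)]


def spectral_flux(prev_mag, cur_mag):
    """Total spectral flux: sum of the per-bin positive changes."""
    return sum(spectral_flux_per_bin(prev_mag, cur_mag))


previous_frame = [1.0, 2.0, 0.5]   # magnitude spectrum of frame n-1
current_frame = [1.0, 3.5, 0.25]   # magnitude spectrum of frame n
per_bin = spectral_flux_per_bin(previous_frame, current_frame)
total = spectral_flux(previous_frame, current_frame)
print(per_bin, total)
```

The per-bin values act as frequency-specific indicators: only the bin whose magnitude increased contributes, so an onset confined to one frequency range is flagged in that range alone.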
U.S. Pat. No. 7,725,315 (incorporated herein by reference), describes using models of transient road noise based on a code book or a neural network to attenuate transient sounds.
U.S. Pat. No. 7,869,994 (incorporated herein by reference) describes an attenuation of certain wavelet coefficients based on a threshold to suppress transient sounds.
A possibility to reduce transient features in a cochlear implant system is to use hearing aid algorithms as proposed in U.S. 2005/0209657 (incorporated herein by reference).
In Stöbich B., Zierhofer C. M., Hochmair E. S., Influence of Automatic Gain Control Parameter Settings on Speech Understanding of Cochlear Implant Users Employing the Continuous Interleaved Sampling Strategy, Ear & Hearing, 1999, 20, 104-116 (incorporated herein by reference), a dual front-end AGC is proposed to reduce transient features.
Transient Speech Enhancement
U.S. Pat. No. 7,219,065 (incorporated herein by reference) describes generating a plurality of envelopes in the frequency channels of the sound signal. Then, in each channel, changes of the envelope intensities within a short time window (60 ms) are investigated to calculate a gain, which is used to enhance the envelope intensity when a transient speech feature is detected. For small variations or decreasing values of the intensities, the gain is set to one. The highest gain values (up to 14 dB) are achieved if the intensities have low, high and low values in the beginning (0-20 ms), in the middle (20-40 ms), and at the end (40-60 ms) of the time window, respectively. Furthermore, a small enhancement is used if there is an onset, i.e., small values of the envelopes in the beginning, followed by a high value in the middle and at the end of the time window.
Koning R., Wouters J., The Potential of Onset Enhancement for Increased Speech Intelligibility in Auditory Prostheses, J. Acoust. Soc. Am. Volume 132, Issue 4, pp. 2569-2581 (2012) (incorporated herein by reference) describes a sound signal separated into frequency bands, and the onsets of the corresponding envelopes are enhanced by adding peak envelope signals. Band-specific peak envelopes are the weighted rectified differences of the corresponding envelope and the weighted low-pass filtered envelope. Studies have shown that this enhancement of the onsets increases speech intelligibility.
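The peak envelope construction described above can be sketched for a single band as follows. This is an illustrative reading of the description, not the authors' implementation: the one-pole low-pass filter, the weight values, and the function name are assumptions.

```python
def enhance_onsets(envelope, alpha=0.9, lp_weight=1.0, peak_weight=1.0):
    """Add a peak envelope to one band envelope to emphasize onsets.

    peak = peak_weight * max(0, envelope - lp_weight * lowpass(envelope)),
    where lowpass is an assumed one-pole filter with coefficient alpha.
    """
    out = []
    lp = envelope[0] if envelope else 0.0   # start the filter at the first sample
    for env in envelope:
        lp = alpha * lp + (1.0 - alpha) * env            # slow-moving average
        peak = peak_weight * max(0.0, env - lp_weight * lp)  # rectified difference
        out.append(env + peak)
    return out


band_envelope = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
enhanced = enhance_onsets(band_envelope)
print([round(v, 3) for v in enhanced])
```

Because the low-pass filtered envelope lags behind a sudden rise, the rectified difference is largest right at the onset and decays while the envelope stays high; falling or constant envelopes produce no peak and pass through unchanged.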
Chen, J., Moore, B. C. J., Effect of Individually Tailored Spectral Change Enhancement on Speech Intelligibility and Quality for Hearing-impaired Listeners, Proceedings of ICASSP 2013, Vancouver, Canada, May 2013 (incorporated herein by reference) investigates the influence of enhancement of spectral changes for hearing impaired listeners. The input sound signal is transformed into spectral components by a short time Fourier transformation. Changes of these amplitudes are then enhanced and back-transformed to the time domain. These enhanced signals are evaluated by subjects with mild to moderate hearing loss. The study shows that the speech intelligibility increases while the sound quality remains nearly the same.