Numerous frequency-transposition schemes for the presentation of audio signals via hearing devices for people with a hearing impairment have been developed and evaluated over many years. In each case, the principal aim of the transposition is to improve the audibility and discriminability of signals in a particular frequency range by modifying those signals and presenting them at other frequencies. Usually, high frequencies are transposed to lower frequencies where hearing device users typically have better hearing ability. However, various problems have limited the successful application of such techniques in the past. These problems include technological limitations, distortions introduced into the sound signals by the processing schemes employed, and the absence of methods for identifying suitable candidates and for fitting frequency-transposing hearing aids to them using appropriate objective rules.
The many techniques for frequency transposition reported previously can be subdivided into three broad types: frequency shifting, frequency compression, and reducing the playback speed of recorded audio signals while discarding portions of the signal in order to preserve the original duration.
Among frequency compression schemes, many linear and non-linear techniques have been investigated, including FFT/IFFT processing, vocoding, and transposition of the high-frequency envelope followed by mixing with the unmodified low-frequency components. Since harmonic patterns and formant relations are known to be important for accurate speech perception, it is also helpful to distinguish spectrum-preserving techniques from spectrum-destroying techniques. Each of these techniques is summarized briefly below.
At present, the only frequency-transposing hearing instruments available commercially are those manufactured by AVR Ltd., a company based in Israel and Minnesota, USA (see http://www.avrsono.com). An earlier AVR instrument, the TranSonic, has recently been superseded by the ImpaCt and Logicom-20 devices. All of these instruments are based on selectively reducing the playback speed of recorded audio signals. The input sound signal is first sampled at a particular rate and stored in a memory. When frequency lowering is required, the recorded signal is subsequently read out of the memory at a lower rate than the one at which it was sampled. Because this readout rate can be varied, frequency lowering can be applied selectively. For example, different amounts of frequency lowering can be applied to voiced and unvoiced speech components. The presence of each type of component in the input signal is determined by estimating the spectral shape: the signal is assumed to be unvoiced when a spectral peak is detected at frequencies above 2.5 kHz, and voiced otherwise. In order to maintain the original duration of the signals, parts of the sampled data in the memory are discarded when necessary. U.S. Pat. No. 5,014,319 assigned to AVR describes not only frequency compression (i.e. transposition of input frequencies into lower ranges) but also frequency expansion (i.e. transposition into higher ranges). Other similar methods of frequency transposition by means of reducing the playback speed of recorded audio signals have also been reported previously (e.g. FR-2 364 520, DE-17 62 185). As noted above, a major problem with all of these schemes is that portions of the input signal must be discarded when the playback speed is reduced (to compress frequencies), because the original signal duration must be maintained in a real-time assistive listening system such as a hearing device. This can result in audible distortions in the output signal and can make some important sound information inaudible to the hearing device user.
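The readout process can be pictured with a short Python sketch. It is a hypothetical illustration, not AVR's implementation: the fixed-length segmentation, the use of the largest DFT peak as "the spectral peak", the linear interpolation between stored samples and the names `peak_frequency`, `is_unvoiced`, `slow_playback` and `transpose` are all assumptions. Each stored segment is read out at `factor` times the recording rate, and only as many output samples as the segment originally held are produced, so the tail of the segment is never read and is effectively discarded.

```python
import cmath
import math


def peak_frequency(frame, fs):
    """Frequency (Hz) of the largest-magnitude DFT bin, ignoring DC (naive O(n^2) DFT)."""
    n = len(frame)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        mag = abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / n


def is_unvoiced(frame, fs, threshold_hz=2500.0):
    """Assumed rule: unvoiced if the dominant spectral peak lies above 2.5 kHz."""
    return peak_frequency(frame, fs) > threshold_hz


def slow_playback(segment, factor):
    """Read a stored segment at `factor` (0 < factor <= 1) times the recording rate.

    The output keeps the segment's length, so samples beyond about
    factor * len(segment) are never read, i.e. they are discarded.
    """
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in (0, 1]")
    last = len(segment) - 1
    out = []
    for m in range(len(segment)):
        pos = m * factor  # read position in the stored segment
        i = int(pos)
        frac = pos - i
        nxt = segment[min(i + 1, last)]
        out.append(segment[i] * (1.0 - frac) + nxt * frac)  # linear interpolation
    return out


def transpose(signal, fs, segment_len, voiced_factor, unvoiced_factor):
    """Apply a different lowering factor to each segment depending on its voicing."""
    out = []
    for start in range(0, len(signal), segment_len):
        seg = signal[start:start + segment_len]
        factor = unvoiced_factor if is_unvoiced(seg, fs) else voiced_factor
        out.extend(slow_playback(seg, factor))
    return out
```

For example, `transpose(x, 16000, 256, 0.9, 0.5)` would scale frequencies by 0.9 in segments judged voiced and by 0.5 in segments judged unvoiced, while the output has the same length as `x`. The jumps at segment boundaries, where unread samples are skipped, are one source of the audible distortions mentioned above.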
Linear frequency compression by means of Fourier transform processing has been investigated by Turner and Hurtig at the University of Iowa, USA (Turner, C. W. and R. R. Hurtig: “Proportional Frequency Compression of Speech for Listeners with Sensorineural Hearing Loss”, Journal of the Acoustical Society of America, vol. 106(2), pp. 877–886, 1999), and has led to an international patent application with the publication number WO 99/14 986. This real-time algorithm is based on the Fast Fourier Transform (FFT). Input signals are converted into the frequency domain by an FFT with a relatively large number of frequency bins. The resulting high frequency resolution is essential for good sound quality in a system based on linear frequency compression. To achieve frequency lowering, the reported algorithm multiplies the frequency of each bin by a constant factor less than 1, so that the content of each bin is moved to a proportionally lower frequency in the output spectrum. Data loss resulting from this compression of the spectrum is minimized by linear interpolation across frequencies. The output signal is then converted back into the time domain by means of an inverse FFT (IFFT). One disadvantage of this technique is that the large FFT makes it computationally very inefficient, so that it would consume too much electrical energy if implemented in a hearing device. Furthermore, the propagation delay of signals processed by this algorithm would be unacceptably long for hearing device users, potentially interfering with their lip-reading ability. In addition, the compression capabilities (i.e. the range of the compression ratio) are limited by the proportional, i.e. linear, compression scheme.
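A minimal Python sketch of the bin-remapping step, applied to one frame's magnitude spectrum. It assumes that multiplying "each frequency bin" by the factor means moving that bin's content to a proportionally lower frequency, and that "linear interpolation across frequencies" means sharing each input bin's value between the two output bins adjacent to its scaled position. The name `compress_spectrum` is hypothetical, and the FFT, phase handling and IFFT stages are omitted.

```python
def compress_spectrum(magnitudes, alpha):
    """Proportionally compress a one-sided magnitude spectrum by `alpha` (0 < alpha <= 1).

    Input bin k is moved to the fractional position k * alpha; its value is
    shared between the two neighbouring output bins in proportion to their
    distance (linear interpolation), so the sum of the magnitudes is preserved.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    n = len(magnitudes)
    out = [0.0] * n
    for k, value in enumerate(magnitudes):
        target = k * alpha
        i = int(target)
        frac = target - i
        out[i] += value * (1.0 - frac)
        if frac > 0.0 and i + 1 < n:
            out[i + 1] += value * frac
    return out
```

Because every bin is scaled by the same `alpha`, low frequencies are shifted by the same proportion as high ones, and the upper part of the output spectrum is left empty.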
A feature extraction and signal resynthesis procedure and system based on a vocoder have been described by Thomson CSF, Paris in EP-1 006 511. Information about pitch, voicing, energy, and spectral shape is extracted from the input signal. These features are modified (e.g. by compressing the formant frequencies in the frequency domain) and then used to synthesize the output signal by means of a vocoder (i.e. a relatively efficient electronic or computational device or technique for synthesizing speech signals). A very similar approach has been described by Strong and Palmer in U.S. Pat. No. 4,051,331, whose signal synthesis is also based on modified speech features. However, their system synthesizes voiced components using tones and unvoiced components using narrow-band noises. Because the output is resynthesized from extracted features rather than derived from the input spectrum, these techniques are spectrum-destroying rather than spectrum-preserving.
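To illustrate why such resynthesis is spectrum-destroying, the following Python sketch builds an output frame purely from extracted features, in the spirit of the Strong and Palmer description. The feature set (a voicing flag, a list of formant frequencies and a target RMS level), the choice of one tone or one noise band per compressed formant, the one-pole smoothing used to make narrow-band noise, and the names `narrowband_noise` and `synthesize_frame` are all assumptions, not details from either patent.

```python
import math
import random


def narrowband_noise(center_hz, fs, n, rng, smoothing=0.95):
    """Noise band around center_hz: one-pole low-pass noise used to modulate a carrier."""
    env = 0.0
    out = []
    for i in range(n):
        env = smoothing * env + (1.0 - smoothing) * rng.gauss(0.0, 1.0)
        out.append(env * math.cos(2 * math.pi * center_hz * i / fs))
    return out


def synthesize_frame(voiced, formants_hz, rms, compression, fs, n, rng):
    """Resynthesize one frame from features only; the input waveform is never used."""
    lowered = [compression * f for f in formants_hz]  # compressed formant frequencies
    out = [0.0] * n
    for f in lowered:
        if voiced:
            part = [math.cos(2 * math.pi * f * i / fs) for i in range(n)]  # pure tone
        else:
            part = narrowband_noise(f, fs, n, rng)
        out = [a + b for a, b in zip(out, part)]
    current = math.sqrt(sum(x * x for x in out) / n)
    if current > 0.0:
        out = [x * rms / current for x in out]  # impose the extracted energy
    return out, lowered
```

Whatever the harmonic fine structure of the input, the output contains only these synthetic tones or noise bands, which is what makes the approach spectrum-destroying.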
A phase vocoder system for frequency transposition is described in a paper by H. J. McDermott and M. R. Dean (“Speech perception with steeply sloping hearing loss”, British Journal of Audiology, vol. 34, pp. 353–361, December 2000). The paper describes a non-real-time implementation in a computer program. Digitally recorded speech signals were low-pass filtered, downsampled and windowed, and then processed by an FFT. The phase values from successive FFTs were used to estimate a more precise frequency for each FFT bin, and this estimate was used to tune an oscillator corresponding to that bin. Frequency lowering was achieved by multiplying the frequency estimate for each FFT bin by a constant factor.
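The per-bin frequency refinement is the standard phase-vocoder calculation, sketched here in Python for a single bin and two overlapping frames. The Hann window, the FFT size, the hop size, the lowering factor of 0.6 and the helper names are assumptions chosen for the example, not values from the paper; a naive DFT stands in for the FFT so that the sketch needs only the standard library.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_bin(frame, k):
    """Single DFT coefficient X[k] (naive sum, stands in for one FFT bin)."""
    n = len(frame)
    return sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))


def princarg(phase):
    """Wrap a phase value into [-pi, pi)."""
    return (phase + math.pi) % (2 * math.pi) - math.pi


def estimate_bin_frequency(prev_phase, curr_phase, k, n_fft, hop, fs):
    """Refine the frequency of bin k from its phase advance between two frames."""
    expected = 2 * math.pi * k * hop / n_fft  # phase advance of the bin centre
    deviation = princarg(curr_phase - prev_phase - expected)
    omega = 2 * math.pi * k / n_fft + deviation / hop  # radians per sample
    return omega * fs / (2 * math.pi)


def oscillator(freq_hz, amp, fs, n_samples):
    """Sinusoid driven by a (lowered) frequency estimate."""
    return [amp * math.cos(2 * math.pi * freq_hz * i / fs) for i in range(n_samples)]


# Example: a 1010 Hz tone lies between bin centres (1000 and 1031.25 Hz).
fs, n_fft, hop, f_true = 8000, 256, 32, 1010.0
x = [math.cos(2 * math.pi * f_true * i / fs) for i in range(n_fft + hop)]
w = hann(n_fft)
k = round(f_true * n_fft / fs)  # nearest bin
p1 = cmath.phase(dft_bin([a * b for a, b in zip(x[:n_fft], w)], k))
p2 = cmath.phase(dft_bin([a * b for a, b in zip(x[hop:hop + n_fft], w)], k))
f_est = estimate_bin_frequency(p1, p2, k, n_fft, hop, fs)
lowered = oscillator(0.6 * f_est, 1.0, fs, 64)  # frequency lowered by a factor of 0.6
```

The bin centre alone would report 1000 Hz; the phase advance over the hop recovers the 1010 Hz component, so the lowering factor is applied to the actual frequency rather than to the coarse bin grid.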
Another system that can separately compress the frequency range of voiced and unvoiced speech components as well as the fundamental frequency has been described by S. Sakamoto, K. Goto, et al. (“Frequency Compression Hearing Aid for Severe-To-Profound Hearing Impairments”, Auris Nasus Larynx, vol. 27, pp. 327–334, 2000). This system allows independent adjustment of the frequency compression ratio for unvoiced and voiced speech, fundamental frequency, the spectral envelope, and the instrument's frequency response by the selection of different filters. The compression ratio for either voiced or unvoiced speech is adjustable from 10% to 90% in steps of 10%. The fundamental frequency can either be left unmodified, or compressed with a compression ratio either the same as, or lower than, that employed for voiced speech. A problem with each of the above feature-extraction and resynthesis processing schemes is that it is technically extremely difficult to obtain reliable estimates of speech features (such as fundamental frequency and voicing) in a wearable, real-time hearing instrument, especially in unfavorable listening conditions such as when noise or reverberation is present.
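The adjustment ranges reported for this system can be captured as a small validated settings record. This Python sketch is hypothetical: the class and field names are invented for illustration, "lower than" is treated numerically, and it assumes that the fundamental-frequency ratio uses the same 10% grid as the voiced and unvoiced ratios, which the description above does not state.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed compression ratios in percent: 10, 20, ..., 90.
ALLOWED_RATIOS_PCT = tuple(range(10, 100, 10))


@dataclass(frozen=True)
class CompressionSettings:
    voiced_pct: int
    unvoiced_pct: int
    f0_pct: Optional[int] = None  # None means the fundamental is left unmodified

    def __post_init__(self):
        for name in ("voiced_pct", "unvoiced_pct"):
            if getattr(self, name) not in ALLOWED_RATIOS_PCT:
                raise ValueError(f"{name} must be one of {ALLOWED_RATIOS_PCT}")
        if self.f0_pct is not None:
            if self.f0_pct not in ALLOWED_RATIOS_PCT:  # assumed grid for F0
                raise ValueError(f"f0_pct must be one of {ALLOWED_RATIOS_PCT}")
            if self.f0_pct > self.voiced_pct:  # same as, or lower than, voiced
                raise ValueError("f0_pct may not exceed voiced_pct")
```

Rejecting invalid combinations when the settings are created keeps the signal-processing code free of range checks.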
EP-0 054 450 describes the transposition and amplification of two or three different bands of the frequency spectrum into lower-frequency bands within the audible range. In this scheme, the number of “image” bands equals the number of original bands. The frequency compression ratio can be different across bands, but is constant within each band. The image bands are arranged contiguously, and transposed to frequencies above 500 Hz. In order to free this part of the spectrum for the image bands, the amplification for frequencies between 500 and 1000 Hz decreases gradually with increasing frequency. Frequencies below 500 Hz in the original signal are amplified with a constant gain.
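The band layout can be made concrete with a short Python sketch. It assumes that each image band's width equals the original band width multiplied by that band's compression ratio, that frequencies are mapped linearly within a band, and that the 500 to 1000 Hz gain falls off linearly in decibels between two illustrative levels; the function names, the default gain values and the behaviour above 1000 Hz are hypothetical.

```python
def image_bands(bands, ratios, start_hz=500.0):
    """Contiguous image bands, starting at start_hz, one per original band."""
    images = []
    lo = start_hz
    for (b_lo, b_hi), r in zip(bands, ratios):
        hi = lo + (b_hi - b_lo) * r  # width scaled by the band's own ratio
        images.append((lo, hi))
        lo = hi
    return images


def map_frequency(f_hz, bands, ratios, start_hz=500.0):
    """Image frequency of f_hz, or None if f_hz lies outside every transposed band."""
    for (b_lo, b_hi), r, (i_lo, _) in zip(bands, ratios, image_bands(bands, ratios, start_hz)):
        if b_lo <= f_hz < b_hi:
            return i_lo + (f_hz - b_lo) * r  # constant ratio within the band
    return None


def low_band_gain_db(f_hz, base_db=20.0, floor_db=0.0):
    """Constant gain below 500 Hz, falling linearly from base_db to floor_db at 1000 Hz."""
    if f_hz < 500.0:
        return base_db
    if f_hz <= 1000.0:
        return base_db - (base_db - floor_db) * (f_hz - 500.0) / 500.0
    return floor_db  # above 1000 Hz: an assumption, not specified in the text
```

With bands of 2 to 3 kHz and 3 to 4 kHz and ratios of 0.5 and 0.25, the image bands occupy 500 to 1000 Hz and 1000 to 1250 Hz, which is why the original 500 to 1000 Hz content is attenuated to make room for them.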
In U.S. Pat. No. 4,419,544 to Adelman, the input signal undergoes adaptive noise canceling before it is filtered into at least two pass-bands. Frequency compression is then carried out in at least one of these frequency bands.
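Adelman's patent is summarized here only at the block level, so the sketch below is not taken from it: it shows a textbook least-mean-squares (LMS) adaptive noise canceller in Python, which is one common way to realize a noise-canceling stage. It assumes a separate reference input that contains the noise but not the wanted signal; the function name, tap count, step size and test signals are illustrative, and the subsequent band splitting and frequency compression are not shown.

```python
import math
import random


def lms_noise_canceller(primary, reference, n_taps=4, mu=0.005):
    """Return the error signal e[n] = primary[n] - w . r[n], adapting w by LMS.

    The error converges towards the part of `primary` that is uncorrelated
    with `reference`, i.e. the wanted signal.
    """
    w = [0.0] * n_taps
    taps = [0.0] * n_taps  # most recent reference samples, newest first
    out = []
    for d, r in zip(primary, reference):
        taps = [r] + taps[:-1]
        y = sum(wi * xi for wi, xi in zip(w, taps))  # noise estimate
        e = d - y
        w = [wi + mu * e * xi for wi, xi in zip(w, taps)]  # LMS weight update
        out.append(e)
    return out


# Example: a sinusoid standing in for speech, corrupted by filtered noise.
rng = random.Random(0)
n = 5000
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
wanted = [math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
primary = [wanted[i] + 0.8 * noise[i] + 0.3 * (noise[i - 1] if i else 0.0) for i in range(n)]
cleaned = lms_noise_canceller(primary, noise)
```

A smaller `mu` gives a cleaner steady state at the cost of slower adaptation, a trade-off that matters when the noise changes quickly.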
Other techniques described previously include the modulation of tones or noise bands in the low-frequency range based on the energy present in higher frequencies (e.g. FR-1 309 425, U.S. Pat. No. 3,385,937), and various types of linear and non-linear transposition of high-frequency components which are then superimposed onto the low-frequency part of the spectrum (e.g. U.S. Pat. No. 5,077,800 and U.S. Pat. No. 3,819,875). Another approach (WO 00/75 920) describes the superposition of the original input signal with several frequency-compressed and frequency-expanded versions of the same signal to generate an output signal containing several different pitches, which is claimed to improve the perception of sounds by hearing-impaired listeners.
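The first of these ideas, modulating a low-frequency tone by the energy found at high frequencies, can be sketched in Python as follows. The frame length, the 2 kHz cutoff, the 250 Hz carrier, the amplitude estimate (twice the square root of the summed squared DFT magnitudes above the cutoff, divided by the frame length) and the function names are assumptions for illustration, not details from the cited patents; a noise-band carrier would work the same way.

```python
import cmath
import math


def high_band_amplitude(frame, fs, cutoff_hz):
    """Amplitude estimate of the content above cutoff_hz (naive one-sided DFT).

    For a single sinusoid of amplitude A centred on a DFT bin above the cutoff
    (and below the Nyquist frequency), the estimate equals A.
    """
    n = len(frame)
    power = 0.0
    for k in range(n // 2 + 1):
        if k * fs / n >= cutoff_hz:
            x_k = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
            power += abs(x_k) ** 2
    return 2.0 * math.sqrt(power) / n


def envelope_to_tone(signal, fs, frame_len, cutoff_hz=2000.0, carrier_hz=250.0):
    """Low-frequency tone whose per-frame amplitude follows the high-band level.

    A trailing partial frame is ignored.
    """
    out, amps = [], []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        amp = high_band_amplitude(signal[start:start + frame_len], fs, cutoff_hz)
        amps.append(amp)
        for i in range(start, start + frame_len):  # global index keeps phase continuous
            out.append(amp * math.cos(2 * math.pi * carrier_hz * i / fs))
    return out, amps
```

The output carries only the level of the high-frequency content, not its spectral detail, so the listener hears when high-frequency energy is present but not its exact spectral shape.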
Problems with each of the above described methods for frequency transposition include technical complexity, distortion or loss of information about sounds in some circumstances, and unreliability of the processing in difficult listening conditions, e.g. in the presence of background noise.