The present invention relates generally to an electro-acoustic processing circuit for increasing speech intelligibility. More specifically, this invention relates to an audio device having signal processing capabilities for amplifying selected voice frequency bands without circuit instability and oscillation, thereby increasing speech intelligibility for persons with a sensory neural hearing disorder.
Persons with a sensory neural hearing disorder find the speech of others to be less intelligible in a variety of circumstances where those with normal hearing would find the same speech to be intelligible. Many persons with sensory neural hearing disorder find that they can satisfactorily increase the intelligibility of speech of others by cupping their auricle with their hand or using an ear trumpet directed into the external auditory canal.
Many patients with sensory neural hearing disorder have normal or near normal pure tone sensitivity to some of the speech frequencies below about 1000 Hz. These frequencies generally comprise the first speech formant. Associated with the sensory neural hearing disorder of many patients is a diminished absolute sensitivity for the pure tone frequencies that are higher than the first speech formant. This reduced sensitivity generally signifies a loss of perception of the second speech formant, which occupies the voice spectrum between about 1000 Hz and 2800 Hz. Not only is the patient's absolute sensitivity lost for the frequencies of the second formant, but the normal loudness relationship between the frequencies of the first and second formants is altered, with those of the second formant being less loud at ordinary suprathreshold speech levels of 40-60 phons. Thus, when electro-acoustical hearing aids amplify both formants by an approximately equal amount at normal speech input levels, the loudness of the second formant relative to the first is lacking and voices sound unintelligible, muffled, and basso.
Patients with sensory neural hearing disorder often have difficulty following the spoken message of a given speaker in the presence of irrelevant speech or other sounds in the lower speech spectrum. They may hear constant or intermittent head sounds (tinnitus); they may have a reduced range of comfortable loudness (recruitment); they may hear a differently pitched sound from the same tone presented to each ear (diplacusis binauralis); or they may mishear what has been said to them.
It is well established that for those with normal hearing, the first and second speech formants, which together occupy the audio frequency band of about 250 Hz to 2800 Hz, are both necessary and sufficient for satisfactory speech intelligibility of a spoken message. This is demonstrated in telephonic communication equipment, e.g. the EE8a field telephone of WWII vintage, and by the development of the "vocoder" and its incorporation into voice encryption means of WWII (U.S. Pat. No. 3,967,067 to Potter and U.S. Pat. No. 3,967,066 to Mathes, as described by Kahn, IEEE Spectrum, September 1984, pp. 70-80).
The vocoding and encryption process analyzed the speech signal into a plurality of contiguous bands, each about 250-300 Hz wide. After rectification and digitization, and combination with a random digital code supplied for each band, the combined digitized signals were transmitted to a distant decoding and re-synthesizing system. This system first subtracted the random code using a recorded duplicate of the code. It then reconstituted the voice by separately modulating the output of each of the plurality of channels, which were supplied from a single "buzz" source, rich in the harmonics of a variable frequency fundamental centered on 60 Hz (if the voice were that of a male).
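The combine-and-subtract keying described above can be illustrated with a short Python sketch. It is a sketch under stated assumptions only: the modular arithmetic, the number of quantization steps (LEVELS) and the function names are chosen here for illustration and are not taken from the cited patents.

```python
import random

LEVELS = 6  # assumed number of quantization steps per band (illustrative only)

def encrypt(band_levels, key):
    """Combine each band's quantized level with the random code for that band."""
    return [(v + k) % LEVELS for v, k in zip(band_levels, key)]

def decrypt(cipher, key):
    """Subtract the recorded duplicate of the code to recover the band levels."""
    return [(c - k) % LEVELS for c, k in zip(cipher, key)]

if __name__ == "__main__":
    rng = random.Random(0)                        # stands in for the recorded random code
    band_levels = [3, 0, 5, 2, 1, 4, 2, 0, 1, 3]  # one quantized level per band
    key = [rng.randrange(LEVELS) for _ in band_levels]
    sent = encrypt(band_levels, key)              # only this is transmitted
    assert decrypt(sent, key) == band_levels
```

Only the combined values are transmitted, so none of the original analogue speech signal appears on the channel; the receiver recovers the band levels solely because it holds a duplicate of the code.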
At no point in this voice transmission was any of the original (analogue) speech signal transmitted. The resynthesis of the speech signal was accomplished with a non-vocally produced fundamental frequency and its harmonics, which were used to produce voiced sounds. The unvoiced speech sounds were derived from an appropriately supplied "hiss" source, also modulated and used to produce the voice fricative sounds. Because of the limitations imposed by the number of channels and their widths, the synthesized voice contained information (frequencies) from the first and second reconstituted speech formants. Although it sounded robot-like to those with normal hearing, the reconstituted speech was entirely intelligible and, because there was no transmitted analogue signal, could be used with perfect security.
It is also important to note that the content of each of the plurality of bands that make up vocoder speech is derived from the same harmonic rich buzz source. Thus the harmonic matrix forms the basis of an intercorrelated system of voice sounds throughout the speech range which comprises the first and second formants. Intelligibility therefore depends, among other things, upon maintaining the integrity of the first and second speech formants in appropriate loudness relationship to one another. These relationships were preserved in the encrypted vocoding process and in the subsequent resynthesizing process.
The diminished capability to decipher the speech of others is the principal reason that sensory-neural patients seek hearing assistance. Prior to the development of electro-acoustical hearing aids, hearing assistance was obtained largely by an extension of the auricle either with a "louder please" gesture (ear cupping) or an ear trumpet. Both of these means are effective for many sensory-neural patients but have the disadvantage that they are highly conspicuous and not readily acceptable, as means of assistance, to the patients who can be aided by them. Modern electro-acoustical hearing aids, in contrast, are much less conspicuous but bring with them undesirable features, which make them objectionable to many patients.
The results of modern hearing aid speech signal processing differ greatly from the horn-like acoustical processing characteristics provided by either the passive device of an ear trumpet or a hand used for ear cupping. Especially for the frequencies of the second speech formant, the latter provide significant acoustic gain in the form of enhanced impedance matching between the air medium outside the ear and the outer ear canal. The passive devices moreover provide less gain for the first speech formant frequencies and do not create intrinsic extraneous hearing aid-generated sounds in the signals that are passed to the patient's eardrum. They also provide a signal free of ringing and of oscillation or the tendency to oscillate at audible frequencies, which usually occurs at about 2900 Hz and is called "howl" or "whistle" in the prior art. Moreover, passive devices, being intrinsically linear in an amplitude sense, convey their signals without extraneous intermodulation products. As stable systems, passive devices have excellent transient response characteristics, are free of the tendency to ring, have stable acoustic gain, and have stable bandwidth characteristics.
An electro-acoustic hearing aid, in contrast, consists basically of a microphone, an earphone or loudspeaker, and an electronic amplifier between the two, all connected together in one portable unit. Such electro-acoustical aids inevitably provide a short air path between the microphone and the earphone or loudspeaker, whether or not the two are housed in a single casing. If the unit is an in-the-ear type electro-acoustic hearing aid, there is almost inevitably provided a narrow vent channel or passageway through which the output of the earphone or loudspeaker may pass to the input microphone. This passageway provides a second pathway for the voice of the person speaking to the aid wearer, whereby audio signals traveling in this passageway reach the patient's auditory system (eardrum) unmodified by the aid.
Significant acoustic coupling between the microphone and the earphone renders the entire electronic system marginally stable, with the potential for regenerative feedback. Regenerative (or positive) feedback occurs when the instantaneous time variation in the amplitude of the output of the system is in phase with the input signal. The gain of such a marginally stable system increases greatly while the passband of the system typically narrows in inverse proportion to the increase of the system's gain. When the loop gain exceeds unity the system will oscillate, and if the oscillatory frequency is audible and within the range of the patient's hearing capability, the resulting tone forms an objectionable sound, called a "howl", that tends to mask the speech signals coming from the hearing aid or through the passageway from without.
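The relationship between loop gain, overall gain and passband can be illustrated with a simplified model. The Python sketch below assumes a single-pole amplifier with a frequency-independent feedback fraction, which is a textbook idealization adopted here for illustration and not a description of any particular hearing aid; the function and parameter names are hypothetical.

```python
def closed_loop(forward_gain, feedback_fraction, open_bandwidth_hz):
    """Return (gain, bandwidth_hz) of a single-pole amplifier with positive feedback.

    Assumed model: gain = A / (1 - A*beta), bandwidth = B * (1 - A*beta),
    so the gain-bandwidth product stays constant as the loop gain A*beta grows.
    Raises ValueError once the loop gain reaches unity, where the model
    predicts sustained oscillation (howl) rather than a finite gain.
    """
    loop_gain = forward_gain * feedback_fraction
    if loop_gain >= 1.0:
        raise ValueError("loop gain >= 1: system oscillates (howl)")
    gain = forward_gain / (1.0 - loop_gain)
    bandwidth_hz = open_bandwidth_hz * (1.0 - loop_gain)
    return gain, bandwidth_hz

if __name__ == "__main__":
    # As the acoustic leakage (feedback fraction) rises, gain climbs and bandwidth shrinks.
    for beta in (0.0, 0.05, 0.09, 0.099):
        print(beta, closed_loop(10.0, beta, 3000.0))
```

In this model the product of gain and bandwidth stays fixed, which is one way to read the statement that the passband narrows in inverse proportion to the increase in gain.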
In U.S. Pat. No. 5,003,606 to Bordenwijk and U.S. Pat. No. 5,033,090 to Weinrich, an attempt is made to cancel the positive feedback by using the signal from a second microphone, sensitive to sounds originating from sources near the first microphone, and feeding the output of this second microphone into the signal amplifier in counterphase to the input from the first microphone. Although this means allows for somewhat greater gain in a hearing aid so configured, it does not entirely eliminate marginal stability under all conditions, nor the howling owing to positive feedback. The major drawback of these means is the inability of such systems to discriminate between a near signal generated by a signal source of interest and the signal deriving from the earphone. Bordenwijk finds it necessary to introduce the inconvenience of a separate control to adapt the aid for listening to nearby signals of interest. One disadvantage of Weinrich's in-the-ear system, which locates the near microphone in the vent tube, is that the diameter of this tube is generally narrow. Such narrowing may limit the amplitude of the signals that are fed in counterphase to the amplifier. If narrow enough, this negatively affects the quality of the sound heard by the patient directly through the vent.
U.S. Pat. No. 5,347,584 to Narisawa attempts to eliminate acoustical regeneration by a tight fitting means that effectively seals the in-the-ear earphone earmold of the hearing aid to the walls of the outer ear canal near the tympanic membrane. However, this means poses a potential threat to the integrity of the tympanic membrane itself from changes in the external barometric pressure and establishes an unhygienic condition owing to lack of air circulation in the enclosed space if worn for an extended period. For some wearers the unremitting pressure on the internal surfaces of the external ear canal may also predispose to the development of itching, excessive cerumen accumulation and pressure sores. Moreover, this approach to the elimination of positive feedback leaves the wearer completely at the mercy of the hearing aid for the detection of any external sounds and makes the heard sound unnatural. Thus, if either the hearing aid or its power supply fails, that ear of the wearer is completely cut off from the outside audible world, making the patient's residual hearing useless no matter how much of it remains for that ear. Further, although this system blocks all air-conducted positive feedback sounds, the possibility of positive feedback through the casing of the hearing instrument itself and through the tissues of the head remains problematic at higher gains.
Critical information for the person with normal hearing is contained in the bands of the first and second formants, and there is thought to be especially critical information in specific regions of these bands, namely the higher frequencies of the first formant and the lower frequencies of the second formant. These contain the frequencies which comprise the voiced consonant sounds (named formant transitions in voice spectrography).
In U.S. Pat. No. 4,051,331 to Strong and Palmer it is proposed to "move" this information by transposition into the region of the voice spectrum where some severely hearing impaired sensory-neural patients have spared sensitivity. For example, if for a given speaker the voiced, unvoiced and mixed speech sounds are centered about a frequency f(t), the speech signal processor of a Strong et al. hearing aid transposes this information such that it will be centered about F(o), where F(o) is less than f(t) and lies within the first formant range where the sparing resides. This system is proposed and may be useful for the most profoundly impaired sensory-neural patient. Such recentering does not provide a natural sounding voice and leaves such patients much more at risk for the degradation of intelligibility that occurs from the masking of other voice sounds by extraneous noises. These are usually the lower frequencies found in the first speech formant. The majority of patients with lesser sensory neural hearing deficits do not require such a system as taught by Strong et al. For them, speech intelligibility can be dealt with satisfactorily with the limited gain offered by ear cupping or an ear trumpet, thereby sustaining no loss from masking effects and no loss of voice fidelity. Thus, the Strong et al. invention offers no advantage to these patients and provides some disadvantages.
It is a common observation that patients with sensory neural hearing deficits are hampered by their inability to extract intelligible speech in a so-called noisy environment because lower speech frequencies mask the higher frequencies of the second formant that are required for speech intelligibility. This disability from ambient noise occurs in those with normal hearing as well, but not to the extent experienced by persons with sensory neural hearing deficits. The so-called noise may be of vocal or nonvocal origin but is usually composed of sounds within the spectral range of the first formant. Prior art to deal with this problem includes, for example, directional hearing aid microphones and binaurally fitted hearing aids (see Mueller and Hawkins, Handbook for Hearing Aid Amplification, Chapter 2, Vol. II, 1990).
U.S. Pat. No. 5,285,502 to Walton et al. attempts to deal with the noise and compensation problems concurrently by dividing the speech signal with a variable high pass filter and a low pass filter. This approach varies the attenuation of the lower frequencies of the first voice formant by moving the cutoff slope characteristic of the high pass filter to higher or lower frequencies. When the noise level is low, the cutoff moves toward the lower frequencies, permitting whole voice spectrum listening because the system passes more of the lower frequencies of the first formant. As the noise level builds, a level detector output shifts the low frequency slope of the variable high pass filter toward higher frequencies. As this occurs, the overall gain of the system for the first formant frequencies that contain the noise declines. However, the lower end of the high pass filter response characteristic remains below the formant transition zone so that this important region, which contains the information from which differential consonant and vowel sounds emerge, is always conveyed to the patient. In this way, Walton attenuates only the lower frequencies and maintains the higher frequencies (i.e. the second speech formant frequencies) at a constant amplification.
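The noise-dependent movement of the high pass cutoff described above can be sketched in Python as follows. Every numerical value and name in this sketch (the quiet and loud noise thresholds, the cutoff limits and the linear mapping between them) is a hypothetical placeholder chosen for illustration; none is taken from the Walton et al. patent.

```python
def highpass_cutoff_hz(noise_db, quiet_db=40.0, loud_db=80.0,
                       min_cutoff_hz=250.0, max_cutoff_hz=900.0):
    """Map a detected noise level to a high pass cutoff frequency.

    Assumed behavior: in quiet the cutoff sits low (whole voice spectrum
    is passed); as noise builds the cutoff slides upward, but it is capped
    at max_cutoff_hz, taken here to lie below the formant transition zone,
    so that region is always passed.
    """
    if noise_db <= quiet_db:
        return min_cutoff_hz
    if noise_db >= loud_db:
        return max_cutoff_hz
    fraction = (noise_db - quiet_db) / (loud_db - quiet_db)
    return min_cutoff_hz + fraction * (max_cutoff_hz - min_cutoff_hz)

if __name__ == "__main__":
    for level in (30.0, 50.0, 60.0, 70.0, 90.0):
        print(level, highpass_cutoff_hz(level))
```

The cap on the cutoff is the point of the design as described: only the lower first formant frequencies are attenuated, while the second formant frequencies keep a constant amplification.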
U.S. Pat. No. 5,303,306 to Brillhart et al. teaches a programmable system that switches from one combination of bandpass, gain, and roll off conditions to another as the wearer selects desired preprogrammed characteristics. This patent teaches a dual band system that has a plurality of programmed or programmable acoustical characteristics that conform to the patient's respective audiogram, loudness discomfort level and most comfortable loudness level. These devices are generally complex and inconvenient to use because they must be programmed with a separate remote controller unit which must be directed to the ear unit. Furthermore, they are expensive and do not eliminate regeneration and all its attendant problems brought on by marginal stability. Additionally, they may not have a manually operated on and off switch that users find most congenial and convenient. Most importantly, they do not perform as well as an ear trumpet and do not permit a patient to hear under demanding circumstances, as when a podium speaker is to be heard from the rear of a noisy auditorium.
Ear cupping and the ear trumpet, on the other hand, by restoring the acoustical balance between the first and second formants with a system that does not regenerate, deal with the detrimental effects of noise on speech intelligibility in an entirely different and more efficient manner. These passive devices provide differential gains for the first and second speech formant frequencies. The electro-acoustical devices and methods of the prior art each have their own drawbacks. The devices and methods either have marginal stability and are subject to changing gain, howl (regeneration) and uncertain bandwidth, or they fail to make best use of the patient's residual hearing, thus failing both to restore intelligibility and to preserve the patient's ability to retrieve speech in a noisy environment.
These and other types of devices and methods disclosed in the prior art do not offer the flexibility and inventive features of our signal processing circuit and method for increasing speech intelligibility. As will be described in greater detail hereinafter, the circuit and method of the present invention differ from those previously proposed. For example, the present invention actively monitors the acoustic environment in which it operates.
According to the present invention we have provided a signal processing circuit for increasing speech intelligibility comprising a receiving circuit for receiving an audio signal detectable by a human. A gain amplifying circuit generally amplifies the gain of the audio signal. A shaping filter modifies the audio signal wherein the modified audio signal is made to be in phase with a second audio signal present at the receiving circuit and which is detected by the human unprocessed by the signal processing circuit. Further, the shaping filter also differentially amplifies first and second speech formant frequencies of the audio signal as a function dependent on a frequency of the audio signal. A feedback circuit is provided for controlling the gain amplification in said gain amplifying circuit and wherein the signal processing circuit substantially prevents regenerative oscillation of the amplified audio signal.
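The notion of differential amplification as a function of frequency can be illustrated with a brief Python sketch. The gain values, the 1000 Hz transition frequency (taken from the approximate formant boundary given in the background) and the first-order transition shape are assumptions made here for illustration; they are not the response of the claimed shaping filter.

```python
def shaping_gain_db(freq_hz, rolloff_hz=1000.0,
                    first_formant_gain_db=5.0, second_formant_gain_db=25.0):
    """Illustrative frequency-dependent gain in dB.

    Assumed shape: the gain rises smoothly from first_formant_gain_db well
    below rolloff_hz to second_formant_gain_db well above it, following a
    first-order high pass power weighting f^2 / (f^2 + fc^2).
    """
    weight = freq_hz ** 2 / (freq_hz ** 2 + rolloff_hz ** 2)
    span_db = second_formant_gain_db - first_formant_gain_db
    return first_formant_gain_db + span_db * weight

if __name__ == "__main__":
    for f in (250.0, 500.0, 1000.0, 2000.0, 2800.0):
        print(f, round(shaping_gain_db(f), 1))
```

With these placeholder values the second formant band receives more gain than the first, which is the qualitative behavior the background attributes to ear cupping and the ear trumpet.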
A feature of the invention relates to a method of processing an audio signal for increasing speech intelligibility to a human. One embodiment of our method comprises the steps of receiving an audio signal; modifying the audio signal to be in phase with a second audio signal present at the receiving circuit and which is detectable by the human and unprocessed by the signal processing circuit; amplifying frequencies of the audio signal differentially wherein substantially only second speech formant frequencies of said audio signal have varied amplified gain; and controlling the gain amplification wherein the signal processing circuit substantially prevents regenerative oscillation of the amplified audio signal.
Still another feature of the invention concerns a signal injection circuit for injecting a signal tone to mix with said audio signal and wherein the feedback circuit further comprises a gain control circuit for automatically controlling the gain amplifying circuit as a function of the sensed level of the injected signal tone.
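One way such tone-based gain control might operate is sketched below in Python. The sketch assumes that the tone is injected at a known level at the output, that the level sensed at the microphone reflects the acoustic leakage, and that gain is limited so that the estimated loop gain stays below a safety margin. This summary does not specify the mapping from sensed level to gain, so these assumptions, the margin value and all names are hypothetical.

```python
def max_stable_gain(injected_level, sensed_level, margin=0.5):
    """Largest forward gain keeping the estimated loop gain at or below `margin`.

    Assumption: sensed_level / injected_level estimates the fraction of the
    output that leaks back to the microphone (the feedback fraction).
    """
    if injected_level <= 0.0:
        raise ValueError("injected_level must be positive")
    feedback_fraction = sensed_level / injected_level
    if feedback_fraction <= 0.0:
        return float("inf")  # no measurable leakage: no limit from this model
    return margin / feedback_fraction

def control_gain(requested_gain, injected_level, sensed_level, margin=0.5):
    """Clamp the wearer's requested gain to the estimated stable maximum."""
    return min(requested_gain, max_stable_gain(injected_level, sensed_level, margin))

if __name__ == "__main__":
    # Leakage of 1% allows a gain of at most 50 with a margin of 0.5.
    print(control_gain(100.0, 1.0, 0.01))
    print(control_gain(20.0, 1.0, 0.01))
```

Under these assumptions the gain falls automatically when leakage rises, for example when a hand or telephone handset is brought near the ear, and returns to the requested value when the leakage subsides.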
According to important features of the invention we have also provided the feedback circuit further comprising a processing filter for providing a negative feedback to the gain amplifying circuit as a function of change in environmental variables.
In accordance with the following, it is an advantage of the present invention to provide a signal processing circuit that reduces regenerative feedback, that emulates the acoustical characteristics of ear cupping or an ear trumpet and that has usable gain characteristics superior to these passive devices.
A further advantage is to provide a processing circuit that provides a wearer the capability to adjust the amplification of the overall gain as well as specific differential amplification of first and second speech formants in relation to a specific roll-off frequency.
Yet a further advantage is to provide a portable electro-acoustic hearing aid for sensory neural patients, wherein the aid has one or more of the above signal processing circuit characteristic advantages.
Another advantage is to provide an electroacoustic hearing aid that responds to the limitation that amplification of the higher frequency sounds (second formant) is marginal at best in conventional hearing aids and that the desired amount of amplification is often the maximum allowable, subject to the constraint that regenerative howling not occur.
Still another advantage is to provide an electro-acoustic hearing aid that contains a vent or passageway to permit an unprocessed signal and a processed signal to be in phase with one another throughout the spectral limits of the first and second formants once they reach the tympanic membrane (eardrum) of a hearing aid wearer.