The present invention relates to improvements in voice amplification and clarification in a noisy environment, such as a cabin communication system, which enables a voice spoken within the cabin to be increased in volume for improved understanding while minimizing any unwanted noise amplification. The present invention also relates to a movable cabin that advantageously includes such a cabin communication system for this purpose. In this regard, the term “movable cabin” is intended to be embodied by a car, truck or any other wheeled vehicle, an airplane or helicopter, a boat, a railroad car and indeed any other enclosed space that is movable and wherein a spoken voice may need to be amplified or clarified.
As anyone who has ridden in a mini-van, sedan or sport utility vehicle will know, communication among the passengers in the cabin of such a vehicle is difficult. For example, in such a vehicle, it is frequently difficult for words spoken by, for example, a passenger in a back seat to be heard and understood by the driver, or vice versa, due to the large amount of ambient noise caused by the motor, the wind, other vehicles, stationary structures passed by etc., some of which noise is caused by the movement of the cabin and some of which occurs even when the cabin is stationary, and due to the cabin acoustics which may undesirably amplify or damp out different sounds. Even in relatively quiet vehicles, communication between passengers is a problem due to the distance between passengers and the intentional use of sound-absorbing materials to quiet the cabin interior. The communication problem may be compounded by the simultaneous use of high-fidelity stereo systems for entertainment.
To amplify the spoken voice, it may be picked up by a microphone and played back by a loudspeaker. However, if the spoken voice is simply picked up and played back, there will be a positive feedback loop that results from the output of the loudspeaker being picked up again by the microphone and added to the spoken voice to be once again output at the loudspeaker. When the output of the loudspeaker is substantially picked up by a microphone, the loudspeaker and the microphone are said to be acoustically coupled. To avoid an echo due to the reproduced voice itself, an echo cancellation apparatus, such as an acoustic echo cancellation apparatus, can be coupled between the microphone and the loudspeaker to remove the portion of the picked-up signal corresponding to the voice component output by the loudspeaker. This is possible because the audio signal at the microphone corresponding to the original spoken voice is theoretically highly correlated to the audio signal at the microphone corresponding to the reproduced voice component in the output of the loudspeaker. One advantageous example of such an acoustic echo cancellation apparatus is described in commonly-assigned U.S. patent application Ser. No. 08/868,212. Another advantageous acoustic echo cancellation apparatus is described hereinbelow.
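By way of illustration only, the positive feedback loop described above may be sketched as a short simulation in which the microphone signal is the spoken voice plus a delayed, scaled copy of what the loudspeaker has already played. The names feedback_loop, loop_gain and delay are hypothetical and are not taken from the system described herein; the single delayed tap is a simplifying assumption standing in for the real acoustic coupling.

```python
def feedback_loop(source, loop_gain, delay):
    """Simulate simple pick-up and playback with acoustic coupling.

    Each output sample is the source sample plus loop_gain times the
    output produced `delay` samples earlier (assumed single-tap coupling).
    """
    out = []
    for n, s in enumerate(source):
        fed_back = loop_gain * out[n - delay] if n >= delay else 0.0
        out.append(s + fed_back)
    return out


# A single click at time zero, followed by silence.
impulse = [1.0] + [0.0] * 9

# Loop gain below one: the repeated copies die away.
stable = feedback_loop(impulse, loop_gain=0.5, delay=2)

# Loop gain above one: each pass around the loop is louder than the last.
unstable = feedback_loop(impulse, loop_gain=2.0, delay=2)
```

In this sketch the click recirculates every two samples, shrinking when the loop gain is below one and growing without bound when it exceeds one, which is the behavior that echo cancellation is intended to prevent.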
On the other hand, any reproduced noise components may not be so highly correlated and need to be removed by other means. However, while systems for noise reduction generally are well known, enhancing speech intelligibility in a noisy cabin environment poses a challenging problem due to constraints peculiar to this environment. It has been determined in developing the present invention that the challenges arise principally, though not exclusively, from the following five causes. First, the speech and noise occupy the same bandwidth, and therefore cannot be separated by band-limited filters. Second, different people speak differently, and therefore it is harder to properly identify the speech components in the mixed signal. Third, the noise characteristics vary rapidly and unpredictably, due to the changing sources of noise as the vehicle moves. Fourth, the speech signal is not stationary, and therefore constant adaptation to its characteristics is required. Fifth, there are psycho-acoustic limits on speech quality, as will be discussed further below.
One prior art approach to speech intelligibility enhancement is filtering. As noted above, since speech and noise occupy the same bandwidth, simple band-limited filtering will not suffice. That is, the overlap of speech and noise in the same frequency band means that filtering based on frequency separation will not work. Instead, filtering may be based on the relative orthogonality between speech and noise waveforms. However, the highly non-stationary nature of speech necessitates adaptation to continuously estimate a filter to subtract the noise. The filter will also depend on the noise characteristics, which in this environment are time-varying on a slower scale than speech and depend on such factors as vehicle speed, road surface and weather.
FIG. 1 is a simplified block diagram of a conventional cabin communication system (CCS) 100 using only a microphone 102 and a loudspeaker 104. As shown in the figure, an echo canceller 106 and a conventional speech enhancement filter (SEF) 108 are connected between the microphone 102 and loudspeaker 104. A summer 110 subtracts the output of the echo canceller 106 from the input of the microphone 102, and the result is input to the SEF 108 and used as a control signal therefor. The output of the SEF 108, which is the output of the loudspeaker 104, is the input to the echo canceller 106. In the echo canceller 106, on-line identification of the transfer function of the acoustic path (including the loudspeaker 104 and the microphone 102) is performed, and the signal contribution from the acoustic path is subtracted.
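By way of illustration only, the on-line identification and subtraction performed in an echo canceller of this general kind may be sketched with a normalized least-mean-squares (NLMS) update. NLMS is used here as an assumption because it is a common adaptive-filter technique; the description above does not specify the algorithm. The names nlms_echo_cancel, num_taps, step and true_path are hypothetical, and the simulated acoustic path is a made-up four-tap response. The sketch drives the canceller with a random loudspeaker signal and no speech or noise, so it does not model the closed-loop coupling with the SEF.

```python
import random


def nlms_echo_cancel(loudspeaker, mic, num_taps=4, step=0.5, eps=1e-8):
    """Identify the loudspeaker-to-microphone path and subtract its echo.

    Returns (residual, weights): the microphone signal with the estimated
    echo removed, and the final estimate of the acoustic path.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent loudspeaker samples, newest first
    residual = []
    for x, d in zip(loudspeaker, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # role of the summer: mic minus estimated echo
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


# Simulate a hypothetical acoustic path and the echo it produces.
random.seed(0)
true_path = [0.6, -0.3, 0.1, 0.05]
loudspeaker = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = []
buf = [0.0] * len(true_path)
for x in loudspeaker:
    buf = [x] + buf[:-1]
    mic.append(sum(p * b for p, b in zip(true_path, buf)))

residual, weights = nlms_echo_cancel(loudspeaker, mic)
```

Under these idealized conditions the estimated weights approach true_path and the residual approaches zero. In an actual cabin the path changes over time and speech and noise are present, so convergence would be slower and less exact.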
In a conventional acoustic echo and noise cancellation system, the two problems of removing echoes and removing noise are addressed separately and the loss in performance resulting from coupling of the adaptive SEF and the adaptive echo canceller is usually insignificant. This is because speech and noise are correlated only over a relatively short period of time. Therefore, the signal coming out of the loudspeaker can be made to be uncorrelated from the signal received directly at the microphone by adding adequate delay into the SEF. This ensures robust identification of the echo canceller and in this way the problems can be completely decoupled. The delay does not pose a problem in large enclosures, public address systems and telecommunication systems such as automobile hands-free telephones. However, it has been recognized in developing the present invention that the acoustics of relatively smaller movable cabins dictate that processing be completed in a relatively short time to prevent the perception of an echo from direct and reproduced paths. In other words, the reproduced voice output from the loudspeaker should be heard by the listener at substantially the same time as the original voice from the speaker is heard. In particular, in the cabin of a moving vehicle, the acoustic paths are such that an addition of delay beyond approximately 20 ms will sound like an echo, with one version coming from the direct path and another from the loudspeaker. This puts a limit on the total processing time, which means a limit both on the amount of delay and on the length of the signal that can be processed.
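By way of illustration only, the processing-time limit described above may be expressed as a number of samples. The sketch below assumes the approximately 20 ms figure stated above; the sample rates of 8 kHz and 16 kHz and the name max_processing_samples are hypothetical and are chosen only to make the arithmetic concrete.

```python
def max_processing_samples(sample_rate_hz, budget_ms=20.0):
    """Number of whole samples that fit within the delay budget."""
    return int(sample_rate_hz * budget_ms / 1000.0)


# Assumed sample rates; the description does not state one.
samples_at_8k = max_processing_samples(8000)
samples_at_16k = max_processing_samples(16000)
```

At an assumed 16 kHz sample rate the budget corresponds to 320 samples, which bounds the combined buffering and filter delay and hence the length of the signal segment available to any adaptive filter.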
Thus, conventional adaptive filtering applied to a cabin communication system may reduce voice quality by introducing distortion or by creating artifacts such as tones or echoes. If the echo cancellation process is coupled with the speech extraction filter, it becomes difficult to accurately estimate the acoustic transfer functions, and this in turn leads to poor estimates of noise spectrum and consequently poor speech intelligibility at the loudspeaker. An advantageous approach to overcoming this problem is disclosed below, as are the structure and operation of an advantageous adaptive SEF.
Several adaptive filters are known for use in the task of speech intelligibility enhancement. These filters can be broadly classified into two main categories: (1) filters based on a Wiener filtering approach and (2) filters based on the method of spectral subtraction. Two other approaches, i.e. Kalman filtering and H-infinity filtering, have also been tried, but will not be discussed further herein.
Spectral subtraction has been subjected to rigorous analysis, and it is well known, at least as it currently stands, not to be suitable for low SNR (signal-to-noise) environments because it results in “musical tone” artifacts and in unacceptable degradation in speech quality. The movable cabin in which the present invention is intended to be used is just such a low SNR environment.
Accordingly, the present invention is an improvement on Wiener filtering, which has been widely applied for speech enhancement in noisy environments. The Wiener filtering technique is statistical in nature, i.e. it constructs the optimal linear estimator (in the sense of minimizing the expected squared error) of an unknown desired stationary signal, n, from a noisy observation, y, which is also stationary. The optimal linear estimator is in the form of a convolution operator in the time domain, which is readily converted to a multiplication in the frequency domain. In the context of a noisy speech signal, the Wiener filter can be applied to estimate noise, and then the resulting estimate can be subtracted from the noisy speech to give an estimate for the speech signal.
To be concrete, let y be the noisy speech signal and let the noise be n. Then Wiener filtering requires the solution, h, to the following Wiener-Hopf equation:
$$R_{ny}(t) = \sum_{s=-\infty}^{\infty} h(s)\, R_{yy}(t-s) \qquad (1)$$
Here, Rny is the cross-correlation matrix of the noise-only signal with the noisy speech, Ryy is the auto-correlation matrix of the noisy speech, and h is the Wiener filter.
Although this approach is mathematically correct, it is not immediately amenable to implementation. First, since speech and noise are uncorrelated, the cross-correlation between n and y, i.e. Rny, is the same as the auto-correlation of the noise, Rnn. Second, both noise and speech are non-stationary, and therefore the infinite-length cross-correlation of the solution of Equation 1 is not useful. Obviously, infinite data is not available, and furthermore the time constraint of echo avoidance applies. Therefore, the following truncated equation is solved instead:
$$R_{nn}(t) = \sum_{s=1-m}^{m} h(s)\, R_{yy}(t-s) \qquad (2)$$
Here, m is the length of the data window.
This equation can be readily solved in the frequency domain by taking Fourier Transforms, as follows:
Snn(f)=H(f)Syy(f)
Here, Snn and Syy are the Fourier Transforms of the auto-correlations Rnn and Ryy, or equivalently the power spectral densities (PSDs), of the noise and the noisy speech signal, respectively, and H is the Fourier Transform of the filter h. The auto-correlation of the noise can only be estimated, since there is no noise-only signal.
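By way of illustration only, the frequency-domain solution may be sketched per frequency bin as H(f) = Snn(f)/Syy(f). Because the filter estimates the noise, which is then subtracted from the noisy speech, the gain effectively applied to the noisy spectrum is 1 - H(f). The function names, the clamping of H(f) to the range 0 to 1, and the small floor guarding against division by zero are assumptions added for this sketch; the PSD values are made-up numbers.

```python
def wiener_gains(noise_psd, noisy_psd, floor=1e-12):
    """Per-bin noise-estimation gain H(f) = Snn(f) / Syy(f).

    The result is clamped to [0, 1]; the clamp and the divide-by-zero
    floor are assumptions of this sketch.
    """
    gains = []
    for snn, syy in zip(noise_psd, noisy_psd):
        h = snn / max(syy, floor)
        gains.append(min(1.0, max(0.0, h)))
    return gains


def speech_gains(noise_psd, noisy_psd):
    """Per-bin gain left on the noisy spectrum after noise subtraction."""
    return [1.0 - h for h in wiener_gains(noise_psd, noisy_psd)]


# Made-up PSD estimates for three frequency bins.
noise_psd = [1.0, 2.0, 0.5]
noisy_psd = [4.0, 2.0, 2.0]

h = wiener_gains(noise_psd, noisy_psd)
g = speech_gains(noise_psd, noisy_psd)
```

In the second bin the estimated noise power equals the total power, so that bin is fully suppressed; in the other bins one quarter of the power is attributed to noise and the remainder is passed.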
However, there are problems in this approach, which holds only in an approximate sense. First, the statistics of noise have to be continuously updated. Second, this approach fails to take into account the psycho-acoustics of the human ear, which is extremely sensitive to processing artifacts at even extremely low decibel levels. Neither does this approach take into account the anti-causal nature of speech or the relative stationarity of the noise. While several existing Wiener filtering techniques make use of ad hoc, non-linear processing of the Wiener filter coefficients in the hope of maintaining and improving speech intelligibility, these techniques do not work well and do not effectively address the practical problem of interfacing a Wiener filtering technique with the psycho-acoustics of speech.
As noted above, another aspect of the present invention is directed to the structure and operation of an advantageous adaptive acoustic echo canceller (AEC) for use with an SEF as disclosed herein. Of course, other adaptive SEFs may be used in the present invention provided they cooperate with the advantageous echo canceller in the manner disclosed below.
To realistically design a cabin communication system (CCS) that is appropriate for a relatively small, movable cabin, it has been recognized that the echo cancellation has to be adaptive because the acoustics of a cabin change due to temperature, humidity and passenger movement. It has also been recognized that noise characteristics are also time varying depending on several factors such as road and wind conditions, and therefore the SEF also has to continuously adapt to the changing conditions. A CCS couples the echo cancellation process with the SEF. The present invention is different from the prior art in addressing the coupled on-line identification and control problem in a closed loop.
There are other aspects of the present invention that contribute to the improved functioning of the CCS. One such aspect relates to an improved automatic gain control (AGC) that, in accordance with the present invention, controls amplification volume and related functions in the CCS, including the generation of appropriate gain control signals for an overall gain and a dither gain and the prevention of amplification of undesirable transient signals.
It is well known that it is necessary for customer comfort, convenience and safety to control the volume of amplification of certain audio signals in audio communication systems such as the CCS. Such volume control should have an automatic component, although a user's manual control component is also desirable. The prior art recognizes that any microphone in a cabin will detect not only the ambient noise, but also sounds purposefully introduced into the cabin. Such sounds include, for example, sounds from the entertainment system (radio, CD player or even movie soundtracks) and passengers' speech. These sounds interfere with the microphone's receiving just a noise signal for accurate noise estimation.
Prior art AGC systems failed to deal with these additional sounds adequately. In particular, prior art AGC systems would either ignore these sounds or attempt to compensate for the sounds. In contrast, the present invention provides an advantageous way to supply a noise signal to be used by the AGC system that has had these additional noises eliminated therefrom.
A further aspect of the present invention is directed to an improved user interface installed in the cabin for improving the ease and flexibility of the CCS. In particular, while the CCS is intended to incorporate sufficient automatic control to operate satisfactorily once the initial settings are made, it is of course desirable to incorporate various manual controls to be operated by the driver and passengers to customize its operation. In this aspect of the present invention, the user interface enables customized use of the plural microphones and loudspeakers.
Accordingly, it is an object of the invention to provide an adaptive speech extraction filter (SEF) that avoids the problems of the prior art.
It is another object of the invention to provide an adaptive SEF that interfaces Wiener filtering techniques with the psycho-acoustics of speech.
It is yet another object of the invention to provide an adaptive SEF that is advantageously used in a cabin communication system of a moving vehicle.
It is a further object of the invention to provide a cabin communication system incorporating an advantageous adaptive SEF for enhancing speech intelligibility in a moving vehicle.
It is yet a further object of the invention to provide a moving vehicle including a cabin communication system incorporating an advantageous adaptive SEF for enhancing speech intelligibility in the moving vehicle.
It is still a further object of the invention to provide a cabin communication system with an adaptive SEF that increases intelligibility and ease of passenger communication with little or no increase in ambient noise.
It is even a further object of the present invention to provide a cabin communication system with an adaptive SEF that provides acceptable psychoacoustics, ensures passenger comfort by not amplifying transient sounds and does not interfere with audio entertainment systems.
It is also an object of the invention to provide an adaptive AEC that avoids the problems of the prior art.
It is another object of the invention to provide an adaptive AEC that interfaces with adaptive Wiener filtering techniques.
It is yet another object of the invention to provide an adaptive AEC that is advantageously used in a cabin communication system of a moving vehicle.
It is a further object of the invention to provide a cabin communication system incorporating an advantageous adaptive AEC for enhancing speech intelligibility in a moving vehicle.
It is yet a further object of the invention to provide a moving vehicle including a cabin communication system incorporating an advantageous adaptive AEC for enhancing speech intelligibility in the moving vehicle.
It is still a further object of the invention to provide a cabin communication system with an adaptive AEC that increases intelligibility and ease of passenger communication with little or no increase in ambient noise or echoes.
It is even a further object of the present invention to provide a cabin communication system with an adaptive AEC that does not interfere with audio entertainment systems.
It is also an object of the present invention to provide an automatic gain control that avoids the difficulties of the prior art.
It is another object of the present invention to provide an automatic gain control that provides both an overall gain control signal and a dither control signal.
It is yet another object of the present invention to provide an automatic gain control that precludes the amplification or reproduction of undesirable transient sounds.
It is also an object of the present invention to provide a user interface that facilitates the customized use of the inventive cabin communication system.
In accordance with these objects, one aspect of the present invention is directed to a cabin communication system for improving clarity of a voice spoken within an interior cabin having ambient noise, the cabin communication system comprising a microphone for receiving the spoken voice and the ambient noise and for converting the spoken voice and the ambient noise into an audio signal, the audio signal having a first component corresponding to the spoken voice and a second component corresponding to the ambient noise, a speech enhancement filter for removing the second component from the audio signal to provide a filtered audio signal, the speech enhancement filter removing the second component by processing the audio signal by a method taking into account elements of psycho-acoustics of a human ear, and a loudspeaker for outputting a clarified voice in response to the filtered audio signal.
Another aspect of the present invention is directed to a cabin communication system for improving clarity of a voice spoken within an interior cabin having ambient noise, the cabin communication system comprising an adaptive speech enhancement filter for receiving an audio signal that includes a first component indicative of the spoken voice, a second component indicative of a feedback echo of the spoken voice and a third component indicative of the ambient noise, the speech enhancement filter filtering the audio signal by removing the third component to provide a filtered audio signal, the speech enhancement filter adapting to the audio signal at a first adaptation rate, and an adaptive acoustic echo cancellation system for receiving the filtered audio signal and removing the second component in the filtered audio signal to provide an echo-cancelled audio signal, the echo cancellation system adapting to the filtered audio signal at a second adaptation rate, wherein the first adaptation rate and the second adaptation rate are different from each other so that the speech enhancement filter does not adapt in response to operation of the echo-cancellation system and the echo-cancellation system does not adapt in response to operation of the speech enhancement filter.
Another aspect of the present invention is directed to an automatic gain control for a cabin communication system for improving clarity of a voice spoken within a movable interior cabin having ambient noise, the automatic gain control comprising a microphone for receiving the spoken voice and the ambient noise and for converting the spoken voice and the ambient noise into a first audio signal having a first component corresponding to the spoken voice and a second component corresponding to the ambient noise, a filter for removing the second component from the first audio signal to provide a filtered audio signal, an acoustic echo canceller for receiving the filtered audio signal in accordance with a supplied dither signal and providing an echo-cancelled audio signal, a control signal generating circuit for generating a first automatic gain control signal in response to a noise signal that corresponds to a current speed of the cabin, the first automatic gain control signal controlling a first gain of the dither signal supplied to the filter, the control signal generating circuit also for generating a second automatic gain control signal in response to the noise signal, and a loudspeaker for outputting a reproduced voice in response to the echo-cancelled audio signal with a second gain controlled by the second automatic gain control signal.
Another aspect of the present invention is directed to an automatic gain control for a cabin communication system for improving clarity of a voice spoken within a movable interior cabin having ambient noise, the ambient noise intermittently including an undesirable transient noise, the automatic gain control comprising a microphone for receiving the spoken voice and the ambient noise and for converting the spoken voice and the ambient noise into a first audio signal, the first audio signal including a first component corresponding to the spoken voice and a second component corresponding to the ambient noise, a parameter estimation processor for receiving the first audio signal and for determining parameters for deciding whether or not the second component corresponds to an undesirable transient noise, decision logic for deciding, based on the parameters, whether or not the second component corresponds to an undesirable transient signal, a filter for filtering the first audio signal to provide a filtered audio signal, a loudspeaker for outputting a reproduced voice in response to the filtered audio signal with a variable gain at a second location in the cabin, and a control signal generating circuit for generating an automatic gain control signal in response to the decision logic, wherein when the decision logic decides that the second component corresponds to an undesirable transient signal, the control signal generating circuit generates the automatic gain control signal so as to gracefully set the gain of the loudspeaker to zero for fade-out.
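By way of illustration only, the graceful fade-out described above may be sketched as a gain that moves toward zero in small steps while the decision logic reports a transient, and returns toward its normal value afterwards. The name next_gain, the linear ramp, the step size of 0.25 and the sequence of decisions are all assumptions of this sketch; the description above does not specify the shape or rate of the fade.

```python
def next_gain(gain, transient, target=1.0, step=0.25):
    """Move the loudspeaker gain one step toward its goal.

    The goal is 0.0 while a transient is flagged, otherwise `target`.
    """
    goal = 0.0 if transient else target
    if gain < goal:
        return min(goal, gain + step)
    return max(goal, gain - step)


# Hypothetical per-frame outputs of the decision logic.
transient_flags = [False, True, True, True, True, True, False, False]

gain = 1.0
trace = []
for flagged in transient_flags:
    gain = next_gain(gain, flagged)
    trace.append(gain)
```

In this sketch the gain reaches zero after four flagged frames rather than being cut abruptly, and it ramps back up in the same manner once the transient has passed.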
Another aspect of the present invention is directed to an improved user interface installed in the cabin for improving the ease and flexibility of use of the CCS.
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of the preferred embodiments taken in connection with the attached drawings.