The invention relates to vehicle voice enhancement systems and hands-free cellular telephone systems using microphones mounted throughout a vehicle to sense driver and/or passenger speech. In particular, the invention relates to improvements in the selection of transmitted microphone signals and noise reduction filtering.
A vehicle voice enhancement system uses intercom systems to facilitate conversations of passengers sitting within different zones of a vehicle. A single channel voice enhancement system has a near-end zone and a far-end zone with one speaking location in each zone. A near-end microphone senses speech in the near-end zone and transmits a voice signal to a far-end loudspeaker. The far-end loudspeaker outputs the voice signal into the far-end zone, thereby enhancing the ability of a driver and/or passenger in the far-end zone to listen to speech occurring in the near-end zone even though there may be substantial background noise within the vehicle. Likewise, a far-end microphone senses speech in the far-end zone and transmits a voice signal to a near-end loudspeaker that outputs the voice signal into the near-end zone. Voice enhancement systems not only amplify the voice signal, but also bring an acoustic source of the voice signal closer to the listener.
Microphones are typically mounted within the vehicle near the usual speaking locations, such as on the ceiling of the vehicle passenger compartment above the seats or on seat belt shoulder harnesses. Inasmuch as microphones are present when implementing a vehicle voice enhancement system, it is desirable to use the voice enhancement system microphones in combination with a cellular telephone system to provide a hands-free cellular telephone system within the vehicle.
It is important that an integrated voice enhancement system and hands-free cellular telephone system be able to transmit clear intelligible voice signals. This can be difficult in a vehicle because significant acoustic changes can occur quickly within the passenger compartment of the vehicle. For instance, background noise can change substantially depending on the environment around the vehicle, the speed of the vehicle, etc. Also, the acoustic plant within the passenger compartment can change substantially depending upon temperature within the vehicle and/or the number of passengers within the vehicle, etc. Adaptive acoustic echo cancellation as disclosed in U.S. Pat. Nos. 5,033,082 and 5,602,928 and pending U.S. patent application Ser. No. 08/626,208, can be used to effectively model various acoustic characteristics within the passenger compartment to remove annoying echoes. However, even after annoying echoes are removed, background noise within the vehicle passenger compartment can distort voice signals. Further, microphone switching can create unnatural speech patterns and annoying clicking noises.
Providing intelligible and natural sounding voice signals is important for voice enhancement systems, and is also important for hands-free cellular telephone systems. However, providing intelligible and natural sounding voice signals is typically more difficult for cellular telephone systems. This is because a listener on the other end of the line must be able not only to hear speech from the vehicle clearly but also to detect easily whether the cellular telephone is on-line. That is, the line must not appear dead to the listener when no speech is present in the vehicle. Also, the listener on the other end of the line is typically in a quiet environment, and the presence of background vehicle noises during speech is annoying.
The invention is an integrated vehicle voice enhancement system and hands-free cellular telephone system that implements a voice activated microphone steering technique to provide intelligible and natural sounding voice signals for both the voice enhancement aspects of the system and the hands-free cellular telephone aspects of the system. This invention arose during continuing development efforts relating to the subject matter of U.S. Pat. Nos. 5,033,082; 5,602,928; 5,172,416; and copending U.S. patent application Ser. No. 08/626,208 (entitled “Acoustic Echo Cancellation In An Integrated Audio and Telecommunication Intercom System”), all incorporated herein by reference. The invention applies to both single channel (SISO) and multiple channel (MIMO) systems.
In one aspect, the invention involves the use of a microphone steering switch that inputs echo-cancelled voice signals from the microphones within the vehicle and outputs a raw telephone input signal. Each of the microphones in the system has the capability of switching between an “off” state and an “on” state. The microphones are voice activated such that a respective microphone can switch into the “on” state only when the sound level in the microphone signal (e.g. in dB) exceeds a threshold switching value, thus indicating that speech is present in a speaking location near the microphone. The microphone steering switch outputs a raw telephone input signal which is preferably a combination of 100% of the microphone output from the microphone in the “on” state, and preferably approximately 20% of the microphone output from the microphone(s) in the “off” state. In order for the telephone input signal to be intelligible to a person on the other end of the cellular telephone line, the invention allows only one of the microphones to be designated as the primary microphone (i.e. switched to the “on” state) at any given time.
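By way of illustration only, the steering switch described above can be sketched in Python. The function names, the RMS level measure, the -30 dB threshold, and the rule that the loudest microphone above the threshold becomes the primary microphone are assumptions made for this sketch. The text itself specifies only the voice-activated threshold, the single primary microphone, and the preferred mix of 100% of the “on” microphone with approximately 20% of the “off” microphone(s).

```python
import math


def level_db(frame):
    """Return the RMS level of a frame in dB relative to full scale (1.0)."""
    mean_square = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_square + 1e-12)  # small offset avoids log(0)


def steer(frames, threshold_db=-30.0, on_gain=1.0, off_gain=0.2):
    """Mix one echo-cancelled frame per microphone into a raw telephone frame.

    At most one microphone is treated as "on". Assumption: it is the loudest
    microphone, and only if its level exceeds threshold_db. All other
    microphones are "off" and contribute off_gain of their output.
    Returns (index of the primary microphone or None, mixed frame).
    """
    levels = [level_db(frame) for frame in frames]
    loudest = max(range(len(frames)), key=lambda i: levels[i])
    primary = loudest if levels[loudest] > threshold_db else None
    gains = [on_gain if i == primary else off_gain for i in range(len(frames))]
    length = len(frames[0])
    mixed = [sum(g * frame[k] for g, frame in zip(gains, frames))
             for k in range(length)]
    return primary, mixed


# Hypothetical input: microphone 0 carries speech, microphone 1 only noise.
frames = [[0.5, -0.5, 0.5, -0.5], [0.01, -0.01, 0.01, -0.01]]
primary, mixed = steer(frames)
```

When no microphone exceeds the threshold, every microphone stays in the “off” state and still contributes its reduced share, so the mixed signal is never completely silent.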
The invention implements microphone steering techniques for the designation of primary microphone signals into the “on” state so that no two microphones are switched into the “on” state at the same time. Yet, microphone output between the “on” and “off” states fades out and cross-fades between microphones in a manner that is not annoying to the driver and/or passengers within the vehicle or a person on the other end of the cellular telephone line.
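One possible way to realize such fading is to move each microphone gain toward its target value by a bounded amount per update, rather than switching it abruptly. The following Python sketch assumes a linear ramp; the function names and the step size are hypothetical, and the text does not state which fade law is actually used.

```python
def step_gains(current, targets, max_step=0.05):
    """Move each gain toward its target by at most max_step per update."""
    return [c + max(-max_step, min(max_step, t - c))
            for c, t in zip(current, targets)]


def cross_fade(start, targets, max_step=0.05, updates=20):
    """Return the sequence of gain vectors produced by repeated updates."""
    trajectory = [list(start)]
    for _ in range(updates):
        trajectory.append(step_gains(trajectory[-1], targets, max_step))
    return trajectory


# Hypothetical hand-over: microphone 0 fades from "on" (1.0) to "off" (0.2)
# while microphone 1 fades from "off" (0.2) to "on" (1.0).
trajectory = cross_fade([1.0, 0.2], [0.2, 1.0])
```

Because both gains change gradually and in opposite directions, the hand-over between microphones avoids the abrupt level jump that produces clicks.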
When generating the raw telephone input signal, it is desirable that a rather high percentage of the microphone output for the microphones in the “off” state, for example approximately 20%, be transmitted so that the cellular telephone line does not appear dead to a person on the other end of the telephone line when speech is not present within the vehicle.
In a second aspect, the invention applies noise reduction filters to filter out the background vehicle noise in the system microphone signals. In a microphone steering context, the filters are designed to remove the noise in the signals corresponding to the microphone(s) in the “on” state. The noise reduction filters are important for three primary reasons:
1. They generate a noise-reduced telephone input signal having improved clarity. By properly steering and switching the microphone signals, an intelligible raw telephone input signal is derived from the set of system microphone signals. However, this signal also contains a relatively large amount of background noise which in many cases severely degrades the quality of the speech signal, especially to a listener in a quiet environment on the other end of the line.
2. They reduce the background noise that is rebroadcast to the system loudspeakers in both SISO and MIMO voice enhancement systems. The rebroadcast of the background noise is readily perceptible in situations where the noise characteristics vary spatially within the vehicle. This is common in large vehicles where the amount of wind noise (e.g. from an open or closed window or sunroof), HVAC/fan noise, road noise, etc. varies depending on the passenger's position in the vehicle.
3. For vehicles employing voice recognition systems (for example, those that are used to interpret hands-free cellular phone commands), the background noise on the microphone signal(s) can severely degrade the performance of such systems. The noise reduction filter(s) reduce the background noise and therefore improve the performance of the voice recognition.
In its most general state, the noise reduction filters are applied to each of the microphone signals after the echo has been subtracted. However, if processing power is limited on the electronic controller, a single noise reduction filter can be applied to the microphone steering switch output to remove the background noise in the outgoing cell phone signal.
The preferred noise reduction filter includes a bank of fixed filters, preferably spanning the audible frequency spectrum, and a time-varying filter gain element βm corresponding to each fixed filter. The raw input signal is input to each of the fixed filters, and the output of each fixed filter zm(k) is weighted by the respective time-varying filter gain element βm. A summer combines the weighted and filtered input signals and outputs a noise-reduced input signal. The preferred noise reduction filters process the raw input signal in real time in the time domain. Therefore, the need for inverse transforms, which are computationally burdensome, is eliminated. The time-varying filter gain elements are preferably adjusted in accordance with a speech strength level for the output of each respective fixed filter. In this manner, the noise reduction filter tracks the sound characteristics of speech present in the raw input signal over time, and gives emphasis to bands containing speech, while at the same time fading out background noise occurring within bands in which speech is not present. However, if no speech at all is present in the raw input signal, the noise reduction filter will allow sufficient signal to pass therethrough so that the cellular telephone line does not appear dead to someone on the other end of the line.
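The structure described above (fixed filters, per-band gains, and a summer) can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the preferred implementation: the bands here are computed directly from a DCT-II over a sliding window rather than recursively, and the function names and window length are hypothetical. Each band output is the contribution of one cosine basis function to the newest sample in the window, so with unity gains the band outputs sum back to that sample.

```python
import math


def band_outputs(window):
    """Split the newest sample of `window` into len(window) band outputs.

    Band m is the contribution of the m-th DCT-II basis function to the
    newest sample, so the outputs sum to window[-1]. Each band is a fixed
    FIR filter applied to the windowed input.
    """
    n_taps = len(window)
    outputs = []
    for m in range(n_taps):
        scale = (1.0 if m == 0 else 2.0) / n_taps
        coeff = sum(x * math.cos(math.pi * (n + 0.5) * m / n_taps)
                    for n, x in enumerate(window))
        newest = math.cos(math.pi * (n_taps - 0.5) * m / n_taps)
        outputs.append(scale * coeff * newest)
    return outputs


def noise_reduced_sample(window, betas):
    """Weight each band output by its gain and sum the weighted bands."""
    return sum(beta * z for beta, z in zip(betas, band_outputs(window)))


# Hypothetical eight-sample window; unity gains reproduce the newest sample.
window = [0.1, -0.4, 0.3, 0.25, -0.2, 0.05, 0.6, -0.35]
passthrough = noise_reduced_sample(window, [1.0] * len(window))
```

Lowering the gain of a band attenuates only that band's contribution, and no separate inverse transform is needed to return to the time domain.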
The preferred transform is a recursive implementation of a discrete cosine transform modified to stabilize its performance on digital signal processors. The preferred transform (i.e. Equations 1 and 2) has several important properties that make it attractive for this invention. First, the preferred transform is a completely real valued transform and therefore does not introduce complex arithmetic into the calculations as with the discrete Fourier transform (DFT). This reduces both the complexity and the storage requirements. Second, this transform can be efficiently implemented in a recursive fashion using an IIR filter representation. This implementation is very efficient which is extremely important for voice enhancement systems where the electronic controllers are burdened with the other echo-cancellation tasks.
It should be noted that the preferred transform (i.e. Equations 1 and 2) has two major advancements over the traditional recursive-type transforms mentioned in the literature. Traditional recursive-type transforms, including the “sliding” DFT transform, often suffer from filter instability problems. This instability is the result of round-off errors which arise when the filter parameters are implemented in the finite precision environment of a digital signal processor (DSP). More precisely, the instability is due to non-exact cancellation of the “marginally” stable poles of the filter, which is caused by the parameter round-off errors. The preferred transform presented here is designed to overcome these problems by modifying the filter parameters according to a γ factor. This stabilizes the filter and is well suited for a variety of hardware systems since γ can be adjusted to accommodate different fixed or floating-point digital signal processors. Another advancement of the preferred transform over the conventional transforms is that each of the filters in the preferred transform is appropriately scaled such that the summation of all of the filter outputs, zm(k): m=0 . . . M-1, at any instant in time equals the input at that instant in time. Thus, the combining of the outputs acts as an inverse transform. Therefore, an explicit inverse transform is not required. This further increases the efficiency of the transformation.
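The stability issue can be illustrated with a generic damped recursion of the sliding-DFT type. This is not Equations 1 and 2, which use a real-valued transform and are not reproduced in this section. The sketch assumes a complex one-pole recursion purely for brevity, and the parameter named gamma is a damping factor chosen by analogy with the γ factor in the text. With gamma slightly below 1 the pole lies inside the unit circle, so round-off error decays instead of accumulating, while the recursion still equals a finite windowed sum.

```python
import cmath


def recursive_band(x, n_taps, k, gamma):
    """Run y(n) = a*y(n-1) + x(n) - a**N * x(n-N), a = gamma*exp(j*2*pi*k/N)."""
    a = gamma * cmath.exp(2j * cmath.pi * k / n_taps)
    a_pow_n = a ** n_taps
    y = 0j
    out = []
    for n, sample in enumerate(x):
        delayed = x[n - n_taps] if n >= n_taps else 0.0
        y = a * y + sample - a_pow_n * delayed
        out.append(y)
    return out


def direct_band(x, n_taps, k, gamma):
    """Evaluate the same band as the finite sum of a**i * x(n-i), i < N."""
    a = gamma * cmath.exp(2j * cmath.pi * k / n_taps)
    return [sum(a ** i * x[n - i] for i in range(min(n_taps, n + 1)))
            for n in range(len(x))]


# Hypothetical test signal; the two forms agree up to rounding error.
x = [((7 * n) % 11 - 5) / 5.0 for n in range(50)]
recursive = recursive_band(x, n_taps=8, k=1, gamma=0.99)
direct = direct_band(x, n_taps=8, k=1, gamma=0.99)
```

Setting gamma to exactly 1 gives the undamped recursion, whose pole sits on the unit circle and relies on exact pole-zero cancellation, which is the marginally stable case described above.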
The time-varying gain elements βm applied to the filtered input signals also have several major improvements over the existing approaches. It should be noted that the performance of the system lies solely in the proper calculation of the gain elements βm, since with unity gain elements the system output is equal to the input signal, resulting in no noise reduction. Existing techniques often suffer from poor speech quality. This results from the filter's inability to adjust to rapidly varying speech, giving the processed speech a “choppy” sound characteristic. The approach taken here overcomes this problem by adjusting the time-varying gain elements βm in a frequency-dependent manner to ensure a fast overall dynamic response of the system. The βm gains corresponding to high frequency bands are determined according to a speech strength level computed from a relatively small number of filter output samples, zm(k), since high frequency signals vary quickly with time and therefore fewer outputs are needed to accurately estimate the output power. On the other hand, the βm gains corresponding to low frequency bands are computed from a larger number of filter output samples in order to accurately measure the power of low frequency signals, which are slowly time-varying. By determining the βm gains in this frequency band-dependent fashion, each band in the filter is optimized to provide the fastest temporal response while maintaining accurate power estimates. If the system βm gains for the bands were determined in the same manner or by using the same formula, as is common in existing methods, the dynamic response of the high frequency bands would be compromised to achieve accurate low frequency power estimates. Furthermore, this approach uses a closed-form expression for the βm gain based on the speech strength levels in each band, and therefore does not require a table of gain elements to be stored in memory.
This expression also has been derived such that when speech levels are low in a particular frequency band, the βm gain of the band is not set to zero, but to some low level value. This is important so that the cell phone input does not appear “dead” to the listener at the other end of the line, and it also significantly reduces signal “flutter”.
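The band-dependent gain calculation can be sketched as follows. The closed-form expression actually used is not given in this section, so the gain formula below (a ratio-based gain with a nonzero floor) is a stand-in assumption, as are the function names, the window lengths of 64 and 8 samples, the floor of 0.1, and the use of a separate per-band noise power estimate. The sketch shows only the two properties described above: shorter power-averaging windows for higher bands, and a gain that never falls to zero.

```python
def band_power(samples):
    """Return the mean power of a non-empty list of band output samples."""
    return sum(s * s for s in samples) / len(samples)


def window_length(m, n_bands, longest=64, shortest=8):
    """Number of samples averaged for band m: long for low bands, short for high."""
    if n_bands == 1:
        return longest
    fraction = m / (n_bands - 1)
    return round(longest + fraction * (shortest - longest))


def gain(speech_power, noise_power, floor=0.1):
    """Closed-form gain: equals `floor` with no speech, approaches 1 with strong speech."""
    ratio = speech_power / (noise_power + 1e-12)
    return floor + (1.0 - floor) * ratio / (1.0 + ratio)


def update_gains(band_histories, noise_powers, floor=0.1):
    """Compute one gain per band from its most recent filter output samples."""
    n_bands = len(band_histories)
    betas = []
    for m, history in enumerate(band_histories):
        recent = history[-window_length(m, n_bands):]
        betas.append(gain(band_power(recent), noise_powers[m], floor))
    return betas


# Hypothetical two-band case: band 0 is silent, band 1 matches the noise power.
betas = update_gains([[0.0] * 100, [1.0] * 100], [1.0, 1.0])
```

Because the gain is computed from a formula, no gain table needs to be stored, and the floor keeps a low-level signal on the line when a band contains no speech.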
In another aspect, the invention implements microphone steering switches for multiple channel voice enhancement systems. For instance, such a MIMO voice enhancement system typically has two or more microphones in a near-end acoustic zone and two or more microphones in a far-end acoustic zone. While the microphones in the near-end zone are typically not acoustically coupled to the microphones in the far-end zone, microphones within the near-end zone may be acoustically coupled to one another and microphones within the far-end zone may be acoustically coupled to one another. In implementing the MIMO voice enhancement system, it is desirable that only one of the microphones in the near-end zone be designated as a primary microphone (i.e. switched into the “on” state) at any given time in order for the transmitted input signal to the far-end zone to be intelligible. This is important not only when two or more passengers within the vehicle are speaking, but also to prevent acoustic spillover from one speaking location in the near-end zone to another speaking location in the near-end zone, which could cause microphone falsing. Preferably, a similar steering switch is provided to generate a transmitted near-end input signal from the far-end microphone signals. In implementing the steering switches for the voice enhancement system, it is preferred that microphones in the “off” state contribute a small percentage of the microphone output, such as 5%-10% or less, so that transmission of background noise through the voice enhancement system is not noticeable by the driver and/or passengers within the vehicle. It is desirable that a small undetectable percentage of the microphone output be contributed to the respective input signal to prevent annoying microphone clicking that would occur if the microphone switches electrically between being on and being completely off.