The in-vehicle user experience may be enhanced by improving how users interact with their vehicles via speech. In this regard, it is desired to improve the ability of an Automatic Speech Recognition (ASR) system of a vehicle to consistently recognize voice commands while the vehicle is operating under varied operating conditions.
Traditional acoustic models are static and are trained under a variety of operating conditions considered typical for ASR use cases. For a vehicle, typical operating conditions include the vehicle idling in a parking lot, the vehicle driving on the highway with the windows up, the vehicle driving on the highway with the windows down, etc. The structure of the vehicle, such as the amount of insulation in the vehicle cabin, structural characteristics of the vehicle related to fuel economy, etc., is also taken into account. The typical operating conditions have significantly different background noise levels, which inherently presents a challenge in building a static acoustic model. Consequently, a single, static acoustic model is incapable of working well under all of the varied operating conditions.
The Lombard Effect is a human response to ambient noise, in which the speaker speaks louder as a compensatory mechanism. In addition to an increase in the output volume of the speech, the Lombard Effect involves a shift in the spectral density of the speech towards higher frequencies and an increase in the duration of phonemes. These changes in spectra, more so than the decrease in signal-to-noise ratio, present challenges to speech recognition engines. Therefore, a problem presented is that a robust acoustic model must perform equally well for both neutral (non-Lombard) speech and noisy (Lombard) speech, despite the change in spectra.
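By way of illustration only, the following minimal Python sketch shows how two of the changes described above, the increase in output volume and the shift of spectral energy towards higher frequencies, might be quantified for a single speech frame. The function names (rms_level, spectral_centroid, tone_mix), the sample rate, the frame length, and the two synthetic signals are hypothetical and are not taken from any particular ASR system; the signals are sums of sinusoids chosen as stand-ins for neutral and Lombard speech rather than recorded data. The increase in phoneme duration is not modeled.

```python
import cmath
import math


def rms_level(samples):
    """Root-mean-square amplitude, used here as a simple proxy for output volume."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency in Hz, computed with a naive DFT."""
    n = len(samples)
    weighted = 0.0
    total = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0


def tone_mix(components, sample_rate, n):
    """Synthetic frame: a sum of sinusoids given as (frequency_hz, amplitude) pairs."""
    return [
        sum(a * math.sin(2 * math.pi * f * t / sample_rate) for f, a in components)
        for t in range(n)
    ]


# Assumed values for illustration; both tones fall on exact DFT bins.
SAMPLE_RATE = 8000
FRAME = 256

# Hypothetical "neutral" frame: most energy in the low-frequency component.
neutral = tone_mix([(250, 1.0), (1000, 0.3)], SAMPLE_RATE, FRAME)
# Hypothetical "Lombard" frame: louder overall, with relatively more
# energy in the high-frequency component.
lombard = tone_mix([(250, 1.2), (1000, 0.9)], SAMPLE_RATE, FRAME)

for name, frame in (("neutral", neutral), ("lombard", lombard)):
    print(name, rms_level(frame), spectral_centroid(frame, SAMPLE_RATE))
```

In this sketch the Lombard stand-in yields both a higher RMS level and a higher spectral centroid than the neutral stand-in. The difference in centroid corresponds to the change in spectra that, as noted above, presents the greater challenge to a static acoustic model.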