The present invention relates to a device for use in a speech recognizer or similar apparatus for normalizing the spectrum of speech.
Recognition of speech in noisy environments is extremely difficult because noise not only masks speech but also causes the utterance itself to change due to the Lombard effect, as is well known in the art. The Lombard effect stems from the fact that a person speaking in a noisy environment tends to speak louder and more clearly because the speaker's words themselves are hard to distinguish. The spectrum of speech uttered in a noisy environment has greater total energy than, and a different shape from, the spectrum of speech uttered by the same speaker in a quiet environment.
Approaches to normalizing the spectrum of speech, i.e., correcting the spectral shape, are disclosed by Miwa et al. in a paper entitled "Investigation on Interspeaker Normalization for Speech Recognition", PROC. of Acoustical Society of Japan, 3-2-1, pp. 577-578, June 1979 (referred to as Prior Art 1 hereinafter), and by David B. Roe in a paper entitled "ADAPTATION OF A SPEECH RECOGNIZER TO THE LOMBARD EFFECT IN HIGH NOISE CONDITIONS", IEICE Technical Report SP86-66, 1986 (referred to as Prior Art 2 hereinafter). Prior Art 1 is directed toward the recognition of speech from unspecified speakers.
For example, the spectrum normalizing method proposed in Prior Art 1 compensates for the influence of vocal tract length, which differs from individual to individual, i.e., it normalizes an influence that is linear with respect to the logarithmic frequency axis. However, the Lombard effect results in a substantial increase of energy in a certain range of speech frequencies, and the influence of such an increase of energy is non-linear with respect to the logarithmic frequency axis. This prior art method is therefore incapable of sufficiently normalizing the Lombard effect.
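The distinction drawn above can be illustrated with a minimal sketch. It is not the method of Prior Art 1; it assumes a toy log-energy spectrum with a single formant-like peak, sampled at 10 bins per octave, and models a vocal-tract-length difference as a uniform frequency scaling, which becomes a pure shift along the logarithmic frequency axis. The Lombard effect is modeled, purely hypothetically, as a 6 dB boost confined to the 1000 to 2500 Hz band. All function names and numeric values are illustrative assumptions.

```python
import math

def log_axis(n_bins=61, f_lo=100.0, f_hi=6400.0):
    """Center frequencies spaced uniformly in log frequency (10 bins/octave here)."""
    step = math.log(f_hi / f_lo) / (n_bins - 1)
    return [f_lo * math.exp(i * step) for i in range(n_bins)]

def toy_spectrum(freqs, peak_hz, height_db=20.0):
    """Hypothetical log-energy spectrum: one bump, Gaussian in log frequency."""
    return [height_db * math.exp(-(math.log(f / peak_hz) ** 2) / 0.1) for f in freqs]

def shift_bins(spec, k):
    """Shift a spectrum by k bins along the log-frequency axis (edges clamped)."""
    n = len(spec)
    return [spec[min(max(i - k, 0), n - 1)] for i in range(n)]

def best_shift(ref, obs, max_k=10):
    """Return the shift k (and its squared error) that best aligns obs to ref."""
    def err(k):
        return sum((a - b) ** 2 for a, b in zip(ref, shift_bins(obs, k)))
    k = min(range(-max_k, max_k + 1), key=err)
    return k, err(k)

freqs = log_axis()
ref = toy_spectrum(freqs, peak_hz=800.0)

# Shorter vocal tract: every frequency scaled by 2**0.3, i.e. 3 bins on this axis.
scaled = toy_spectrum(freqs, peak_hz=800.0 * 2 ** 0.3)

# Toy Lombard-like distortion (assumption): band-limited energy increase.
lombard = [v + (6.0 if 1000.0 <= f <= 2500.0 else 0.0) for f, v in zip(freqs, ref)]

k_scaled, err_scaled = best_shift(ref, scaled)
k_lombard, err_lombard = best_shift(ref, lombard)
print("scaling: shift", k_scaled, "residual", err_scaled)
print("lombard: shift", k_lombard, "residual", err_lombard)
```

In this sketch a single shift removes the frequency-scaling difference almost exactly, whereas no shift in the searched range removes the band-limited boost, which is the sense in which a normalization that is linear on the logarithmic frequency axis falls short for the Lombard effect.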