1. Field of the Invention
The present invention relates to a Lombard voice recognizing method and apparatus for recognizing an input voice in a noisy background.
2. Description of the Related Art
The inventors of the present invention know a voice recognizing method which is capable of achieving a high recognition ratio for a voice input by a particular person even in a noisy background. This voice recognizing method weights the peaks (formants) of a frequency spectrum derived from a voice waveform, because the peaks retain a relatively good signal-to-noise (S/N) ratio.
In "Evaluating performance of an amending method by movement of formants against the transformation of a voice in a noisy background", reported by Takizawa and Hamada, Proceedings of Japan Acoustic Society, 1-8-9 (September, 1990), it is reported that the transformation of a voice in a noisy background (the Lombard effect) adversely affects the voice recognition ratio. The report states that, in a noisy background, the formants located below 1.5 kHz are shifted toward higher frequencies by 120 Hz on average, whatever phoneme the input voice contains. Hence, the foregoing voice recognizing method known by the present inventors may lose its essential effect because of the shift of the formants in the noisy background.
The above-mentioned report by Takizawa et al. states that, in a noisy background, a cepstrum coefficient used in a linear predictive coding (LPC) method is amended by using an amending formula (1). The amending formula (1) makes use of a presumed formant frequency and a band width of the presumed formant as shown below.
$$\hat{C}_n = C_n + \gamma_n \qquad (1)$$
where $\hat{C}_n$ denotes an amended cepstrum coefficient of the LPC, $C_n$ denotes an n-th degree cepstrum coefficient of the LPC about the transformed voice, and $\gamma_n$ denotes the amendment of $C_n$. ##EQU1## wherein $\Delta f$ denotes a difference between a formant frequency of an actual voice and that of a transformed voice and $f_i$ denotes the presumed frequency of the i-th formant about the transformed voice. ##EQU2## wherein $b_i$ denotes a band width of the i-th formant and $K$ denotes a sampling frequency.
Hence,
$$\frac{\partial C_n}{\partial f_i} = -\frac{4}{K}\,\exp\!\left(-\frac{n \pi b_i}{K}\right) \cdot \sin\!\left(\frac{2 \pi f_i n}{K}\right)$$
As will be understood from the above, the voice recognition method reported by Takizawa et al. needs to presume the formant frequency $f_i$. In a noisy background, however, it is difficult to presume the formant frequency $f_i$ from the waveform of the input voice.
Hence, an error arises in the presumed formant frequency $f_i$, and the reported voice recognition method therefore has difficulty amending the LPC cepstrum coefficient accurately.
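The dependence of the amending formula (1) on the presumed formant frequency may be illustrated by the following minimal Python sketch. It is not the implementation of the report. The derivative is coded exactly as the expression printed above, whereas the form of the amendment (a first-order sum over the formants multiplied by a single shift) is an assumption, since that formula is not reproduced in this text. The function names, the 8 kHz sampling frequency, the formant values, the band widths, the sign of the shift, and the 100 Hz estimation error are all hypothetical values chosen for illustration.

```python
import math


def dcn_dfi(n, f_i, b_i, K):
    """Derivative of the n-th cepstrum coefficient with respect to the i-th
    formant frequency, coded exactly as the expression printed in the text."""
    return (-4.0 / K) * math.exp(-n * math.pi * b_i / K) * math.sin(2.0 * math.pi * f_i * n / K)


def gamma(n, formants, bandwidths, delta_f, K):
    """ASSUMPTION: the amendment is a first-order sum over the formants,
    gamma_n = sum_i (dC_n/df_i) * delta_f. The text does not reproduce it."""
    return sum(dcn_dfi(n, f, b, K) * delta_f for f, b in zip(formants, bandwidths))


def amend_cepstrum(cepstrum, formants, bandwidths, delta_f, K):
    """Apply formula (1): amended C_n = C_n + gamma_n. cepstrum[0] holds C_1."""
    return [c + gamma(n, formants, bandwidths, delta_f, K)
            for n, c in enumerate(cepstrum, start=1)]


# Hypothetical values, none of which come from the report.
K = 8000.0                            # sampling frequency in Hz
bandwidths = [80.0, 100.0]            # formant band widths in Hz
true_formants = [500.0, 1200.0]       # formants of the transformed voice
presumed_formants = [600.0, 1300.0]   # the same formants, mis-estimated by 100 Hz
delta_f = -120.0                      # ASSUMED sign: actual minus transformed voice
cepstrum = [0.0, 0.0, 0.0, 0.0]       # placeholder coefficients C_1..C_4

ideal = amend_cepstrum(cepstrum, true_formants, bandwidths, delta_f, K)
actual = amend_cepstrum(cepstrum, presumed_formants, bandwidths, delta_f, K)
for n, (a, b) in enumerate(zip(ideal, actual), start=1):
    print(f"n={n}: amendment with true f_i = {a:+.5f}, with presumed f_i = {b:+.5f}")
```

Because the presumed formant frequency enters through the sine term, an error in it changes the amendment of every coefficient, and the change generally grows with the degree n, since the argument of the sine scales with n.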