1. Field of the Invention
The invention relates to a speech recognizing method and apparatus for enabling speech recognition with a high recognition rate by eliminating the distortion of line characteristics and the influence of internal noise.
The invention also relates to a speech recognizing method and apparatus for performing speech recognition by using a hidden Markov model (HMM).
2. Related Background Art
When speech recognition is performed in an actual environment, two factors cause particular problems: the distortion of line characteristics due to the influence of a microphone, telephone line characteristics, or the like, and additive noise such as internal noise. As methods of coping with those problems, a Cepstrum Mean Subtraction (CMS) method and a Parallel Model Combination (PMC) method have been proposed. The CMS method is described in detail in Rahim et al., "Signal Bias Removal for Robust Telephone Based Speech Recognition in Adverse Environments", Proc. of ICASSP, '94, (1994) or the like. The PMC method is described in detail in M. J. Gales, S. Young, "An Improved Approach to the Hidden Markov Model Decomposition of Speech and Noise", Proc. of ICASSP, '92, I-233-236, (1992).
The CMS method compensates for the distortion of line characteristics. The PMC method, on the other hand, copes with additive noise. In both methods, a noise portion and a speech portion are detected in the input speech, and a Hidden Markov Model (HMM) formed in an environment without line distortion and noise is corrected on the basis of that information, thereby adapting the model to the input environment. With these methods, the model can be adapted flexibly even if the line characteristics or the noise fluctuate.
The CMS method compensates for multiplicative noise (line distortion), which acts through convolution with an impulse response. A long-time spectrum of the input speech is subtracted from the input speech, and a long-time spectrum of the speech used to form a model is subtracted from the model, thereby normalizing the difference between line characteristics. The normalizing process is generally executed in a logarithm spectrum region or a Cepstrum region. Since the multiplicative noise appears as an additive distortion in those two regions, the noise can be compensated for by a subtraction. Of those two approaches, the one that performs the normalizing process in the Cepstrum region is called the CMS method.
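The subtraction described above can be illustrated with a minimal sketch. It assumes that each utterance is already available as a list of Cepstrum vectors and that the line distortion is a constant offset in the Cepstrum region; the function names `cepstral_mean` and `cms` and the toy values are hypothetical and are not taken from the cited reference.

```python
def cepstral_mean(frames):
    """Per-coefficient average over all frames (the long-time average)."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[k] for f in frames) / n for k in range(dim)]


def cms(frames):
    """Subtract the long-time Cepstrum average from every frame."""
    mean = cepstral_mean(frames)
    return [[c - m for c, m in zip(f, mean)] for f in frames]


# Toy data (assumed values): three frames of two Cepstrum coefficients.
clean = [[1.0, -0.5], [2.0, 0.5], [0.0, 0.0]]

# A fixed line characteristic appears as a constant additive offset
# in the Cepstrum region.
channel = [0.7, -0.3]
distorted = [[c + h for c, h in zip(f, channel)] for f in clean]

# After normalization the two versions agree up to rounding error,
# because the constant offset is removed together with the mean.
normalized_clean = cms(clean)
normalized_distorted = cms(distorted)
```

Applying the same normalization to the speech used to form the model and to the input speech is what removes the difference between the two line characteristics in this sketch.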
The PMC method adds and synthesizes an HMM (speech HMM) trained on speech recorded in a noiseless environment and an HMM (noise HMM) trained on noise, thereby bringing the model closer to a noise multiplexed environment. The noise process in the PMC method presumes that the noise and the speech are additive in a linear spectrum region. In the HMM, on the other hand, parameters of a logarithm spectrum system, such as the Cepstrum or the like, are often used as the feature amounts of speech. According to the PMC method, those parameters are converted into the linear spectrum region, and the feature amounts obtained from the speech HMM and from the noise HMM are added and synthesized in the linear spectrum region. After the speech and the noise are synthesized, an inverse conversion returns the result from the linear spectrum region to the Cepstrum region, thereby obtaining a noise multiplexed speech HMM.
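The conversion chain described above (Cepstrum region, logarithm spectrum region, linear spectrum region, and back) can be sketched for a single pair of mean vectors. This is a simplified illustration under stated assumptions, not the procedure of the cited reference: only means are combined and variances are ignored, the Cepstrum is assumed to be an orthonormal DCT of the logarithm spectrum with as many coefficients as spectral channels, and the names `dct_matrix`, `combine_means`, and `gain` are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n); its transpose is its inverse."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * k * (2 * j + 1) / (2 * n))
                     for j in range(n)])
    return rows


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def combine_means(speech_cep, noise_cep, gain=1.0):
    """Combine a speech mean and a noise mean given in the Cepstrum region.

    Assumption: means only; a full model combination would also
    have to transform the variances.
    """
    n = len(speech_cep)
    c = dct_matrix(n)
    c_inv = transpose(c)
    # Cepstrum region -> logarithm spectrum region (inverse DCT).
    speech_log = matvec(c_inv, speech_cep)
    noise_log = matvec(c_inv, noise_cep)
    # Logarithm -> linear spectrum, add speech and noise, back to logarithm.
    combined_log = [math.log(gain * math.exp(s) + math.exp(z))
                    for s, z in zip(speech_log, noise_log)]
    # Logarithm spectrum region -> Cepstrum region (forward DCT).
    return matvec(c, combined_log)


# Toy mean vectors (assumed values).
speech_mean = [1.0, 0.5, -0.2, 0.1]
noise_mean = [0.2, 0.0, 0.0, 0.0]
noisy_speech_mean = combine_means(speech_mean, noise_mean)
```

In this sketch, a noise mean that is negligible in the linear spectrum region leaves the speech mean essentially unchanged, which is a convenient sanity check on the conversions.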