The present invention relates to the field of speech recognition and, more particularly, to a technique for improving an acoustic model used in speech recognition.
Speech recognition generally uses a statistical method. In this method, features of speech are accumulated from training data, which is a large amount of recorded speech data, and input speech signals are compared with the accumulated features so that the word sequence closest to those features is output as the recognition result. Acoustic features of speech are typically handled separately from linguistic features. The acoustic features represent the frequency characteristics of each phoneme to be recognized, and the model of these features is referred to as an acoustic model (AM).
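The separation of acoustic and linguistic features described above is commonly written as the following decision rule. This is the standard textbook formulation, not one stated in this document, and the symbols are assumptions introduced here for illustration: X is the sequence of acoustic features of the input speech, W is a candidate word sequence, P(X | W) is the score given by the acoustic model, and P(W) is the score given by the linguistic model.

```latex
\hat{W} = \operatorname*{arg\,max}_{W} P(W \mid X)
        = \operatorname*{arg\,max}_{W} P(X \mid W)\, P(W)
```

The second equality follows from Bayes' rule, because P(X) does not depend on W and can be dropped from the maximization.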
As one technique for converting a cepstrum so as to match an acoustic model, feature space maximum likelihood linear regression (hereinafter also referred to as “FMLLR”) is known. The cepstrum, a feature widely used in speech recognition, is a signal obtained by performing an inverse fast Fourier transform (IFFT) on the logarithmic power spectrum of an observed signal. It is used to separate a signal having a fine frequency property (for example, vocal cord vibration) from the filter having a smooth frequency property (for example, a vocal tract) through which that signal has passed to become the observed signal.
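The cepstrum computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it uses only the Python standard library, it replaces the FFT and IFFT with a naive discrete Fourier transform for readability, and the names dft, real_cepstrum, and apply_affine are hypothetical. The function apply_affine shows only the affine form y = Ax + b in which an FMLLR transform is applied to a feature vector; estimating A and b so as to maximize likelihood under the acoustic model is not shown.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT/IFFT)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        # The inverse transform carries the 1/N normalization.
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(frame, floor=1e-12):
    """Cepstrum of one frame: inverse DFT of the logarithmic power spectrum."""
    spectrum = dft(frame)
    # Floor the power to avoid log(0) on empty spectral bins (assumed value).
    log_power = [math.log(max(abs(s) ** 2, floor)) for s in spectrum]
    return [c.real for c in dft(log_power, inverse=True)]


def apply_affine(feature, A, b):
    """Apply y = A x + b to one feature vector (the form an FMLLR transform takes)."""
    return [sum(a * x for a, x in zip(row, feature)) + bi
            for row, bi in zip(A, b)]
```

In this sketch, the low-order cepstral coefficients reflect the smooth spectral envelope (the vocal tract), while the higher-order coefficients reflect the fine structure (vocal cord vibration). A practical system would use an FFT routine and would typically keep only the low-order coefficients as features.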