1. Field of the Invention
The present invention relates to a reference pattern producing apparatus, and more particularly to a reference pattern producing apparatus with a controlled contribution of learning coefficients.
2. Description of the Related Art
In recent years, speech recognition apparatuses have become known in which a speaker independent reference pattern is produced from a time series of features of speech uttered by each of a plurality of speakers. In such apparatuses, the Hidden Markov model (HMM) is widely used for the modeling of acoustic features. Because it is a statistical model, the Hidden Markov model has the advantages of describing the fluctuation of speech stochastically and of having a good affinity with a probabilistic statistical language model such as a bigram.
The learning in the Hidden Markov model is carried out based on a learning algorithm called the Baum-Welch (or Forward-Backward) algorithm. This algorithm includes a step of determining an expectation (Expectation step) and a step of maximizing the expectation (Maximization step), which are alternately repeated. Thus, the algorithm is also called an EM algorithm.
FIG. 1 is a block diagram showing the structure of a conventional learning apparatus for the EM algorithm. Samples for the learning are stored in a data storage section 201. A reference pattern producing section 203 produces reference patterns using the samples and outputs them to an output terminal 210. For example, a specific algorithm is described in detail in the third chapter “Speech Recognition Algorithm in HMM Method” of “Speech Recognition by Stochastic Model” by Seiichi NAKAGAWA, published by Corona Company.
An actual learning example will be described below.
FIG. 5 shows the histograms obtained when 1000 samples are extracted from each of two Gaussian distributions N(1.0, 1.0) and N(4.0, 1.0). FIG. 6 shows initial distributions given to the EM algorithm. In the EM algorithm, the averages, the variances and the mixture coefficients are repeatedly updated, starting from the initial distributions. FIG. 7 shows the distributions obtained after the learning process is repeated 500 times. It can be seen that the two Gaussian distributions are correctly estimated through the EM algorithm.
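The learning example above can be sketched as follows. This is a minimal illustration rather than the conventional apparatus itself: it assumes a one-dimensional two-component Gaussian mixture in place of an HMM, and the names `gaussian_pdf` and `em_step`, the initial values, the random seed, the variance floor and the iteration count of 100 are all assumptions chosen for the sketch (the initial distributions of FIG. 6 are not reproduced).

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(samples, mix, means, variances):
    """One Expectation step followed by one Maximization step."""
    k = len(means)
    # Expectation step: posterior probability of each component per sample.
    resp = []
    for x in samples:
        p = [mix[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(p) or 1e-300  # guard against numerical underflow
        resp.append([pj / total for pj in p])
    # Maximization step: re-estimate mixture coefficients, means and variances.
    new_mix, new_means, new_vars = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mj = sum(r[j] * x for r, x in zip(resp, samples)) / nj
        vj = sum(r[j] * (x - mj) ** 2 for r, x in zip(resp, samples)) / nj
        new_mix.append(nj / len(samples))
        new_means.append(mj)
        new_vars.append(max(vj, 1e-6))  # variance floor (assumption)
    return new_mix, new_means, new_vars


random.seed(0)  # fixed seed so the sketch is repeatable
samples = ([random.gauss(1.0, 1.0) for _ in range(1000)]
           + [random.gauss(4.0, 1.0) for _ in range(1000)])

# Hypothetical initial distributions, not the values of FIG. 6.
mix, means, variances = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]
for _ in range(100):
    mix, means, variances = em_step(samples, mix, means, variances)

print(mix, means, variances)
```

With equally sized sample groups, the estimated averages in such a sketch would be expected to settle near the averages of the two generating distributions.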
FIG. 8 shows the histograms obtained from the Gaussian distributions shown in FIG. 5. In this case, the number of samples obtained from one of the two Gaussian distributions is about 1/10 of the number of samples obtained from the other distribution. In the same way as in the above example, when the Gaussian distributions of FIG. 6 are given to these sample groups as the initial distributions, the distributions are estimated as shown in FIG. 9. In this example, the distribution for the small number of samples is not correctly estimated, because the objective function of the learning algorithm is defined over the whole of the learning samples.
Therefore, when the number of samples in one of the distributions is insufficient, an error arises in the estimation of the distribution with fewer samples due to the influence of the distribution with many samples. For this reason, the learning samples are collected in the learning step of the HMM such that the numbers of samples are equal to each other for the respective distributions. For example, when the learning is carried out using the learning data of men and women, it is desirable that the numbers of samples for the men and the women are approximately equal.
The same problem arises in any algorithm, such as the EM algorithm, in which the contribution of a distribution to be optimized to the objective function or algorithm is constant regardless of the learning data.
Conventional techniques for compensating for degradation of estimation precision due to a lack of learning samples are described in Japanese Patent No. 2,701,500 (corresponding to Japanese Laid Open Patent Application (JP-A-Heisei 3-212696)) and Japanese Laid Open Patent Application (JP-A-Heisei 9-160586).
In the technique described in Japanese Patent No. 2,701,500, an acoustic model (speaker independent model) is first learned based on learning data uttered by a plurality of speakers and is then adapted to a specific speaker using a small amount of learning data uttered by that specific speaker. In this case, parameters having a large dependency on the speaker are determined based on a ratio of a variance between speakers to a variance within the speaker in feature parameters. Then, the adaptation to the specific speaker is carried out mainly for those parameters. Thus, the acoustic model is adapted to the specific speaker with good precision based on a small amount of data. However, even when this technique is used, if the precision of the speaker independent model is degraded due to the lack of learning samples, the adaptation to the speaker is carried out based on an unsuitable variance. Therefore, the above problem remains.
The technique described in Japanese Laid Open Patent Application (JP-A-Heisei 9-160586) is as follows. The constraint of phonemic environment is relaxed for an acoustic model with a small amount of learning samples in order to increase the learning samples. By linearly combining the acoustic model and a learned model (environment dependence type model), the parameters of the model are smoothed so that the parameter estimation is stabilized. Thus, this method is aimed at improvement of the estimation precision in the learning algorithm.
Also, a learning system for pattern recognition is described in Japanese Laid Open Patent Application (JP-A-Heisei 4-354049). In this reference, a recognition rate and a summation of values p indicative of a ratio of a likelihood rpt and a likelihood rpt′ between elements having values approximate to a likelihood rpq in each set are used as a function for estimating a learning result. In this learning system for pattern recognition, a feedback loop is adopted in which a learning coefficient C(R) is changed when the current estimation result is degraded compared with the previous estimation result. The processing procedure is composed of a distinguishing process, an estimating process of a reference vector, a re-estimating process of the reference vector, a resetting process of the learning coefficient and a learning process. In this way, a reference pattern can be produced such that a recognition rate Rr becomes maximum which is obtained when an unknown set T is recognized based on an optimal reference pattern to the unknown set, i.e., the reference pattern determined from a known set S. However, in this reference, the influence of an imbalance in the number of learning samples cannot be avoided.
Therefore, an object of the present invention is to provide a reference pattern producing apparatus in which the degradation of the estimation precision in a learning algorithm due to disproportionality in the number of samples between populations can be prevented by controlling the contribution of samples belonging to each population to the learning algorithm.
In order to achieve an aspect of the present invention, a reference pattern producing apparatus includes a data storage section, a learning coefficient storage section and a reference pattern producing section. The data storage section stores learning data of a content and a time series of features for each of a plurality of sample groups. The learning coefficient storage section stores learning coefficients corresponding to each of the plurality of sample groups. The reference pattern producing section repetitively learns the learning data using the learning coefficients to produce reference patterns.
Each learning coefficient may be determined based on a size of a corresponding one of the plurality of sample groups. In this case, a product of a number of samples in each of the plurality of sample groups and a corresponding one of the learning coefficients is the same over the plurality of sample groups.
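One way to satisfy the condition above is to make each learning coefficient inversely proportional to the size of its sample group. The following is a minimal sketch under that assumption; the function name `balanced_coefficients`, the group names and the scale factor (total number of samples divided by the number of groups) are illustrative choices and are not specified in this description.

```python
def balanced_coefficients(group_sizes):
    """Return one learning coefficient per group, inversely proportional to size.

    group_sizes: dict mapping a group name to its number of samples.
    The scale (total / number of groups) is an illustrative choice.
    """
    total = sum(group_sizes.values())
    n_groups = len(group_sizes)
    return {g: total / (n_groups * n) for g, n in group_sizes.items()}


sizes = {"group_a": 1000, "group_b": 100}  # hypothetical group sizes
coeffs = balanced_coefficients(sizes)

# The product of group size and coefficient is the same for every group.
products = {g: sizes[g] * coeffs[g] for g in sizes}
print(coeffs, products)
```

Any common scale factor gives the same products across groups; only the ratio between coefficients matters for the balancing.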
In order to achieve another aspect of the present invention, a method of producing reference patterns includes:
learning the learning data of a content and a time series of features for each of a plurality of sample groups using learning coefficients corresponding to the plurality of sample groups;
repeating the learning until a predetermined condition is met; and
outputting reference patterns when the predetermined condition is met.
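The steps above can be sketched as follows. This is only an illustration under stated assumptions: a one-dimensional two-component Gaussian mixture stands in for the reference patterns, each sample's contribution to the re-estimation is scaled by the learning coefficient of its group, and the predetermined condition is taken to be that the averages move by less than a tolerance. The names `learn_with_coefficients`, `max_iters` and `tol`, the group names and all numeric values are hypothetical.

```python
import math
import random


def pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def learn_with_coefficients(groups, coeffs, means, variances, max_iters=200, tol=1e-6):
    """Repeat coefficient-weighted EM until the averages move less than tol."""
    data = [(x, coeffs[g]) for g, xs in groups.items() for x in xs]
    k = len(means)
    mix = [1.0 / k] * k
    for _ in range(max_iters):
        n = [0.0] * k    # weighted occupancy of each component
        sx = [0.0] * k   # weighted sum of x
        sxx = [0.0] * k  # weighted sum of x squared
        for x, c in data:
            p = [mix[j] * pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            for j in range(k):
                r = c * p[j] / total  # responsibility scaled by the coefficient
                n[j] += r
                sx[j] += r * x
                sxx[j] += r * x * x
        new_means = [sx[j] / n[j] for j in range(k)]
        new_vars = [max(sxx[j] / n[j] - new_means[j] ** 2, 1e-6) for j in range(k)]
        mix = [n[j] / sum(n) for j in range(k)]
        shift = max(abs(a - b) for a, b in zip(means, new_means))
        means, variances = new_means, new_vars
        if shift < tol:  # predetermined condition (assumption)
            break
    return mix, means, variances


random.seed(1)  # fixed seed so the sketch is repeatable
groups = {
    "large": [random.gauss(1.0, 1.0) for _ in range(1000)],
    "small": [random.gauss(4.0, 1.0) for _ in range(100)],
}
coeffs = {"large": 1.0, "small": 10.0}  # 1000 * 1.0 == 100 * 10.0
mix, means, variances = learn_with_coefficients(groups, coeffs, [0.0, 5.0], [1.0, 1.0])
print(mix, means, variances)
```

In this sketch the coefficient of 10.0 makes the 100 samples of the small group count as much as the 1000 samples of the large group, so the objective function is no longer dominated by the larger group.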
In order to achieve still another aspect of the present invention, a reference pattern producing apparatus includes a data storage section, a learning coefficient storage section, a reference pattern producing section, a learning coefficient updating section and a control section. The data storage section stores learning data of a content and a time series of features for each of a plurality of groups of samples. The learning coefficient storage section stores learning coefficients corresponding to each of the plurality of groups of samples. The reference pattern producing section learns the learning data using the learning coefficients to produce reference patterns. The learning coefficient updating section determines measures of fitness between each of the samples and a corresponding one of the produced reference patterns, and updates the learning coefficients based on the determined measures of fitness. The control section repetitively controls the reference pattern producing section and the learning coefficient updating section until a predetermined condition is met, and outputs the produced reference patterns when the predetermined condition is met.
Each of the measures of fitness may be a likelihood between each of the samples and a corresponding one of the produced reference patterns. Also, each of the measures of fitness may be a recognition rate between each of the samples and a corresponding one of the produced reference patterns.
The learning coefficient updating section updates the learning coefficients based on the determined measures of fitness such that the contribution, to an objective function or learning algorithm, of the learning coefficients for measures of fitness larger than an average of the measures of fitness is made smaller, and such that the contribution, to the objective function, of the learning coefficients for measures of fitness smaller than the average of the measures of fitness is made larger.
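The direction of this update can be illustrated with a small sketch. The multiplicative exponential rule, the `rate` parameter, the function name `update_coefficients` and the use of one aggregated measure of fitness per group are all assumptions made for illustration; this description fixes only that coefficients for above-average fitness shrink and coefficients for below-average fitness grow.

```python
import math


def update_coefficients(coeffs, fitness, rate=0.5):
    """Lower coefficients whose fitness is above average, raise those below."""
    average = sum(fitness.values()) / len(fitness)
    return {g: c * math.exp(-rate * (fitness[g] - average))
            for g, c in coeffs.items()}


coeffs = {"group_a": 1.0, "group_b": 1.0}
fitness = {"group_a": -1.2, "group_b": -3.0}  # hypothetical average log-likelihoods
updated = update_coefficients(coeffs, fitness)
print(updated)
```

Here `group_a` fits its reference pattern better than average, so its coefficient falls below 1.0, while the coefficient of `group_b` rises above 1.0.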
Also, the predetermined condition may be that the learning coefficients have been updated a predetermined number of times. Instead, the predetermined condition may be that an update quantity of any of the learning coefficients is less than a predetermined value.
In order to achieve yet still another aspect of the present invention, a method of producing reference patterns includes:
performing learning of learning data using learning coefficients to a plurality of groups of samples to produce reference patterns, the learning data being composed of a content and a time series of features for each of the plurality of groups of samples;
determining measures of fitness between each of the samples and a corresponding one of the produced reference patterns;
updating the learning coefficients based on the determined measures of fitness;
repetitively executing the performing, the determining and the updating until a predetermined condition is met; and
outputting the produced reference patterns when the predetermined condition is met.
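The overall flow of the method can be sketched as a simple control loop. To keep the example short, the reference pattern here is a toy one (a single coefficient-weighted Gaussian), the measure of fitness is the average log-likelihood of each group, and the loop stops either after a fixed number of updates or when the largest coefficient change falls below a threshold. The names `learn`, `fitness_of`, `train`, `max_updates`, `min_delta` and `rate`, and the update rule itself, are hypothetical and are not taken from this description.

```python
import math
import random


def learn(groups, coeffs):
    """Toy reference pattern: coefficient-weighted mean and variance."""
    data = [(x, coeffs[g]) for g, xs in groups.items() for x in xs]
    w = sum(c for _, c in data)
    mean = sum(c * x for x, c in data) / w
    var = sum(c * (x - mean) ** 2 for x, c in data) / w
    return mean, var


def fitness_of(xs, pattern):
    """Average log-likelihood of a group's samples under the pattern."""
    mean, var = pattern
    return sum(-0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var
               for x in xs) / len(xs)


def train(groups, max_updates=20, min_delta=1e-3, rate=0.5):
    """Alternate learning and coefficient updates until a condition is met."""
    coeffs = {g: 1.0 for g in groups}
    pattern = learn(groups, coeffs)
    for _ in range(max_updates):  # condition 1: number of updates
        fit = {g: fitness_of(xs, pattern) for g, xs in groups.items()}
        avg = sum(fit.values()) / len(fit)
        new = {g: c * math.exp(-rate * (fit[g] - avg)) for g, c in coeffs.items()}
        delta = max(abs(new[g] - coeffs[g]) for g in coeffs)
        coeffs = new
        pattern = learn(groups, coeffs)
        if delta < min_delta:  # condition 2: small update quantity
            break
    return pattern, coeffs


random.seed(2)  # fixed seed so the sketch is repeatable
groups = {
    "large": [random.gauss(1.0, 1.0) for _ in range(1000)],
    "small": [random.gauss(4.0, 1.0) for _ in range(100)],
}
pattern, coeffs = train(groups)
print(pattern, coeffs)
```

In this sketch the small group initially fits the pattern worse than the large group, so its coefficient grows over the iterations and the pattern is pulled toward it.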