The invention relates generally to speech recognition systems, and relates more specifically to an approach for automatically retraining a speech recognition system.
Most speech recognition systems are “trained” for specific applications or contexts. Training a speech recognition system generally involves generating a statistical model for a sample set of speech utterances that are representative of a specific application or context. The sample set of speech utterances is typically referred to as a “training set.” Generating a statistical model for a training set involves two fundamental steps. First, measurements are performed on the training set to generate a body of measurement data for the training set that specifies attributes and characteristics of the training set. Some training sets require a large amount of measurement data because of the number and character of speech utterances contained in the training set. Furthermore, a large amount of measurement data is often desirable since the accuracy of statistical models generally increases as the amount of measurement data increases. Human review and confirmation of measurement results is often employed to improve the accuracy of the measurement data, which can be very labor intensive and can take a long time.
Once the measurement data has been generated, statistical analysis is performed on the measurement data to generate statistical model data that defines a statistical model for the measurement data. The statistical model is a multi-dimensional mathematical representation derived from the training set.
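The two training steps (measurement, then statistical analysis) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the measurement step has already turned each utterance into a fixed-length feature vector, and that the statistical model is a single diagonal Gaussian, i.e. a per-dimension mean and variance. Practical recognizers use much richer models, and the function name `fit_diagonal_gaussian` is hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def fit_diagonal_gaussian(
    measurements: List[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Statistical analysis step: derive per-dimension means and variances
    (the 'statistical model data') from equal-length measurement vectors."""
    if not measurements:
        raise ValueError("need at least one measurement vector")
    dims = len(measurements[0])
    means = [fmean(vec[d] for vec in measurements) for d in range(dims)]
    # Population variance of each dimension around its mean.
    variances = [
        fmean((vec[d] - means[d]) ** 2 for vec in measurements)
        for d in range(dims)
    ]
    return means, variances


# Two hypothetical 2-dimensional measurement vectors.
print(fit_diagonal_gaussian([[1.0, 2.0], [3.0, 4.0]]))
```

Here the returned means and variances play the role of the statistical model data; the raw vectors are the measurement data.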
Once a statistical model has been generated, a received speech utterance is evaluated against the statistical model in an attempt to match the received speech utterance to a speech utterance from the training set. Sometimes separate statistical models are used for different applications and contexts to improve accuracy.
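Evaluating a received utterance against statistical models can be sketched as scoring its feature vector under each model and keeping the best match. This assumes the same illustrative diagonal-Gaussian models as above, one per hypothetical label; the names `log_likelihood` and `best_match` are hypothetical and not taken from the source.

```python
import math
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], List[float]]  # (means, variances)


def log_likelihood(vector: Sequence[float], means: List[float],
                   variances: List[float]) -> float:
    """Log-density of one feature vector under a diagonal Gaussian."""
    total = 0.0
    for x, mu, var in zip(vector, means, variances):
        total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total


def best_match(vector: Sequence[float], models: Dict[str, Model]) -> str:
    """Return the label whose model assigns the vector the highest score."""
    return max(models, key=lambda label: log_likelihood(vector, *models[label]))


models = {"yes": ([0.0], [1.0]), "no": ([5.0], [1.0])}
print(best_match([0.2], models))
```

Keeping separate models per application or context, as the text notes, simply means passing a different `models` dictionary for each context.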
Statistical models periodically require retraining to account for changes in the applications or contexts for which the statistical models were originally determined. For example, a particular application may use new words or subjects that are not represented in the statistical model for the particular application. As a result, the statistical model may not provide a high level of accuracy with respect to the new words or subjects. Retraining allows the statistical model to reflect the new words or subjects.
Conventional retraining is usually performed in a manual, offline process by supplementing the training data with the new words or subjects and then rebuilding the statistical model from the supplemented training data. One problem with this approach is that manual retraining can be very labor intensive (requiring substantial human supervision) and take a long time to implement. This means that statistical models cannot be quickly updated to recognize changes in utterances. Another problem with conventional retraining techniques is that the amount of measurement data that must be maintained continues to grow over time as the number and size of training sets increases. As a result, the measurement data requires an ever increasing amount of system resources, e.g., non-volatile storage such as disks, to store the data. For speech recognition systems requiring a large number of statistical models, e.g., for different applications, different users, or different subject matter, the amount of measurement data can be enormous.
Yet another problem with conventional retraining approaches is that new measurement data is often not adequately represented in statistical models. This occurs, for example, during retraining when a relatively small amount of new measurement data is processed with a relatively larger amount of prior measurement data to generate new statistical model data. The relatively larger amount of prior measurement data tends to dilute the effect of the relatively smaller amount of new measurement data. As a result, speech utterances associated with the new measurement data may not be adequately represented in the new statistical model data, resulting in a lower level of accuracy.
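A worked example makes the dilution effect concrete. Assume, with hypothetical numbers, that the model statistic is a simple mean, that $N_p = 9000$ prior measurements have mean $\bar{x}_p = 1.0$, and that $N_n = 1000$ new measurements have mean $\bar{x}_n = 3.0$. Pooling them with equal weight gives:

```latex
\bar{x}_{\text{pooled}}
  = \frac{N_p \,\bar{x}_p + N_n \,\bar{x}_n}{N_p + N_n}
  = \frac{9000 \cdot 1.0 + 1000 \cdot 3.0}{9000 + 1000}
  = \frac{12000}{10000}
  = 1.2
```

Although the new data is centered at 3.0, the retrained statistic moves only 0.2 away from the prior value, because the new measurements make up just 10% of the pooled data.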
Based on the foregoing, there is a need for an approach for retraining speech recognition systems that avoids the limitations in the prior approaches.
There is a particular need for a computer-implemented approach for automatically retraining a speech recognition system that requires a reduced amount of human supervision. There is also a need for an approach for retraining a speech recognition system that reduces the amount of prior measurement data that must be maintained.
There is a further need for a retraining approach that addresses the problem of new measurement data dilution.
The foregoing needs, and other needs and objects that will become apparent from the following description, are achieved by the present invention, which comprises, in one aspect, a method for automatically retraining a speech recognition system. According to the method, prior measurement data that was determined for a prior set of speech utterances is retrieved. New measurement data is determined for a new set of speech utterances. A weighting factor is applied to the new measurement data to generate weighted new measurement data. New statistical model data is generated using the prior measurement data and the weighted new measurement data.
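One way to picture applying a weighting factor to the new measurement data is a weighted fit in which each new vector counts `new_weight` times. The sketch below assumes the illustrative diagonal-Gaussian model and feature-vector measurement data used above; it is not the claimed method itself, and `weighted_gaussian` and `retrain_weighted_new` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def weighted_gaussian(
    groups: List[Tuple[float, List[Vector]]],
) -> Tuple[List[float], List[float]]:
    """Fit per-dimension means and variances, counting every vector in a
    group `weight` times."""
    pairs = [(w, v) for w, vecs in groups for v in vecs]
    if not pairs:
        raise ValueError("no measurement data")
    total = sum(w for w, _ in pairs)
    dims = len(pairs[0][1])
    means = [sum(w * v[d] for w, v in pairs) / total for d in range(dims)]
    variances = [
        sum(w * (v[d] - means[d]) ** 2 for w, v in pairs) / total
        for d in range(dims)
    ]
    return means, variances


def retrain_weighted_new(prior: List[Vector], new: List[Vector],
                         new_weight: float):
    """Prior data keeps weight 1.0; each new vector counts new_weight times."""
    return weighted_gaussian([(1.0, prior), (new_weight, new)])


prior = [[1.0]] * 9   # hypothetical prior measurement data
new = [[3.0]]         # hypothetical new measurement data
print(retrain_weighted_new(prior, new, 1.0)[0])  # unweighted: diluted
print(retrain_weighted_new(prior, new, 9.0)[0])  # weighted new data
```

With a weight of 1.0 the new data is diluted exactly as in the pooled example; raising the weight lets the small new set pull the model further toward the new utterances.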
According to another aspect, a method is provided for automatically retraining a speech recognition system. Prior measurement data that was determined for a prior set of speech utterances is retrieved. New measurement data is determined for a new set of speech utterances. A weighting factor is applied to the prior measurement data to generate weighted prior measurement data. New statistical model data is generated using the weighted prior measurement data and the new measurement data.
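Weighting the prior measurement data instead can be sketched the same way, with each prior vector counted `prior_weight` times and each new vector once. Again this assumes the illustrative diagonal-Gaussian model and feature-vector measurement data; `retrain_weighted_prior` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def retrain_weighted_prior(
    prior: List[Vector], new: List[Vector], prior_weight: float
) -> Tuple[List[float], List[float]]:
    """Each prior vector counts prior_weight times; new vectors count once."""
    pairs = [(prior_weight, v) for v in prior] + [(1.0, v) for v in new]
    if not pairs:
        raise ValueError("no measurement data")
    total = sum(w for w, _ in pairs)
    dims = len(pairs[0][1])
    means = [sum(w * v[d] for w, v in pairs) / total for d in range(dims)]
    variances = [
        sum(w * (v[d] - means[d]) ** 2 for w, v in pairs) / total
        for d in range(dims)
    ]
    return means, variances


prior = [[1.0]] * 9   # hypothetical prior measurement data
new = [[3.0]]         # hypothetical new measurement data
print(retrain_weighted_prior(prior, new, 1.0 / 9.0)[0])
```

Because the weighted mean and variance are normalized by the total weight, only the ratio of the weights matters: a prior weight of 1/9 yields the same model as weighting the new data by 9.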
According to another aspect, a method is provided for automatically retraining a speech recognition system. A first set of speech utterances is retrieved. Then, first measurement data is determined for the first set of speech utterances. First statistical model data is determined based upon the first measurement data. A statistical model is determined based upon the first statistical model data. A second set of speech utterances is retrieved. Second measurement data is determined for the second set of speech utterances. Second statistical model data is determined based upon the second measurement data. Finally, an updated statistical model is determined using the first statistical model data and the second statistical model data and without using either the first measurement data or the second measurement data.
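One way statistical model data could be combined without revisiting any measurement data is to store additive summary statistics (count, per-dimension sum, and sum of squares) and merge those. The sketch assumes this representation and the illustrative diagonal-Gaussian model; the source does not specify what the statistical model data contains, and `ModelData`, `summarize`, `merge`, and `to_model` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ModelData:
    count: int
    sums: List[float]      # per-dimension sum of x
    sq_sums: List[float]   # per-dimension sum of x**2


def summarize(measurements: List[Sequence[float]]) -> ModelData:
    """Reduce measurement data to additive statistical model data."""
    if not measurements:
        raise ValueError("need at least one measurement vector")
    dims = len(measurements[0])
    return ModelData(
        count=len(measurements),
        sums=[sum(v[d] for v in measurements) for d in range(dims)],
        sq_sums=[sum(v[d] ** 2 for v in measurements) for d in range(dims)],
    )


def merge(a: ModelData, b: ModelData) -> ModelData:
    """Combine two sets of model data; no measurement data is needed."""
    return ModelData(
        a.count + b.count,
        [x + y for x, y in zip(a.sums, b.sums)],
        [x + y for x, y in zip(a.sq_sums, b.sq_sums)],
    )


def to_model(data: ModelData) -> Tuple[List[float], List[float]]:
    """Turn model data into per-dimension means and variances."""
    means = [s / data.count for s in data.sums]
    variances = [q / data.count - m * m for q, m in zip(data.sq_sums, means)]
    return means, variances


first = summarize([[1.0], [3.0]])   # hypothetical first set
second = summarize([[5.0]])         # hypothetical second set
print(to_model(merge(first, second)))
```

Once summarized, the measurement vectors can be discarded, which is the storage saving the text describes. The one-pass variance formula shown can lose precision when the variance is small relative to the mean; a production version might keep centered statistics instead.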
According to another aspect, a speech recognition system comprises a storage medium and a retraining mechanism communicatively coupled to the storage medium. The retraining mechanism is configured to retrieve prior measurement data determined for a prior set of speech utterances from the storage medium. The retraining mechanism is also configured to determine new measurement data for a new set of speech utterances. The retraining mechanism is further configured to apply a weighting factor to the new measurement data to generate weighted new measurement data. Finally, the retraining mechanism is configured to generate new statistical model data using the prior measurement data and the weighted new measurement data.
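The system arrangement can be sketched as two cooperating objects: a storage medium holding measurement data and models, and a retraining mechanism that reads from it. In this sketch an in-memory dictionary stands in for non-volatile storage, new measurement data is passed in as already-computed feature vectors, and the model is the illustrative diagonal Gaussian. `InMemoryStorage`, `RetrainingMechanism`, the key layout, and the `"banking"` context label are all hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple


class InMemoryStorage:
    """Hypothetical stand-in for the storage medium."""

    def __init__(self) -> None:
        self._items: Dict[Any, Any] = {}

    def get(self, key: Any) -> Any:
        return self._items[key]

    def put(self, key: Any, value: Any) -> None:
        self._items[key] = value


class RetrainingMechanism:
    """Retrieves prior measurement data, weights new data, stores a model."""

    def __init__(self, storage: InMemoryStorage, new_weight: float) -> None:
        self.storage = storage
        self.new_weight = new_weight

    def retrain(self, context: str, new: List[Sequence[float]]
                ) -> Tuple[List[float], List[float]]:
        prior = self.storage.get(("measurements", context))
        # Prior vectors count once; each new vector counts new_weight times.
        pairs = [(1.0, v) for v in prior] + [(self.new_weight, v) for v in new]
        total = sum(w for w, _ in pairs)
        dims = len(pairs[0][1])
        means = [sum(w * v[d] for w, v in pairs) / total for d in range(dims)]
        variances = [
            sum(w * (v[d] - means[d]) ** 2 for w, v in pairs) / total
            for d in range(dims)
        ]
        model = (means, variances)
        self.storage.put(("model", context), model)  # new statistical model data
        return model


storage = InMemoryStorage()
storage.put(("measurements", "banking"), [[1.0]] * 9)
mechanism = RetrainingMechanism(storage, new_weight=9.0)
print(mechanism.retrain("banking", [[3.0]]))
```

Keeping the storage behind a small `get`/`put` interface means a disk- or database-backed store could replace the dictionary without changing the retraining logic.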