1. Technical Field
This invention relates to the field of computer systems for speech recognition and more specifically to a system for managing multiple pronunciations for a speech recognition vocabulary.
2. Description of the Related Art
Speech recognition is the process by which a computer converts an acoustic signal received by a microphone into a set of words. The recognized words may then be used in a variety of computer software applications for purposes such as document preparation, data entry, and command and control. Speech recognition is generally a difficult problem because pronunciations, accents, and speech characteristics vary widely among individual speakers. Acoustic models are stored representations of word pronunciations that a speech recognition application uses to help identify words spoken by a user. As part of the speech recognition process, these acoustic models are compared with the pronunciations of words as the user speaks them in order to identify the corresponding text words.
There are several ways that acoustic models can be inserted into the vocabulary of a speech recognition application. For example, developers of speech recognition systems commonly provide an initial set of acoustic models or base forms for a basic vocabulary set and possibly for auxiliary vocabularies. In some cases, multiple acoustic models are provided for words with more than one pronunciation.
Since each user tends to have a distinctive style of speaking, it is important that the speech recognition system be able to recognize a particular user's pronunciation of certain spoken words. Permitting the user to update the acoustic models used for word recognition can improve the overall accuracy of the speech recognition process for that user and thereby improve efficiency.
Conventional speech recognition products that allow additional acoustic models for alternative pronunciations of words typically require the user to decide when such acoustic models should be added to those already existing. This is often an extremely difficult decision, because users generally do not understand the criteria on which it should be based. Moreover, managing multiple sets of acoustic models to account for variations in pronunciation is itself a problem for a speech recognition application. For example, it is undesirable to maintain and store in memory large numbers of alternative acoustic models that do not truly reflect a user's word pronunciations. In addition, acoustic models that are inappropriate for a particular user's pronunciations can cause repeated recognition errors in otherwise unrelated words.
The invention concerns a method and system for automatically managing acoustic models in a computer speech dictation system. The method is intended to ensure that only high-reliability acoustic models, which accurately reflect a given user's word pronunciations, are retained. To accomplish this, a base quality metric value is assigned to each of the acoustic models maintained by a speech recognition application. The quality metric is incremented or decremented upon the occurrence of certain events relevant to the reliability of the acoustic model, and an acoustic model is discarded when its quality metric value falls below a threshold value.
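The basic bookkeeping described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `ModelStore`, `AcousticModel`, `BASE_QUALITY`, and `DISCARD_THRESHOLD`, and their numeric values, are hypothetical assumptions, since the text specifies neither the base value, the step sizes, nor the threshold.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical values: the text does not specify the base metric,
# the step sizes, or the discard threshold.
BASE_QUALITY = 10
DISCARD_THRESHOLD = 5


@dataclass
class AcousticModel:
    word: str                    # text word this pronunciation belongs to
    model_id: str                # hypothetical identifier for the stored model
    quality: int = BASE_QUALITY  # every model starts at the base value


class ModelStore:
    """Keeps a quality metric per acoustic model and prunes weak models."""

    def __init__(self) -> None:
        self.models: dict[str, AcousticModel] = {}

    def add(self, word: str, model_id: str) -> AcousticModel:
        # Assign the base quality metric to the new model.
        model = AcousticModel(word, model_id)
        self.models[model_id] = model
        return model

    def adjust(self, model_id: str, delta: int) -> None:
        # Increment (delta > 0) or decrement (delta < 0) the metric in
        # response to an event, then discard the model if it has fallen
        # below the threshold.
        model = self.models[model_id]
        model.quality += delta
        if model.quality < DISCARD_THRESHOLD:
            del self.models[model_id]
```

With these assumed values, a model that receives two decrements of 3 falls from 10 to 7 (kept) and then to 4, at which point the second `adjust` call removes it from the store.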
According to one aspect, the method includes the step of decrementing the value of the quality metric when a text word associated with a particular acoustic model is amended. Such amendments can take the form of corrections to, or deletions of, existing text that has been dictated by a user. The quality metric is decremented in such instances to indicate a lower degree of reliability of the particular acoustic model. According to another aspect of the invention, the method can include the step of decrementing the quality metric when an additional acoustic model is added to those already existing for a text word. Conversely, when an acoustic model is used to correctly identify a dictated word, the quality metric of that acoustic model is advantageously incremented to indicate a greater degree of reliability.
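The three events above can be mapped to metric updates as in the following sketch. The class `PronunciationTracker`, its method names, and the constants `REWARD` and `PENALTY` are hypothetical; the text says only that the metric is incremented or decremented, not by how much. The sketch also assumes that, when an additional model is added for a word, it is the models already stored for that word whose metrics are decremented.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

# Hypothetical values; the text does not give step sizes.
BASE_QUALITY = 10
REWARD = 1
PENALTY = 2


@dataclass
class AcousticModel:
    model_id: str
    quality: int = BASE_QUALITY


class PronunciationTracker:
    def __init__(self) -> None:
        # text word -> acoustic models for its alternative pronunciations
        self.by_word: dict[str, list[AcousticModel]] = defaultdict(list)

    def on_model_added(self, word: str, model_id: str) -> AcousticModel:
        # Assumption: adding another pronunciation counts against the
        # models that already exist for the same word.
        for existing in self.by_word[word]:
            existing.quality -= PENALTY
        new_model = AcousticModel(model_id)
        self.by_word[word].append(new_model)
        return new_model

    def on_word_amended(self, model: AcousticModel) -> None:
        # The user corrected or deleted text recognized with this model.
        model.quality -= PENALTY

    def on_word_recognized(self, model: AcousticModel) -> None:
        # The model correctly identified a dictated word.
        model.quality += REWARD
```

Keeping the penalty larger than the reward, as assumed here, makes a model lose standing faster from corrections than it regains it from successful recognitions; the text leaves this balance open.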
The method can also include the step of determining whether an alternate acoustic model exists for a particular text word before discarding the acoustic model. This step ensures that an acoustic model is not discarded unless there exists some better alternative model. In this regard, an acoustic model is preferably not discarded unless its quality metric is less than the quality metric of any alternate acoustic model which is available.
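The alternate-model check can be expressed as a single predicate, sketched below. The function name `should_discard` and the threshold value are hypothetical. The sketch reads "some better alternative model" as requiring at least one other model for the same word with a strictly higher metric; a stricter reading (lower than every alternate) would replace `any` with `all`.

```python
from __future__ import annotations

from dataclasses import dataclass

DISCARD_THRESHOLD = 5  # hypothetical threshold value


@dataclass
class AcousticModel:
    model_id: str
    quality: int


def should_discard(model: AcousticModel, alternates: list[AcousticModel]) -> bool:
    """Return True only if the model is below the threshold and a better
    alternate model exists for the same text word."""
    if model.quality >= DISCARD_THRESHOLD:
        return False
    # Exclude the model itself by identity, not by field equality.
    others = [m for m in alternates if m is not model]
    if not others:
        # Never leave a word without any pronunciation model.
        return False
    return any(model.quality < other.quality for other in others)
```

Guarding on an empty `others` list means a low-scoring model that is the only pronunciation for its word is kept rather than leaving the word unrecognizable.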
According to another aspect, the invention can further include the step of discarding an existing acoustic model when a new acoustic model has been provided and the existing acoustic model has been flagged as originating from an unreliable source. Unreliable sources include any function that generates an acoustic model by text-to-speech analysis, as well as copied acoustic models and typed pronunciations.
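Source flagging might be modeled with an enumeration, as in this sketch. The `Source` labels, the `UNRELIABLE` set, and the `add_model` function are hypothetical names; the three unreliable labels mirror the sources listed above, and the two reliable labels are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class Source(Enum):
    # Hypothetical origin labels for an acoustic model.
    USER_DICTATION = auto()
    VENDOR_BASE_FORM = auto()
    TEXT_TO_SPEECH = auto()
    COPIED = auto()
    TYPED_PRONUNCIATION = auto()


# Origins treated as unreliable, following the list in the text.
UNRELIABLE = {Source.TEXT_TO_SPEECH, Source.COPIED, Source.TYPED_PRONUNCIATION}


@dataclass
class AcousticModel:
    model_id: str
    source: Source


def add_model(models: list[AcousticModel], new_model: AcousticModel) -> list[AcousticModel]:
    """Return a word's model list after adding new_model, discarding any
    existing model flagged as coming from an unreliable source."""
    kept = [m for m in models if m.source not in UNRELIABLE]
    kept.append(new_model)
    return kept
```

For example, if a word has a text-to-speech model and a vendor base form, adding a model built from the user's own dictation leaves the vendor base form and the new model.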
According to another aspect of the invention, the method as described herein can be implemented in a computer speech recognition system. According to yet another aspect, the invention can be implemented as a machine readable storage for implementing the foregoing method on a computer system.