The invention relates generally to speech recognition systems, and relates more specifically to an approach for evaluating the accuracy of a pronunciation dictionary in a speech recognition system.
Most speech recognition systems use a pronunciation dictionary to identify particular words contained in received utterances. The term “utterance” is used herein to refer to one or more sounds generated either by humans or by machines. Examples of an utterance include, but are not limited to, a single sound, any two or more sounds, a single word or two or more words. In general, a pronunciation dictionary contains data that defines expected pronunciations of utterances. When an utterance is received, the received utterance, or at least a portion of the received utterance, is compared to the expected pronunciations contained in the pronunciation dictionary. An utterance is recognized when the received utterance, or portion thereof, matches the expected pronunciation contained in the pronunciation dictionary.
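By way of illustration only, the dictionary lookup described above can be sketched as follows. The phoneme symbols, the dictionary contents, and the function name `recognize` are hypothetical and are not taken from any particular speech recognition system; exact matching of phoneme sequences is assumed here for simplicity, whereas a practical recognizer would typically use a probabilistic comparison.

```python
from typing import Dict, Optional, Tuple

# A pronunciation is modeled here as a tuple of phoneme symbols (an assumption).
Phonemes = Tuple[str, ...]

# Hypothetical pronunciation dictionary: utterance -> expected pronunciation.
PRONUNCIATION_DICTIONARY: Dict[str, Phonemes] = {
    "tomato": ("t", "ah", "m", "ey", "t", "ow"),
    "data": ("d", "ey", "t", "ah"),
}


def recognize(received: Phonemes, dictionary: Dict[str, Phonemes]) -> Optional[str]:
    """Return the utterance whose expected pronunciation matches, else None."""
    for utterance, expected in dictionary.items():
        # Simplifying assumption: an utterance is recognized only on an exact match.
        if received == expected:
            return utterance
    return None
```

In this sketch, a received pronunciation that differs from the expected pronunciation by even one phoneme is not recognized, which illustrates why the expected pronunciations must track the actual pronunciations.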
One of the most important concerns with pronunciation dictionaries is to ensure that expected pronunciations of utterances defined by the pronunciation dictionary accurately reflect actual pronunciations of the utterances. If an actual pronunciation of a particular utterance does not match the expected pronunciation, the expected pronunciation of the particular utterance may no longer be useful for identifying the actual pronunciation of the particular utterance.
Actual pronunciations of utterances can be misrepresented for a variety of reasons. For example, in fluent speech, some sounds may be systematically deleted or adjusted. An application may be installed across diverse geographic areas where users have different regional accents. Expected pronunciations tend to be somewhat user-dependent. Consequently, a change in the users of a particular application can adversely affect the accuracy of a speech recognition system. This is attributable to different speech characteristics of users, such as different intonations and stresses in pronunciation.
Conventionally, pronunciation dictionaries are updated manually to reflect changes in actual pronunciations of utterances in response to reported problems. When a change in an application or user prevents a speech recognition system from recognizing utterances, the problem is reported to the administrator of the speech recognition system. The administrator then identifies the problem utterances and manually updates the pronunciation dictionary to reflect the changes to the application or users.
Manually updating a pronunciation dictionary to reflect changes to an application or users has several significant drawbacks. First, it relies upon problems being reported to the administrator of the speech recognition system. Problems may exist for long periods of time before being reported. In some situations this can adversely affect the reputation of the enterprise using the speech recognition system.
Furthermore, even after the problems are identified, a significant amount of human resources may be required to update the pronunciation dictionary, further prolonging the problem. For example, updating the pronunciation dictionary typically involves collecting a large amount of actual pronunciation data for the problem utterances. The actual pronunciation data is then processed and used to update the expected pronunciation data contained in the pronunciation dictionary. Meanwhile, the speech recognition system is unable to recognize the problem utterances until the system is updated, which can be very frustrating to customers and other users of the system.
Based on the foregoing, there is a need for an automated approach for determining the accuracy of a pronunciation dictionary in a speech recognition system.
There is a particular need for an automated approach for determining the accuracy of a pronunciation dictionary in a speech recognition system that identifies particular expected pronunciation representations that do not satisfy specified accuracy criteria and therefore need to be updated.
There is a further particular need for an automated approach for determining the accuracy of a pronunciation dictionary in a speech recognition system that requires a reduced amount of human resources in the identification process.
The foregoing needs, and other needs and objects that will become apparent from the following description, are achieved by the present invention, which comprises, in one aspect, a method for determining the accuracy of a pronunciation dictionary in a speech recognition system. According to the method, an expected pronunciation representation for a particular utterance is retrieved from the pronunciation dictionary. Then, an accuracy score is generated for the expected pronunciation representation by comparing the expected pronunciation representation to a set of one or more actual pronunciations of the particular utterance.
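One possible way to generate such an accuracy score is sketched below for illustration. This summary does not prescribe a particular comparison, so the sketch assumes that pronunciations are sequences of phoneme symbols and that the score is the average sequence similarity computed with `difflib.SequenceMatcher` from the Python standard library. The function name `accuracy_score` and the example phonemes are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Sequence, Tuple

Phonemes = Tuple[str, ...]


def accuracy_score(expected: Phonemes, actuals: Sequence[Phonemes]) -> float:
    """Average similarity (0.0 to 1.0) between the expected pronunciation
    and each actual pronunciation. The similarity measure is an assumption."""
    if not actuals:
        raise ValueError("at least one actual pronunciation is required")
    # ratio() is 2*M/T, where M is the number of matching elements and
    # T is the total number of elements in both sequences.
    ratios = [SequenceMatcher(None, expected, actual).ratio() for actual in actuals]
    return sum(ratios) / len(ratios)
```

Under these assumptions, an expected pronunciation that is identical to every actual pronunciation scores 1.0, and the score decreases as the actual pronunciations diverge from the expected pronunciation.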
According to another aspect, a method is provided for automatically updating a pronunciation dictionary in a speech recognition system to reflect one or more changes to an actual pronunciation of a particular word that is represented in the pronunciation dictionary. According to the method, an expected pronunciation representation for the particular word is retrieved from the pronunciation dictionary. An accuracy score is generated for the expected pronunciation representation by comparing the expected pronunciation representation to one or more actual pronunciations of the particular word. A determination is made whether the accuracy score for the expected pronunciation representation satisfies specified accuracy criteria. If the accuracy score for the expected pronunciation representation does not satisfy the specified accuracy criteria, then the expected pronunciation representation is updated to reflect the one or more actual pronunciations.
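The retrieve, score, test, and update steps of this aspect may be sketched as follows, purely as an illustration. The sketch makes several assumptions that are not stated in this summary: the accuracy score is the fraction of actual pronunciations that exactly match the expected pronunciation, the specified accuracy criterion is a minimum score (`threshold`), and the update replaces the expected pronunciation with the most frequently observed actual pronunciation. The function name `update_if_inaccurate` is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Phonemes = Tuple[str, ...]


def update_if_inaccurate(
    dictionary: Dict[str, Phonemes],
    word: str,
    actuals: Sequence[Phonemes],
    threshold: float = 0.8,  # hypothetical accuracy criterion
) -> bool:
    """Update the entry for `word` if its accuracy score is below `threshold`.

    Returns True if the dictionary was updated, False otherwise.
    """
    if not actuals:
        raise ValueError("at least one actual pronunciation is required")
    # Retrieve the expected pronunciation representation.
    expected = dictionary[word]
    # Assumed score: fraction of actual pronunciations that match exactly.
    score = sum(1 for actual in actuals if actual == expected) / len(actuals)
    # Determine whether the score satisfies the accuracy criterion.
    if score >= threshold:
        return False
    # Assumed update rule: adopt the most common actual pronunciation.
    dictionary[word] = Counter(actuals).most_common(1)[0][0]
    return True
```

With this arrangement, entries whose scores satisfy the criterion are left unchanged, so only the expected pronunciations that have drifted from actual usage are rewritten.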
According to another aspect, a speech recognition apparatus is provided. The speech recognition apparatus comprises a storage medium having a pronunciation dictionary stored thereon and a diagnostic mechanism communicatively coupled to the storage medium. The diagnostic mechanism is configured to retrieve an expected pronunciation representation for a particular utterance from the pronunciation dictionary. The diagnostic mechanism is further configured to generate an accuracy score for the expected pronunciation representation by comparing the expected pronunciation representation to a set of one or more actual pronunciations of the particular utterance.