1. Technical Field
This invention relates to the field of speech recognition computer applications and more specifically to a system for evaluating how accurately dictated words are discerned by a speech recognition system.
2. Description of the Related Art
Speech recognition is the process by which acoustic signals, received via a microphone, are "recognized" and converted into words by a computer. These recognized words may then be used in a variety of computer software applications. For example, speech recognition may be used to input data, prepare documents and control the operation of software applications. Speech recognition systems programmed or trained to the diction and inflection of a single person can successfully recognize the vast majority of words spoken by that person.
When a speech recognition system is used by a large number of speakers, however, it is very difficult for the system to accurately recognize all of the spoken words because of the wide variety of pronunciations, accents and divergent speech characteristics among individual speakers. Due to these variations, the speech recognition system may fail to recognize some of the speech, and some words may be converted erroneously. This may result in spoken words being converted into different words ("hold" recognized as "old"), spoken words being improperly conjoined ("to the" recognized as "tooth"), and spoken words being recognized as homonyms ("boar" instead of "bore").
Erroneous words may also result from improper technique on the part of the speaker. For example, the speaker may be speaking too rapidly or too softly, slurring words, or positioned at an improper distance from the microphone. In such cases, the recognition software is likely to generate a large number of mis-recognized words.
Conventional speech recognition systems often include a means for the user to retroactively correct these errors after dictation. Typically, this is accomplished by providing a correction "window" through which the user interacts with the system. To simplify the correction process, most such correction windows provide a list of suggested, or alternate, words that in some way resemble the dictated words. The list is produced by an algorithm known in the art, much like a spell-checking program in a word processing application, which searches a system database for words with characteristics similar to those of the incorrectly recognized words. The algorithm outputs a list of one or more alternate words from which the user may select the intended word. If the intended word is not in the alternate list, the user may type it in. After the intended word is selected or typed in, the algorithm substitutes the corrected word for the erroneous word.
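The correction-window workflow described above can be sketched as follows. This is illustrative only: it assumes spelling similarity via Python's standard `difflib.get_close_matches` as a stand-in for the unspecified "algorithm known in the art", whereas a real recognizer would normally draw alternates from its own acoustic and language-model hypotheses. The function names `alternate_list` and `apply_correction` are hypothetical.

```python
import difflib
from typing import List, Optional


def alternate_list(recognized: str, vocabulary: List[str], n: int = 3) -> List[str]:
    # Assumption: spelling similarity stands in for the prior-art algorithm.
    # Returns up to n vocabulary words scoring at least 0.6 similarity.
    return difflib.get_close_matches(recognized, vocabulary, n=n, cutoff=0.6)


def apply_correction(words: List[str], index: int,
                     selected: Optional[str], typed: Optional[str]) -> List[str]:
    # Substitute the user's choice for the erroneous word: a selected
    # alternate if one was picked, otherwise the typed-in text.
    replacement = selected if selected is not None else typed
    if replacement is None:
        return words  # user made no correction
    return words[:index] + [replacement] + words[index + 1:]
```

For example, `alternate_list("old", ["hold", "table", "chair"])` offers `["hold"]`, and selecting it turns `["please", "old", "the", "door"]` into `["please", "hold", "the", "door"]`.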
Although the alternate list simplifies the correction process, it does not aid in preventing the occurrence of mis-recognized text. For this, conventional speech recognition systems typically utilize "help screens" or online tutorials that the user may search to find information on a specific query or topic. Although the help files may provide information regarding possible solutions to the mis-recognition, they typically do not provide feedback specific to the speaker or dictation session.
Principally, this is because typical speech recognition systems do not track how frequently words are mis-recognized in each dictation session. It would therefore be desirable to provide a simple correction-tracking system for evaluating the recognition accuracy of speech recognition systems, which could be used to provide users with specific solutions to mis-recognition problems.
The present invention provides a simple method and system for evaluating the accuracy of a speech recognition system. The invention indexes one or more parameters for each dictation session and uses the parameters to calculate one or more accuracy ratios.
Specifically, the present invention provides a method and system for evaluating how accurately dictated words are recognized during a dictation session by a speech recognition system. The present invention counts the number of dictated words so as to create a total word index and counts the number of mis-recognized words to create a correction index. Then, the correction index is subtracted from the total word index to create a recognition index. An accuracy value is calculated as the ratio of the recognition index to the total word index.
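The counting and ratio described above amount to accuracy = (total word index - correction index) / total word index. A minimal sketch follows; the class `DictationSession` and its methods are hypothetical names, since the text specifies only the indices and the ratio. Returning 0.0 for a session with no dictated words is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class DictationSession:
    """Hypothetical per-session counters for the indices described above."""
    total_word_index: int = 0   # number of dictated words
    correction_index: int = 0   # number of mis-recognized (corrected) words

    def record_words(self, count: int) -> None:
        # Count words as they are dictated.
        self.total_word_index += count

    def record_correction(self) -> None:
        # Count one mis-recognized word each time the user corrects it.
        self.correction_index += 1

    @property
    def recognition_index(self) -> int:
        # Correction index subtracted from the total word index.
        return self.total_word_index - self.correction_index

    def accuracy(self) -> float:
        # Ratio of recognition index to total word index
        # (0.0 when nothing was dictated, by assumption).
        if self.total_word_index == 0:
            return 0.0
        return self.recognition_index / self.total_word_index
```

For example, a session of 200 dictated words with 10 corrections yields a recognition index of 190 and an accuracy value of 190/200 = 0.95.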
The present invention tracks the total number of words dictated as well as the number of corrections for each dictation session. These values are used to estimate the accuracy of the speech recognition system for a specific dictation session. One object and advantage of the present invention is that the calculated accuracy ratio can be used to initiate problem solving applications or procedures based on the performance of each dictation session.
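The text does not say how the accuracy value triggers a problem-solving procedure. One minimal sketch, assuming a hypothetical threshold of 0.90 and hypothetical suggestion strings drawn from the technique problems listed earlier (speaking rate, volume, microphone distance), is a simple comparison made at the end of each session:

```python
from typing import List


def session_feedback(accuracy: float, threshold: float = 0.90) -> List[str]:
    # Hypothetical threshold: the source does not specify one.
    if accuracy >= threshold:
        return []  # session performed acceptably; no procedure started
    # Hypothetical suggestions based on common speaker-technique problems.
    return [
        "Speak at a moderate pace without slurring words.",
        "Speak at a normal volume.",
        "Check your distance from the microphone.",
    ]
```

A fuller system could instead launch a tutorial or retraining routine at this point; the list of strings merely stands in for whatever procedure is chosen.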
Another object and advantage of the present invention is that it does not require a great deal of computer memory or processing power. The present invention requires minimal mathematical manipulation and data storage. Simple counting and calculation processes are all that is needed to perform the present invention.
In a preferred embodiment of the present invention, the accuracy of the speech recognition system may also be evaluated according to the number of times corrected words are within a word database, as well as the number of times the intended words are suggested in a list of one or more alternate terms. Specifically, each mis-recognized word is compared to at least one alternate word. An alternate index counts each time one of the alternate words is the word intended by the speaker. If the intended word is not within the alternate list, the user inputs one or more corrected words. The corrected words not contained in the word database are then counted to create an out-of-vocabulary index. The total word index is adjusted if the intended term, whether suggested as an alternate or input by the user, contains more than one word. In this embodiment, the correction index counts only those corrected words found in either the alternate list or the word database, and the recognition index is obtained by subtracting this correction index from the total word index. The accuracy value is again calculated as the ratio of the recognition index to the total word index.
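The preferred embodiment can be sketched as a set of counters updated once per correction. Several details are assumptions not fixed by the text: each correction is taken to replace exactly one recognized word, so a multi-word intended term adds its extra words to the total word index; every word of a term chosen from the alternate list counts toward the correction index; and membership is checked by exact string match. The class `DetailedSession` and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class DetailedSession:
    """Hypothetical counters for the preferred embodiment."""
    total_word_index: int = 0
    correction_index: int = 0         # corrected words in alternate list or word database
    alternate_index: int = 0          # times the intended term was an offered alternate
    out_of_vocabulary_index: int = 0  # typed-in words absent from the word database

    def record_words(self, count: int) -> None:
        self.total_word_index += count

    def record_correction(self, intended: str, alternates: List[str],
                          vocabulary: Set[str]) -> None:
        words = intended.split()
        # Assumption: one recognized word was replaced, so any extra
        # words in the intended term are added to the total.
        self.total_word_index += len(words) - 1
        if intended in alternates:
            # The intended term was suggested in the alternate list.
            self.alternate_index += 1
            self.correction_index += len(words)
        else:
            # The user typed the correction: count known and unknown words.
            known = sum(1 for w in words if w in vocabulary)
            self.correction_index += known
            self.out_of_vocabulary_index += len(words) - known

    @property
    def recognition_index(self) -> int:
        return self.total_word_index - self.correction_index

    def accuracy(self) -> float:
        if self.total_word_index == 0:
            return 0.0
        return self.recognition_index / self.total_word_index
```

For example, starting from 100 dictated words, picking "hold" from the alternates, typing "to the" in place of "tooth", and typing an out-of-vocabulary word gives a total word index of 101, a correction index of 3, an out-of-vocabulary index of 1, and an accuracy value of 98/101. Because out-of-vocabulary words are excluded from the correction index, as the text specifies, they do not lower the accuracy value.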
Thus, an additional object and advantage of the present invention is that it provides an accuracy value according to one or more parameters. This affords a more thorough evaluation of the accuracy of the speech recognition system.
The system and method of the present invention can also sum each respective index across dictation sessions and calculate one or more overall accuracy ratios. Thus, the invention provides yet another object and advantage in that it can also approximate the accuracy of the speech recognition system independently of specific users or dictation sessions.
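Summing the indices before taking the ratio can be sketched as below. The helper `overall_accuracy` is hypothetical and, for self-containment, takes plain `(total_word_index, correction_index)` pairs rather than session objects.

```python
from typing import Iterable, Tuple


def overall_accuracy(sessions: Iterable[Tuple[int, int]]) -> float:
    # Sum each index across sessions, then apply the single-session ratio.
    total_words = 0
    total_corrections = 0
    for total_word_index, correction_index in sessions:
        total_words += total_word_index
        total_corrections += correction_index
    if total_words == 0:
        return 0.0  # assumption: no dictation yields 0.0
    return (total_words - total_corrections) / total_words
```

For sessions of (200, 10) and (50, 15), the pooled accuracy is 225/250 = 0.9, whereas averaging the per-session values 0.95 and 0.70 would give 0.825. Summing the indices first weights each session by the number of words dictated, which is what makes the overall figure independent of how the dictation was split into sessions.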