1. Technical Field
This invention relates to the field of computer speech recognition and more particularly to a method and apparatus for automatically analyzing a speech dictated document.
2. Description of the Related Art
Speech recognition, also referred to as speech-to-text, is technology that enables a computer to transcribe spoken words into computer-recognized text equivalents. Speech recognition is the process of converting an acoustic signal, captured by a transducive element such as a microphone or a telephone, to a set of words. These words can be used for controlling computer functions, data entry, and word processing. Accomplishing accurate speech recognition is a formidable task due to the wide variety of pronunciations, individual accents and speech characteristics of individual speakers. Consequently, language models are often used to reduce the search space of possible words and to resolve ambiguities between similar-sounding words. Such language models tend to be statistically based systems and can be provided in a variety of forms.
The simplest language model can be specified as a finite state network, where the permissible words following each word are given explicitly. However, more sophisticated language models have also been developed which are specified in terms of a context-specified grammar. The most widely used statistical model of language is the trigram model. In the trigram model, a word is predicted based solely upon the two words which immediately precede it. As a result of its effectiveness and simplicity, the trigram model has been the workhorse of the statistical approach to speech recognition for over twenty years.
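The trigram idea can be illustrated with a minimal sketch. The function names `train_trigrams` and `trigram_probability` are hypothetical, and the estimate shown is a plain relative frequency with no smoothing, which is an assumption made for brevity rather than a description of any particular recognizer.

```python
from collections import Counter, defaultdict


def train_trigrams(words):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        counts[(w1, w2)][w3] += 1
    return counts


def trigram_probability(counts, w1, w2, w3):
    """Relative-frequency estimate of P(w3 | w1, w2); 0.0 for an unseen context."""
    following = counts.get((w1, w2))
    if not following:
        return 0.0
    return following[w3] / sum(following.values())


# Illustrative training text (invented for this sketch).
text = "the patient has a fracture the patient has no fracture".split()
model = train_trigrams(text)
p_has = trigram_probability(model, "the", "patient", "has")
p_a = trigram_probability(model, "patient", "has", "a")
```

In this toy text, "has" always follows "the patient", while "a" and "no" each follow "patient has" once, so the second estimate is split evenly between them.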
Since each particular speaker will tend to have their own style of speaking, it is important that the attributes of such speaking style be adapted to the language model. By continuously updating the language model, it is possible to improve the overall accuracy of the speech recognition process for that speaker and thereby permit greater efficiencies. Accordingly, it is desirable to update the language model for each particular speaker on a regular basis. One method requires the speaker to use a speech recognition correction function. However, often speakers choose not to use the correction function to correct speech recognition errors. Rather, speakers either redictate the misrecognized word or phrase, or type their own corrections directly in the document. In consequence, speech recognition systems often do not update the language model with new language model data.
Another method of updating the language model could include providing the speaker with a vocabulary expanding tool for analyzing the document. This analysis uses the document's text to update the speaker's language model and, optionally, add new words to the speaker's vocabulary. However, if the speaker executes the vocabulary expanding tool before completing the editing phase, the use of the tool will corrupt the language model with bad data. Conversely, if the speaker waits until completing the editing phase before invoking the tool, the speaker will have forgone the benefit of accuracy improvements for the duration of the editing phase. Additionally, the repeated execution of the vocabulary expanding tool on the same document will artificially bias the language model towards words contained in that document. Similarly, if the speaker repeatedly executes the vocabulary expanding tool subsequent to the dictation and editing of a single paragraph, the language model will be artificially biased towards the words contained in earlier paragraphs.
At least one existing speech application automatically analyzes speech dictated text in a document. IBM's MedSpeak, for instance, includes an automatic analysis of speech dictated text. MedSpeak, however, is a specialized product which only creates radiology reports. Moreover, in MedSpeak, the automatic analysis begins only after the user completes the report, which is a fixed, definite event in the MedSpeak application. In contrast, generalized speech-enabled applications, for instance a word processor, do not have a fixed, definite event denoting the completion of the editing phase of a document. In a word processing application, the speech recognition system cannot ascertain when a document has been finally edited. Moreover, by invoking automatic analysis only at the conclusion of the editing phase of a radiology report, MedSpeak does not provide for concurrent speech recognition improvement during the editing phase of the document.
The invention concerns a method and apparatus for automatically analyzing a document in a speech dictation system. The invention as taught herein has advantages over all known methods now used to analyze a document, and provides a novel and nonobvious system, including apparatus and method, for automatically analyzing a document. A method for automatically analyzing a document in a speech dictation system having a vocabulary and language model can comprise the steps of: determining whether the document has undergone previous analysis; undoing the previous analysis; and, analyzing the document. More specifically, the determining step comprises the steps of: comparing trigrams in the document with trigrams in the language model; and, setting a reference point containing document data for undoing a previous analysis in the undoing step if the compared language model contains all the document trigrams. Moreover, the undoing step comprises the step of removing from the language model each trigram contained in the document data in the reference point. Finally, the analyzing step comprises the steps of: searching the document for new words not contained in the vocabulary; adding the new words to the vocabulary; updating the language model with trigrams contained in the document; and, setting a reference point containing document data for undoing the updating of the language model in the updating step. The adding step comprises: identifying, among the new words found in the searching step, correctly-spelled words not requiring a speaker-supplied pronunciation; and, adding the correctly-spelled words to the vocabulary.
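The determining, undoing, and analyzing steps can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the class `DictationModel`, its attributes, and the `can_auto_add` predicate (standing in for "correctly spelled and not requiring a speaker-supplied pronunciation") are hypothetical names, the language model is reduced to raw trigram counts, and the sketch assumes that an existing reference point is itself treated as evidence of a previous analysis.

```python
from collections import Counter


def trigrams(words):
    """Return every run of three consecutive words in the document."""
    return list(zip(words, words[1:], words[2:]))


class DictationModel:
    """Toy vocabulary plus trigram-count language model (hypothetical)."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)
        self.trigram_counts = Counter()
        self.reference_point = None  # document data kept for a later undo

    def contains_all_trigrams(self, words):
        doc = trigrams(words)
        return bool(doc) and all(self.trigram_counts[t] > 0 for t in doc)

    def analyze(self, words, can_auto_add=lambda word: True):
        # Determining step: with no stored reference point, a model that
        # already contains every document trigram is taken as a sign of a
        # previous analysis, and the document becomes the reference point.
        if self.reference_point is None and self.contains_all_trigrams(words):
            self.reference_point = list(words)

        # Undoing step: remove the trigrams recorded in the reference point.
        if self.reference_point is not None:
            self.trigram_counts.subtract(trigrams(self.reference_point))
            self.trigram_counts = +self.trigram_counts  # drop counts <= 0

        # Analyzing step: add eligible new words, update the trigram counts,
        # and store a new reference point for the next undo.
        new_words = {w for w in words if w not in self.vocabulary}
        added = {w for w in new_words if can_auto_add(w)}
        self.vocabulary |= added
        self.trigram_counts.update(trigrams(words))
        self.reference_point = list(words)
        return added


# Illustrative documents (invented for this sketch).
model = DictationModel({"the", "patient", "has", "a"})
draft = "the patient has a fracture".split()
first_added = model.analyze(draft)
model.analyze(draft)  # repeated analysis of the unchanged draft
count_after_repeat = model.trigram_counts[("the", "patient", "has")]
edited = "the patient has no fracture".split()
second_added = model.analyze(edited)
```

Because each analysis first removes the trigrams recorded at the previous reference point, analyzing the same draft twice leaves each trigram counted once, and trigrams that the speaker edited out (here, "patient has a") no longer contribute to the model.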
The inventive method can further comprise the steps of: recognizing a request from a user of the speech dictation system to discontinue user editing; finding in the document new words not contained in the vocabulary in response to the request; enhancing the vocabulary with the new words; undoing the analysis created during the analyzing step; and, further analyzing the document. More specifically, the enhancing step comprises the steps of: procuring pronunciations for the new words in the document requiring pronunciations; and, adding the new words having procured pronunciations to the vocabulary. Alternatively, the enhancing step comprises the steps of: presenting to the user a list of the new words contained in the document; accepting from the user a selection of new words from the list to be added to the vocabulary; automatically procuring pronunciations for each word contained in the selection of new words not requiring speaker-supplied pronunciations; prompting the user for pronunciations for words contained in the selection of new words requiring speaker-supplied pronunciations; and, adding the selection of new words to the vocabulary. Moreover, the presenting step comprises the steps of: creating a user list for holding the new words; including in the user list each out-of-vocabulary new word formed using all capital letters, and each capitalized new word contained in the document having a corresponding identically spelled lowercase version contained in the vocabulary; excluding from the user list each closed class new word; and, presenting the user list to the user in a user interface.
Finally, the adding step comprises the steps of: discarding each selected out-of-vocabulary new word formed using all capital letters having a pronunciation procured during the prompting step which matches a pronunciation of an identically spelled lowercase version of the out-of-vocabulary new word contained in the vocabulary; and, adding each remaining selected new word to the vocabulary.
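The presenting and adding steps can be sketched as two small functions. Several points here are assumptions made for illustration: the vocabulary is modeled as a case-sensitive mapping from spelling to a pronunciation string, closed class words are supplied as a lowercase set, and the names `build_user_list` and `add_selected_words` as well as the pronunciation strings are hypothetical.

```python
def build_user_list(document_words, vocabulary, closed_class):
    """Collect new words to present to the user, in document order."""
    user_list = []
    for word in document_words:
        if word in vocabulary or word in user_list:
            continue  # already known, or already listed
        if word.lower() in closed_class:
            continue  # closed class words are excluded from the list
        # All-capital words and capitalized words whose lowercase form is
        # in the vocabulary reach this point and are therefore included.
        user_list.append(word)
    return user_list


def add_selected_words(selected, pronunciations, vocabulary):
    """Add the selected words, discarding redundant all-capital entries."""
    added = []
    for word in selected:
        lower = word.lower()
        redundant = (
            word.isupper()
            and lower in vocabulary
            and pronunciations.get(word) == vocabulary[lower]
        )
        if redundant:
            continue  # sounds the same as the existing lowercase entry
        vocabulary[word] = pronunciations.get(word)
        added.append(word)
    return added


# Illustrative data (invented for this sketch).
vocabulary = {"the": "dh-ah", "mark": "m-aa-r-k", "report": "r-ih-p-ao-r-t",
              "signed": "s-ay-n-d", "aids": "ey-d-z"}
closed_class = {"the", "a", "of"}
words = ["The", "report", "Mark", "signed", "IBM", "AIDS", "IBM"]

user_list = build_user_list(words, vocabulary, closed_class)
pronunciations = {"Mark": "m-aa-r-k", "IBM": "ay-b-iy-eh-m", "AIDS": "ey-d-z"}
added = add_selected_words(user_list, pronunciations, vocabulary)
```

In this example the sentence-initial "The" is kept off the list as a closed class word, and "AIDS" is discarded at the adding step because its procured pronunciation matches that of the lowercase entry already in the vocabulary.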