The present invention relates to speech recognition systems, and more particularly, to dictation systems.
One of the most difficult problems for speech recognition systems is how to choose the correct alternative from a list of words that are pronounced similarly or identically, but spelled differently. Such similar sounding words are known as homophones. For example, in English, "read, red" or "send, sent" are such homophones, as are the French words "parle, parles, parlent". Humans select the correct homophone by considering the context in which a given word appears, or by understanding the content of the text. This technique is, however, not yet feasible for computer systems.
In current computer-based speech recognition systems, the system selects a single alternative. If that alternative is incorrect, the user may then correct the selection, for example, by choosing another alternative from a list of similar-sounding words. This method has the disadvantage that the user must spot the mistake in the recognized text, and then correct it. This takes extra time, breaks the flow of dictation, and carries the risk that some errors may be overlooked.
In accordance with a preferred embodiment of the invention, a method is provided for utilizing a speech recognizer to distinguish a provided utterance from one or more similar-sounding utterances based on a speaker-specified hint. The method includes identifying a hint and associating it with the provided utterance, using the hint to establish a condition for distinguishing the provided utterance from the one or more similar-sounding utterances, and selecting a recognition result that satisfies the condition. The recognition result is derived in conjunction with the operation of the speech recognizer.
In accordance with further embodiments of the invention, selecting a recognition result includes providing a list of alternative recognition possibilities and filtering the list based upon the condition. The speech recognizer may be used to provide entries in the list of alternative recognition possibilities. Alternatively, or in addition, a dictionary may be used to provide entries in the list. The hint may be a linguistic property characterizing the provided utterance or, in the same or an alternative embodiment, the hint may make reference to the context of previous dictation to characterize the provided utterance. Where the hint is a linguistic property, it may be an orthographic, morphological, or semantic property of the provided utterance. (As used in this description and in the following claims, the term "fractional spelling" means providing the spelling for a portion of the word, wherein the portion need not, but might possibly, include the beginning of the word, and does not include the whole word.)
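By way of illustration only, the steps of establishing a condition from a hint and filtering a list of alternative recognition possibilities might be sketched as follows. The hint encoding ("ends_with:d", "contains:ea"), the function names, and the fallback to the first alternative are assumptions made for this sketch and are not taken from the description; the list of alternatives stands in for whatever the speech recognizer or dictionary would supply.

```python
from typing import Callable, List

# A condition is modeled as a predicate over a candidate spelling
# (a hypothetical representation chosen for this sketch).
Condition = Callable[[str], bool]


def condition_from_hint(hint: str) -> Condition:
    """Turn an already-identified hint into a condition.

    Only two illustrative orthographic hints are handled. The
    "kind:value" encoding is an assumption of this sketch.
    """
    kind, _, value = hint.partition(":")
    if kind == "contains":
        # Fractional spelling: part of the word is spelled, never the whole word.
        return lambda word: value in word and value != word
    if kind == "ends_with":
        return lambda word: word.endswith(value)
    raise ValueError(f"unsupported hint: {hint!r}")


def select_result(alternatives: List[str], hint: str) -> str:
    """Filter the alternatives by the condition and return the first survivor.

    The alternatives are assumed to be ordered best-first. Falling back to
    the top alternative when nothing matches is an assumption of this sketch.
    """
    if not alternatives:
        raise ValueError("at least one alternative is required")
    condition = condition_from_hint(hint)
    filtered = [word for word in alternatives if condition(word)]
    return filtered[0] if filtered else alternatives[0]


print(select_result(["sent", "send", "cent", "scent"], "ends_with:d"))  # send
print(select_result(["red", "read"], "contains:ea"))                    # read
```

In this sketch the hint only narrows a list produced elsewhere, so the same filter applies whether the entries come from the recognizer, from a dictionary, or from both.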
In accordance with a still further embodiment of the invention, the hint may provide some other desired criterion for selecting the provided utterance. The utterance may be a word, or alternatively, a phrase. For the purposes of this description and the following claims, a "hint" excludes a complete spelling of a word or phrase. The speech recognizer may also use a plurality of hints. In accordance with this embodiment, a method is provided for utilizing a speech recognizer to distinguish a provided utterance from one or more similar-sounding utterances. The method includes identifying a plurality of hints and associating them with the provided utterance, using the hints to establish conditions for distinguishing the provided utterance, and selecting a recognition result that satisfies the conditions. The recognition result is derived in conjunction with the operation of the speech recognizer.
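As a non-limiting sketch of the use of a plurality of hints, each hint may be treated as contributing one condition, with the selected result required to satisfy all of them. The names `select_with_hints` and `NUMBER_WORDS`, the small word set, and the choice to return `None` when no candidate qualifies are hypothetical and serve only to make the example self-contained.

```python
from typing import Callable, Iterable, List, Optional

Condition = Callable[[str], bool]

# Hypothetical semantic knowledge used by one of the conditions below.
NUMBER_WORDS = {"one", "two", "four", "eight"}


def select_with_hints(
    alternatives: List[str], conditions: Iterable[Condition]
) -> Optional[str]:
    """Return the first alternative that satisfies every condition.

    The alternatives are assumed to be ordered best-first. None is returned
    when no alternative satisfies all conditions (an assumption of this sketch).
    """
    conditions = list(conditions)
    for candidate in alternatives:
        if all(condition(candidate) for condition in conditions):
            return candidate
    return None


alternatives = ["to", "two", "too"]
conditions = [
    lambda word: word not in NUMBER_WORDS,  # semantic hint: not a number
    lambda word: len(word) == 3,            # orthographic hint: three letters
]
print(select_with_hints(alternatives, conditions))  # too
```

Neither hint is sufficient alone in this example: the semantic hint leaves "to" and "too", and the orthographic hint leaves "two" and "too". Only their conjunction isolates "too".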
In accordance with another aspect of the present invention, an improved speech recognition system is taught for distinguishing a provided utterance from one or more similar-sounding utterances when a speaker-identified hint is provided. The improved speech recognition system provides a text output in response to a spoken input. The system includes a hint recognizer for identifying the speaker-identified hint and associating it with the provided utterance. The system also includes a condition specifier, coupled to the hint recognizer, which uses the hint to establish a condition for distinguishing the provided utterance. The system further includes a result selector, coupled to the condition specifier, which selects a recognition result that satisfies the condition.
In accordance with a further embodiment of the invention, the result selector includes a filter operative on a list of alternative recognition possibilities. In a still further embodiment, the improved speech recognition system also includes a dictionary, coupled to the result selector, to provide entries in the list of alternative recognition possibilities. As previously mentioned, the hint may be a linguistic (e.g., orthographic, morphological, or semantic) property characterizing the provided utterance, or may make reference to the context of previous dictation to characterize the provided utterance, or may provide some other desired criterion for selecting the provided utterance.
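The coupling of the hint recognizer, condition specifier, and result selector may be pictured, purely as an assumed sketch, as three small components passing their outputs along. Everything concrete here is hypothetical: the class and method names, the spoken pattern "<word> ending in <letters>", the use of a text transcript in place of audio, and the dictionary layout mapping a recognized word to its similar-sounding spellings.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Hint:
    kind: str
    value: str


class HintRecognizer:
    """Identifies a hint in the input and associates it with the utterance."""

    # Hypothetical spoken form: "<utterance> ending in <letters>".
    _PATTERN = re.compile(r"^(?P<utterance>\w+) ending in (?P<letters>\w+)$")

    def recognize(self, transcript: str) -> Tuple[str, Optional[Hint]]:
        match = self._PATTERN.match(transcript)
        if match is None:
            return transcript, None  # no hint was given
        return match["utterance"], Hint("ends_with", match["letters"])


class ConditionSpecifier:
    """Uses the hint to establish a condition on candidate results."""

    def specify(self, hint: Optional[Hint]) -> Callable[[str], bool]:
        if hint is None:
            return lambda word: True  # without a hint, any candidate passes
        if hint.kind == "ends_with":
            return lambda word: word.endswith(hint.value)
        raise ValueError(f"unsupported hint kind: {hint.kind!r}")


class ResultSelector:
    """Filters a list of alternatives, with entries supplied by a dictionary."""

    def __init__(self, dictionary: Dict[str, List[str]]) -> None:
        self._dictionary = dictionary

    def select(self, recognized: str, condition: Callable[[str], bool]) -> str:
        extra = [w for w in self._dictionary.get(recognized, []) if w != recognized]
        for candidate in [recognized] + extra:
            if condition(candidate):
                return candidate
        return recognized  # assumed fallback: keep the recognizer's choice


def dictate(transcript: str, dictionary: Dict[str, List[str]]) -> str:
    """Wire the three components together for one utterance."""
    utterance, hint = HintRecognizer().recognize(transcript)
    condition = ConditionSpecifier().specify(hint)
    return ResultSelector(dictionary).select(utterance, condition)


homophones = {"sent": ["send", "cent", "scent"]}
print(dictate("sent ending in d", homophones))  # send
print(dictate("sent", homophones))              # sent
```

Keeping the three roles separate in this way means a new kind of hint would only require changes to the recognizer's pattern and the specifier's mapping, leaving the selector's filtering untouched.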
In accordance with yet a further embodiment of the invention, a system is taught for utilizing a plurality of hints to distinguish a provided utterance from one or more similar-sounding utterances. The system includes a hint recognizer for identifying the hints and associating them with the provided utterance. The system also includes a condition specifier, coupled to the hint recognizer, for using the hints to establish conditions for distinguishing the provided utterance. The system further includes a result selector, coupled to the condition specifier, for selecting a recognition result that satisfies the conditions.