1. Field of Invention
This invention relates to the determination and use of prosodic information.
2. Description of Related Art
Conventional automatic speech recognition systems compare incoming speech signal information against templates of speech signal information. That is, these conventional systems match the signal information of natural language speech against phoneme-, word- and phrase-based signal information templates. Some conventional automatic speech recognition systems constrain this matching process based on probability models such as co-occurrence, lattice rescoring and the like. Idiosyncratic variations in the input speech information are handled by refining or personalizing the information associated with the signal information templates.
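By way of illustration only, the template matching described above may be sketched as follows. The sketch assumes that each template is a short, fixed-length feature vector, that matching uses Euclidean distance, and that the probability model is a simple per-label prior; the names TEMPLATES, distance and recognize are hypothetical and do not correspond to any particular conventional system.

```python
import math

# Hypothetical signal information templates: label -> feature vector.
TEMPLATES = {
    "yes": [1.0, 0.0, 0.5],
    "no": [0.0, 1.0, 0.5],
}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, templates=TEMPLATES, prior=None):
    """Return the label of the best-matching template.

    If `prior` is given, it must map every label to a probability in (0, 1].
    The negative log of that probability is added to the distance, so that
    unlikely labels are penalized (an assumed, simplified constraint).
    """
    def score(label):
        penalty = -math.log(prior[label]) if prior else 0.0
        return distance(features, templates[label]) + penalty

    return min(templates, key=score)
```

With no prior, an input close to the "yes" template is recognized as "yes"; a sufficiently strong prior in favor of "no" can override the acoustic match, which is the sense in which a probability model constrains the matching.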
These conventional automatic speech recognition systems typically operate in either a dictation mode or a command mode. In the dictation mode, the input signal information is matched against signal information templates associated with candidate recognized text. The recognized text then serves as the input to the underlying application. For example, recognized text may be placed into an application such as an editor, word-processor, email editor and the like in lieu of, or in addition to, keyboard input. Since the natural language information in a dictation mode can relate to any subject, these conventional natural language processing systems do not typically exploit information about the domain contained in the speech information.
In a conventional command mode, a language model is determined for the automatic speech recognition system based on the target application for the speech. That is, if an operating system is the target of the speech utterance, the set of valid operating system commands forms a set of signal information templates against which the speech utterance signal information is compared. The use of discrete input modes increases the accuracy and/or responsiveness of conventional natural language processing systems. However, the use of discrete input modes can impede the fluency with which a user interacts with the natural language interface. Thus, rather than directly conversing with systems incorporating these conventional natural language interfaces, users are forced to track the current input mode and/or status of the system. Attempts to automatically determine mode changes between sentences, between paragraphs and within sentences have not been very successful.
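The effect of discrete input modes may be illustrated by the following minimal sketch. It assumes that recognition has already produced text and that the command set is a small fixed collection of strings; the names Mode, VALID_COMMANDS and handle are hypothetical and are used for illustration only.

```python
from enum import Enum


class Mode(Enum):
    DICTATION = "dictation"
    COMMAND = "command"


# Hypothetical set of valid commands for the target application.
VALID_COMMANDS = {"open file", "close window", "save"}


def handle(recognized_text, mode, document):
    """Route recognized text according to the current input mode.

    In command mode, only text in VALID_COMMANDS is accepted.
    In dictation mode, any text is appended to the document.
    """
    if mode is Mode.COMMAND:
        if recognized_text in VALID_COMMANDS:
            return ("execute", recognized_text)
        return ("reject", recognized_text)
    document.append(recognized_text)
    return ("insert", recognized_text)
```

In this sketch the same utterance, "save", is executed in command mode but inserted as text in dictation mode, so the outcome depends on a mode that the user must track.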