1. Technical Field
This invention relates to the field of speech-driven applications, and more particularly, to developing speech-driven applications.
2. Description of the Related Art
In a speech recognition system, a component known as a speech recognition engine or recognizer can produce text words from a received audio stream representing user speech. Pronunciations, known as baseforms, stored within the speech recognizer can be used to convert received speech to text. A single word can have multiple pronunciations. For example, one phonetic representation of the word spelled “the” can be “TH EE”, while another can be “TH UH”. Simply stated, the recognizer can process an audio stream and generate a phonetic representation of the speech. If the speech recognition system vocabulary contains a word that matches the phonetic representation, the recognizer can return the corresponding text word.
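By way of illustration only, the matching of a phonetic representation against stored baseforms can be sketched as a simple table lookup. The sketch below is a hypothetical simplification and not the implementation of any particular recognizer; the names BASEFORMS and lookup_word, and the representation of a phonetic sequence as a tuple of phoneme symbols, are assumptions introduced for this example.

```python
# Hypothetical toy vocabulary: each baseform (phoneme sequence) maps to a text word.
# Both pronunciations of "the" from the description are stored as separate baseforms.
BASEFORMS = {
    ("TH", "EE"): "the",
    ("TH", "UH"): "the",
}


def lookup_word(phonemes):
    """Return the text word whose baseform matches, or None if none is defined."""
    return BASEFORMS.get(tuple(phonemes))


print(lookup_word(["TH", "EE"]))      # the
print(lookup_word(["TH", "UH"]))      # the
print(lookup_word(["Z", "OR", "K"]))  # None: no baseform is defined for this sequence
```

The final call shows the limitation at issue: a phonetic sequence with no stored baseform yields no text word.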
Presently, only speech that is defined within the recognizer (i.e., pronunciations with corresponding text words) can be recognized. Accordingly, the development of a comprehensive speech application can be problematic due to the tremendous number of words that exist in various languages as well as the rate at which new words are invented. Further complicating the problem, speech applications frequently are called upon to recognize “pseudo-words” (such as user-ids and/or passwords). Defining all possible words and pseudo-words is not only impractical, but is cost-prohibitive as well.
Still, to develop comprehensive speech applications, the application developer must develop pronunciations. These pronunciations can be used with speech recognition and text-to-speech systems. To date, however, pronunciations have been generated by developers using a phonology listing and a text editor. Specifically, the developer types the spelling of a word, types the phonological representation of the spelling, and then compiles the text and phonological representation to build a vocabulary. The vocabulary then must be tested, refined, and recompiled until an acceptable model is developed.
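For illustration, the manual workflow described above can be approximated by compiling a hand-typed text listing into a vocabulary. This is a minimal sketch under stated assumptions: the tab-separated listing format, the LISTING sample, and the compile_vocabulary function are hypothetical and do not correspond to the file format or tooling of any actual speech product.

```python
from collections import defaultdict

# Hypothetical hand-typed listing: one "spelling<TAB>phonemes" entry per line.
LISTING = """\
the\tTH EE
the\tTH UH
"""


def compile_vocabulary(listing):
    """Build a mapping from each spelling to its list of baseforms."""
    vocabulary = defaultdict(list)
    for line_number, line in enumerate(listing.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        spelling, separator, phonemes = line.partition("\t")
        if not separator or not phonemes.split():
            raise ValueError(f"line {line_number}: expected 'spelling<TAB>phonemes'")
        vocabulary[spelling.strip()].append(tuple(phonemes.split()))
    return dict(vocabulary)


vocabulary = compile_vocabulary(LISTING)
print(vocabulary)  # {'the': [('TH', 'EE'), ('TH', 'UH')]}
```

Each correction to a pronunciation requires editing the listing and compiling again, which reflects the test, refine, and recompile cycle noted above.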