Spelling-to-sound rules are used to build baseforms for phonetically-based languages. A baseform is the phonetic spelling of a word in a particular language. Automated baseform generation (also referred to as baseform building) is of considerable importance in numerous applications, as described in a paper entitled “Automatic Modelling for Adding New Words to a Large Vocabulary Speech Recognition System”, by Asadi A., Schwartz R., and Makhoul J., published in the Proceedings of ICASSP'91, and in a paper entitled “An Advanced System to Generate Pronunciations of Proper Nouns”, by Deshmukh N., Ngan J., Hamaker J., and Picone J., published in the Proceedings of ICASSP'97.
In speech recognition applications, automated baseform generation is used to build phonetic spellings for user-specific words. To elaborate, in dictation applications, if a user requires words that are not contained in the dictionary of the dictation task, the application provides mechanisms for automatically generating phonetic spellings from the given spelling of a new word or, alternatively, from the speech signal of that word. In speech synthesis applications, automatic baseform generators/builders are used to build pronunciations for words that do not exist in the vocabulary of the synthesis system. This is critical for a speech synthesis system, as the system needs to know the pronunciation of a word before the word can be synthesized.
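The out-of-vocabulary handling described above can be sketched as a dictionary lookup that falls back to an automatic generator. This is a minimal illustration only: the names `get_baseform` and `spell_out`, the phone symbols, and the sample lexicon entry are assumptions introduced here, and `spell_out` is a deliberately naive placeholder rather than a real spelling-to-sound method.

```python
from typing import Callable, Dict, List

Baseform = List[str]  # a baseform is modelled here as a list of phone symbols


def spell_out(word: str) -> Baseform:
    """Placeholder generator (assumption): one pseudo-phone per letter."""
    return [ch.upper() for ch in word if ch.isalpha()]


def get_baseform(
    word: str,
    lexicon: Dict[str, Baseform],
    generate: Callable[[str], Baseform],
) -> Baseform:
    """Return the stored baseform, generating and caching one if the word is new."""
    key = word.lower()
    if key not in lexicon:
        # Word is outside the vocabulary: build a baseform from its spelling.
        lexicon[key] = generate(key)
    return lexicon[key]


# Illustrative lexicon with a single hand-written entry (assumption).
lexicon: Dict[str, Baseform] = {"hello": ["HH", "AH", "L", "OW"]}

known = get_baseform("Hello", lexicon, spell_out)  # found in the lexicon
new = get_baseform("zyx", lexicon, spell_out)      # generated, then cached
```

Passing the generator as a parameter keeps the lookup independent of how the baseform is produced, so a spelling-based or speech-based generator could be substituted without changing the calling code.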
According to Bahl et al., in “Automatic Phonetic Baseform Determination”, Proceedings of ICASSP'91, for languages that do not have a strong relationship between spelling and pronunciation, attempts have been made to train a system from a large vocabulary whereby spelling-to-sound rules are learnt statistically. Disadvantageously, such systems require a large set of words and corresponding baseforms for training, and may not perform well when presented with a sequence of letters substantially different from those used for training purposes. Proper nouns (e.g. names) are a typical example of such sequences.
Ramabhadran et al., in “Acoustics-only based Automatic Phonetic Baseform Generation”, Proceedings of ICASSP'98, 309–312, and Holter et al., in “A Comparison of Lexicon-Building Methods for Subword-Based Speech Recognisers”, Proceedings of TENCON '96, describe another approach, which makes use of the acoustics of the word whose baseform is to be determined. Disadvantageously, this renders the baseform speaker-dependent and possibly of limited or no use for other speakers.
For non-phonetically based languages, a system can be trained with speech in addition to spelling, thus enabling the system to interpret spelling-to-sound rules with the aid of speech. The use of speech can be avoided for phonetically-based languages. Instead, rule-based techniques can be used that generate baseforms from the spelling of words alone and do not require speech input. However, baseform generation that is solely rule-based tends to suffer from associated ambiguities. Such ambiguities typically manifest as “unruly” phones, which are defined as prevalent pronunciations of phones that do not conform to the defined rules for converting spelling to sound. Applying the rules to words containing such phones results in incorrect baseforms being generated.
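A purely rule-based generator of the kind discussed above, and the way an “unruly” phone shows up against it, can be sketched as follows. The rule table, the phone symbols, the word “mashin” and its reference pronunciation are all hypothetical and chosen only for illustration; they are not taken from any particular language or from the cited papers.

```python
from typing import Dict, List, Tuple

# Hypothetical grapheme-to-phone rules for an illustrative phonetic orthography.
RULES: Dict[str, str] = {
    "sh": "SH", "ch": "CH",
    "a": "AA", "i": "IH",
    "k": "K", "m": "M", "n": "N", "r": "R", "s": "S", "t": "T",
}


def rule_based_baseform(spelling: str, rules: Dict[str, str] = RULES) -> List[str]:
    """Convert spelling to phones by greedy longest-match rule application."""
    word = spelling.lower()
    max_len = max(len(grapheme) for grapheme in rules)
    phones: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest grapheme first so that "sh" wins over "s" + "h".
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phones.append(rules[chunk])
                i += size
                break
        else:
            raise ValueError(f"no rule for {word[i]!r} at position {i}")
    return phones


def unruly_positions(generated: List[str], reference: List[str]) -> List[Tuple[int, str, str]]:
    """List (index, generated phone, reference phone) where the two baseforms differ.

    Assumes both baseforms have the same number of phones.
    """
    return [(i, g, r) for i, (g, r) in enumerate(zip(generated, reference)) if g != r]


generated = rule_based_baseform("mashin")
# Hypothetical prevalent pronunciation in which the vowel does not follow the rule.
reference = ["M", "AA", "SH", "IY", "N"]
mismatches = unruly_positions(generated, reference)
```

In this sketch the rules yield the phone IH for the letter “i”, whereas the assumed prevalent pronunciation uses IY, so the generated baseform is wrong at that one position even though every rule was applied correctly.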