1. Field of the Invention
This invention relates to the field of phonetics. In particular, the invention relates to technologies for creating phonetic variations automatically for given pronunciations of individual words.
2. Description of the Related Art
A. Notation
Before turning to definitions, some notational concerns will be addressed. A standard notational alphabet, the International Phonetic Alphabet (IPA), can be used to represent the pronunciation of words using phonemes. However, the IPA uses symbols that are difficult to represent in ASCII systems, and many of the symbols lack appropriate glyphs in standard computer fonts. (Newer systems that handle Unicode can represent IPA symbols directly and frequently include newer fonts with appropriate glyphs for IPA symbols.) Accordingly, it is more convenient, and has become industry standard practice, to use the Computer Phonetic Alphabet (CPA) in computer speech recognition and pronunciation generation tools such as “autopron”, from Nuance Communications, Menlo Park, Calif., and “namepro”, from E-Speech Corporation, Princeton, N.J.
The CPA has the advantage that it can be represented with standard ASCII characters and the glyphs in commonly available fonts. The following table shows the correspondence between CPA and IPA symbols for American English.
TABLE 1
American English: Computer Phonetic Alphabet (CPA) to International Phonetic Alphabet (IPA) Correspondence

CPA   Example       IPA      CPA   IPA      CPA   IPA
Vowels                       Stops          Fricatives
i     fleet         i        p     p        f     f
I     dimple        ɪ        t     t        T     θ
e     date          eɪ       k     k        s     s
E     bet           ɛ        b     b        S     ʃ
a     cat           æ        d     d        v     v
aj    side          aɪ       g     g        D     ð
Oj    toy           ɔɪ       Flaps          z     z
^     cut           ʌ        !     ɾ        Z     ʒ
u     blue          u        Nasals         h     h
U     book          ʊ        m     m        Approximants
o     show          oʊ       n     n        j     j
O     caught        ɔ        g~    ŋ        r     ɹ
A     father, cot   ɑ        Affricates     w     w
aw    couch         aʊ       tS    tʃ       l     l
*r    bird          ɝ        dZ    dʒ
*     alive         ə

Throughout the remainder of this document, the CPA symbols will be used to represent phonemes in transcriptions.
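As an illustration only, the correspondence in Table 1 can be held in a simple lookup table. The Python sketch below is not taken from any of the tools named above: the names `CPA_TO_IPA` and `cpa_to_ipa` are hypothetical, and the IPA values are the conventional ones for the example words, which should be checked against the table before being relied upon.

```python
# Hypothetical lookup table: CPA symbol -> IPA string (American English).
# The IPA values are assumed to be the conventional ones for Table 1's example words.
CPA_TO_IPA = {
    # Vowels
    "i": "i", "I": "ɪ", "e": "eɪ", "E": "ɛ", "a": "æ",
    "aj": "aɪ", "Oj": "ɔɪ", "^": "ʌ", "u": "u", "U": "ʊ",
    "o": "oʊ", "O": "ɔ", "A": "ɑ", "aw": "aʊ", "*r": "ɝ", "*": "ə",
    # Stops
    "p": "p", "t": "t", "k": "k", "b": "b", "d": "d", "g": "g",
    # Flaps
    "!": "ɾ",
    # Nasals
    "m": "m", "n": "n", "g~": "ŋ",
    # Affricates
    "tS": "tʃ", "dZ": "dʒ",
    # Fricatives
    "f": "f", "T": "θ", "s": "s", "S": "ʃ", "v": "v",
    "D": "ð", "z": "z", "Z": "ʒ", "h": "h",
    # Approximants
    "j": "j", "r": "ɹ", "w": "w", "l": "l",
}


def cpa_to_ipa(phonemes):
    """Join the IPA equivalents of a sequence of CPA symbols.

    Raises KeyError if a symbol is not in the table.
    """
    return "".join(CPA_TO_IPA[p] for p in phonemes)


if __name__ == "__main__":
    # "corner" as a sequence of CPA phonemes
    print(cpa_to_ipa(["k", "O", "r", "n", "*r"]))
```

Because multi-character CPA symbols such as "*r" and "g~" are single keys, the function expects an already segmented list of phonemes rather than a raw character string.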
The possible sounds that a human being can produce by moving the lips, tongue, and other speech organs are called phones. These sounds are generally grouped into logically related groups, each a phoneme. In a given language only certain sounds are distinguished (or distinguishable) by speakers of the language, i.e. the speakers conceptualize them as different sounds. These distinguishable sounds are phonemes. In fact, a phoneme may be defined as a group of related phones that are regarded as the same by speakers. The different sounds that are part of the same phoneme are called allophones (or allophonic variants).
Returning to notation issues, the phonemic transcription of a word will be shown between slashes (“/ /”). For clarity, the glyph “•” will be placed between each pair of adjacent phonemes in the transcription, e.g. /k•O•r•n•*r/ for “corner”, to represent the space character visibly. In many computer programs a space character is used to represent the boundary between phonemes; however, in a printed publication, using the standard glyph for the space character, “ ”, might lead to ambiguities, e.g. between /*r/ (one phoneme) and /*•r/ (two phonemes).
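The notation just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names `parse_transcription` and `format_transcription` are hypothetical, and the separator is a parameter so that the same code handles both the printed “•” form and the space-separated form used by many programs.

```python
SEPARATOR = "•"  # visible stand-in for the space between phonemes


def parse_transcription(text, sep=SEPARATOR):
    """Split a slash-delimited phonemic transcription into CPA symbols."""
    if len(text) < 2 or not (text.startswith("/") and text.endswith("/")):
        raise ValueError("expected a transcription of the form /.../")
    body = text[1:-1]
    return body.split(sep) if body else []


def format_transcription(phonemes, sep=SEPARATOR):
    """Inverse of parse_transcription: join CPA symbols between slashes."""
    return "/" + sep.join(phonemes) + "/"


if __name__ == "__main__":
    print(parse_transcription("/k•O•r•n•*r/"))
    # With an explicit separator, these two transcriptions stay distinct:
    print(parse_transcription("/*r/"), parse_transcription("/*•r/"))
```

With an explicit separator, /*r/ parses to the single phoneme "*r" while /*•r/ parses to the two phonemes "*" and "r", which is the distinction the visible glyph is meant to preserve in print.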
If used, phonetic transcriptions will be shown in brackets (“[ ]”). Phonetic transcriptions distinguish between the different phones that are allophones of the phoneme.
B. Role of Phonemic Transcriptions in Speech Software
Speech recognizers (both speaker independent and speaker dependent varieties) rely on pronunciations to perform recognition. For example, in order for the Nuance™ speech recognition software from Nuance Communications to recognize a word in a recognition grammar, a pronunciation (e.g. a phonemic transcription) must be available. To support recognition, Nuance provides a large phonemic dictionary that includes pronunciations for many American English words. The dictionary typically excludes proper nouns and made-up words, e.g. “Kodak”; however, there may be extensions for particular purposes, e.g. for US equity issues (stocks).
Additionally, Nuance provides an automated tool, “autopron”, that attempts to generate (simply from the spelling of the word) a usable pronunciation. Other companies, e.g. E-Speech, specialize in providing software that they claim can do a better job at generating such pronunciations.
Symmetrically, a good pronunciation is also important to producing good synthesized speech (or in the case where a human is reading a script, providing the human with extra guidance about the correct pronunciation). Thus, a useful phonemic transcription is important to many aspects of computer speech technology.
C. Conclusion
Prior techniques for generating pronunciations automatically result in transcriptions that do not correspond well with the actual pronunciations used by native speakers. Further, the tools sometimes systematically generate an unwieldy number of transcriptions, e.g. dozens of possibilities for simple words, so that a correct transcription is produced as much by accident as by any systematic plan. Further, if such an unwieldy set of transcriptions were used in a large recognition grammar (e.g. when thousands of words may be recognized simultaneously), a large number of recognition mistakes would result. Put differently, the bad pronunciations outnumber the correct ones and cause too much confusion for high accuracy.
In contrast, other prior techniques generate a single phonemic transcription for a given word. These transcriptions, however, do not necessarily match common pronunciations well, nor do they allow for common phonemic variations among native speakers.
Similarly, prior techniques generate phonotactically impossible transcriptions with surprising frequency.
Accordingly, what is needed is a method and apparatus for refining a computer generated phonemic transcription using one or more well defined rules to produce more accurate transcriptions as well as likely phonemic variations; the method and apparatus should also prevent (or at least identify) phonotactically impossible co-occurrences.