The present invention relates generally to speech processing, and specifically, to methods and systems for transcribing orthographies into corresponding phonemic transcriptions.
Phonemes are the units of sound in a language that distinguish one word from another. The word "cat," for instance, contains three phonemes. Unfortunately, there is often no direct correspondence between the normal spelling of a word, called the word's orthography, and the spoken sounds actually produced. One sound can be represented by a number of different letters or combinations of letters. For example, the first sound in the words "cat," "kick," "quick," and "chemistry" is the same. Conversely, one letter can represent different sounds: the initial letter "c" represents one sound in "cat" and another in "circus."
Because of the problems posed by English spelling, and by spelling in other languages, words have been represented using phonemic alphabets, in which each symbol corresponds to one sound. So, for example, the initial sound in "cat" and "kick" may be represented by the symbol /k/, while the one in "circus" may be represented by the symbol /s/. Throughout this disclosure, a phonemic alphabet of 40 symbols is used, although other phonemic alphabets could equivalently be used. Further, forward slashes will be used, when necessary, to mark a symbol as a phonemic one.
A "phonemic transcription" encodes the sound patterns of a word using the phonemic alphabet. In addition to symbols from the phonemic alphabet, a phonemic transcription may include information relating to word stress and syllabification. For example, the orthography "communications" is phonemically transcribed as /k*-mju=n*-ke=S*nz/ [0-2-0-1-0], where the symbols {k, *, m, j, u, e, S, n, z} are phonemes, {-, =} are syllable markers, and {0, 1, 2} are stress indicators, one per syllable (1=primary stress, 2=secondary stress, 0=unstressed).
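The notation above can be illustrated with a minimal sketch that splits a transcription into syllables and pairs each syllable with its stress indicator. The function name parse_transcription is hypothetical and not part of this disclosure. The sketch assumes that both markers "-" and "=" denote syllable boundaries and that the stress string lists exactly one indicator per syllable.

```python
import re
from typing import List, Tuple

def parse_transcription(phonemes: str, stress: str) -> List[Tuple[str, int]]:
    """Pair each syllable of a transcription with its stress indicator.

    Assumption: "-" and "=" are both treated as syllable boundaries,
    and the stress string holds one indicator per syllable.
    """
    body = phonemes.strip("/")                 # drop the enclosing slashes
    syllables = re.split(r"[-=]", body)        # split on either marker
    levels = [int(s) for s in stress.strip("[]").split("-")]
    if len(syllables) != len(levels):
        raise ValueError("syllable count and stress count differ")
    return list(zip(syllables, levels))

pairs = parse_transcription("/k*-mju=n*-ke=S*nz/", "[0-2-0-1-0]")
print(pairs)
# [('k*', 0), ('mju', 2), ('n*', 0), ('ke', 1), ('S*nz', 0)]
```

In this example the fourth syllable, "ke," carries the primary stress and the second syllable, "mju," carries the secondary stress.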
Phonemic transcription dictionaries are useful in a number of areas of speech processing, such as in speech recognition. These dictionaries typically contain a collection of orthographies, their corresponding phonemic transcriptions, and optionally, stress and syllabification information.
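One possible in-memory layout for such a dictionary is sketched below, purely for illustration. The names Entry, add_entry, and lookup are hypothetical, and the choices to key entries on the lower-cased orthography and to allow several transcriptions per orthography are assumptions rather than features of any particular dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Entry:
    phonemes: str                              # transcription, may include syllable markers
    stress: Optional[Tuple[int, ...]] = None   # optional per-syllable stress indicators

Lexicon = Dict[str, List[Entry]]

def add_entry(lexicon: Lexicon, orthography: str, phonemes: str,
              stress: Optional[Tuple[int, ...]] = None) -> None:
    # Assumption: orthographies are matched case-insensitively.
    lexicon.setdefault(orthography.lower(), []).append(Entry(phonemes, stress))

def lookup(lexicon: Lexicon, orthography: str) -> List[Entry]:
    # An unknown orthography yields an empty list rather than an error.
    return lexicon.get(orthography.lower(), [])

lexicon: Lexicon = {}
add_entry(lexicon, "communications", "k*-mju=n*-ke=S*nz", (0, 2, 0, 1, 0))
print(lookup(lexicon, "Communications")[0].phonemes)
# k*-mju=n*-ke=S*nz
```

Storing a list of entries per orthography leaves room for words that have more than one accepted transcription.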
Conventional phonemic transcription dictionaries have been created either manually by a human expert or automatically by a computer. Manual transcription of orthographies is laborious and produces inconsistencies among different transcribers. Conventional automatic transcription techniques, on the other hand, although faster and more consistent, still have a relatively high error rate and often do no better than produce a list of possible transcriptions that must then be refined by a human.
There is, therefore, a need to improve automatic transcription techniques.