Grapheme-to-phoneme (G2P) conversion is directed towards automatically generating pronunciations (or phoneme sequences) from word spellings (or grapheme sequences). Grapheme-to-phoneme conversion is a widely used component of large-scale voice-dialing systems.
Many grapheme-to-phoneme systems are based on statistical models, which are trained using hand-authored pronunciation lexicons. However, the grapheme-phoneme relationships that occur in pronunciation lexicons, which are usually authored by linguists, often do not reflect how people pronounce words in practice, or may not cover enough pronunciation variations. As a result, a grapheme-to-phoneme model trained only on such a lexicon is less than ideal for speech-related tasks.
By way of example, consider recognizing names. One challenge is referred to as domain mismatch: some grapheme-phoneme relationships that occur in names may be absent from a pronunciation lexicon. Although some names and their pronunciations may be added to the lexicon, it is unrealistic to do so at a large scale, as there may be an enormous number of unique names, and it is often the rare names that have irregular pronunciations.
Another challenge is speaker variability. People from different geographic regions and ethnic groups may pronounce the same name in different ways. A hand-authored pronunciation lexicon cannot reasonably capture such variations.