Name pronunciation is used in field service within the telephone and computer industries. It is also used in larger corporations that provide reverse directory assistance (number to name) and in text-messaging systems, where a last-name field is common.
Many commercially available devices synthesize American English speech by computer. One function sought from speech synthesis that presents special problems is the pronunciation of an unlimited number of ethnically diverse surnames. An ethnically diverse country such as the United States has an extremely large number of different surnames. For that reason, surname pronunciation cannot at present be practically implemented with other voice output technologies that rely on prerecorded speech, such as audiotape or digitized stored voice.
There is typically an inverse relation between a speech synthesizer's pronunciation accuracy in its source language and its pronunciation accuracy in a second language. The United States is an ethnically heterogeneous country. Its names derive from languages that range from common Indo-European ones such as French, Italian, Polish, Spanish, German, and Irish to less common ones such as Japanese, Armenian, Chinese, Arabic, and Vietnamese. The pronunciation of surnames from these ethnic groups does not conform to the rules of standard American English. For example, most Germanic names are stressed on the first syllable, whereas Japanese and Spanish names tend to have penultimate stress and French names final stress. Similarly, the orthographic sequence CH is pronounced [č] in English names (e.g. CHILDERS), [š] in French names such as CHARPENTIER, and [k] in Italian names such as BRONCHETTI. Human speakers often pronounce such names correctly because they "know" the language of origin. A voice synthesizer must also speak these names with the correct pronunciation. Because a computer does not "know" a name's ethnic origin, however, its pronunciation is often incorrect.
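As a rough illustration of why the language of origin matters, the sketch below encodes only the CH and stress examples from the paragraph above as per-language rules. It assumes that the language of origin is already known and that the name has already been split into syllables. The tables and functions (`CH_RULES`, `STRESS_RULES`, `ch_phoneme`, `stressed_syllable`) are hypothetical. Real letter-to-sound rule sets are far larger.

```python
# Hypothetical rule tables built only from the examples in the text above.
CH_RULES = {
    "english": "č",  # CHILDERS (IPA [tʃ])
    "french": "š",   # CHARPENTIER (IPA [ʃ])
    "italian": "k",  # BRONCHETTI
}

# Default stress position by language of origin (a coarse tendency, not an exceptionless rule).
STRESS_RULES = {
    "germanic": "initial",
    "japanese": "penultimate",
    "spanish": "penultimate",
    "french": "final",
}


def ch_phoneme(language: str) -> str:
    """Return the sound written CH for a name of the given origin."""
    return CH_RULES[language]


def stressed_syllable(syllables: list[str], language: str) -> int:
    """Return the index of the syllable that receives default stress for the given origin."""
    rule = STRESS_RULES[language]
    if rule == "initial" or len(syllables) == 1:
        return 0
    if rule == "penultimate":
        return len(syllables) - 2
    return len(syllables) - 1  # "final"


# The same letters CH map to three different sounds depending on origin.
print([ch_phoneme(lang) for lang in ("english", "french", "italian")])  # ['č', 'š', 'k']
print(stressed_syllable(["CHAR", "PEN", "TIER"], "french"))  # 2
```

A synthesizer that applies only the English entries to every name will mispronounce CHARPENTIER and BRONCHETTI, which is the failure described above.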
A system has been proposed in the prior art in which a name is first matched against the entries of a dictionary containing the most common names from several language groups. Each dictionary entry contains an orthographic form and its phonetic equivalent. If a match occurs, the phonetic equivalent is sent to a synthesizer, which converts it into an audible pronunciation of the name.
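A minimal sketch of the dictionary lookup step follows. It assumes a plain in-memory mapping from orthographic form to phonetic equivalent. The entries and their ARPAbet-style transcriptions were written for illustration. `NAME_DICTIONARY` and `lookup_pronunciation` are hypothetical names, not part of the prior-art system.

```python
from typing import Optional

# Hypothetical dictionary of common surnames: orthographic form -> phonetic equivalent.
# The ARPAbet-style transcriptions are illustrative only.
NAME_DICTIONARY = {
    "CHILDERS": "CH IH1 L D ER0 Z",
    "CHARPENTIER": "SH AA2 R P AH0 N T Y EY1",
    "BRONCHETTI": "B R AA0 NG K EH1 T IY0",
}


def lookup_pronunciation(name: str) -> Optional[str]:
    """Return the stored phonetic form of a surname, or None if it is not in the dictionary."""
    # Normalize case and surrounding whitespace so "Childers " matches "CHILDERS".
    return NAME_DICTIONARY.get(name.strip().upper())


# On a match, the phonetic string is what would be passed to the synthesizer.
print(lookup_pronunciation("Childers"))  # CH IH1 L D ER0 Z
print(lookup_pronunciation("Rossetti"))  # None
```

A return value of `None` marks a dictionary miss. The trigram model in the next paragraph handles that case.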
When the name was not found in the dictionary, the proposed system fell back on a statistical trigram model. This trigram analysis estimated the probability that each three-letter sequence (trigram) in a name is associated with a given etymology. When the program encountered a new word, it applied a statistical formula to combine the estimates for the word's trigrams into a probability for each etymology.
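The trigram analysis can be sketched as follows. The source does not give the statistical formula, so this sketch assumes one simple choice. For each trigram seen in training, it estimates the share of that trigram's occurrences that came from each language, then averages those shares over the name's trigrams. Names are padded with `#` so that word-initial and word-final sequences also count as trigrams. The training names, the padding symbol, and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def trigrams(name: str) -> list[str]:
    """Split a name into overlapping three-letter sequences, with '#' marking word boundaries."""
    padded = f"#{name.strip().upper()}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples: dict[str, list[str]]) -> dict[str, Counter]:
    """For every trigram, count how often it occurs in each language's training names."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for language, names in samples.items():
        for name in names:
            for tri in trigrams(name):
                counts[tri][language] += 1
    return counts


def etymology_scores(name: str, counts: dict[str, Counter], languages: list[str]) -> dict[str, float]:
    """Average, over the name's known trigrams, each language's share of that trigram's occurrences."""
    totals = {lang: 0.0 for lang in languages}
    known = 0
    for tri in trigrams(name):
        per_lang = counts.get(tri)
        if not per_lang:
            continue  # unseen trigram: no evidence either way
        n = sum(per_lang.values())
        for lang in languages:
            totals[lang] += per_lang[lang] / n  # estimate of P(language | trigram)
        known += 1
    if known == 0:
        # No trigram was seen in training, so fall back to a uniform guess.
        return {lang: 1.0 / len(languages) for lang in languages}
    return {lang: total / known for lang, total in totals.items()}


# Hypothetical, deliberately tiny training set.
SAMPLES = {
    "english": ["CHILDERS", "WHITFIELD", "CHESTER"],
    "french": ["CHARPENTIER", "BOUCHER", "LEFEBVRE"],
    "italian": ["BRONCHETTI", "BERTOLINI", "MANCINI"],
}
COUNTS = train(SAMPLES)

scores = etymology_scores("Rossetti", COUNTS, list(SAMPLES))
print(max(scores, key=scores.get))  # italian
```

Averaging is only one option. A naive Bayes model that multiplies smoothed per-language trigram likelihoods is a common alternative. With either formula, the model produces a score for every language group in the training data, which is the property the next paragraph criticizes.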
The problem with this approach is the limited accuracy of the trigram analysis. The analysis yields only a probability, and every language group is treated as a possible candidate for the word's origin. As a result, the language group it selects is less likely to be correct than it would be if there were fewer candidates.