The task of speech recognition entails the automated identification of words which have been spoken by an individual, typically in order to enable an automated system to take certain (automated) actions in response thereto (i.e., to control the system by voice input). In particular, the problem of speaker-independent (as opposed to speaker-dependent) speech recognition requires that the speech of any one of a large population of possible speakers (preferably, all speakers who speak in the language or languages which are supported by the automated system) can be recognized, so that the system's resultant functions can be controlled by all possible users of the system. In certain applications of speech recognition, proper nouns, such as personal names, or the derivatives of personal names which include geographical names (such as, for example, names of countries, cities, towns and streets), trade names, and the like, occur frequently, and may in fact comprise the essence of the speech which needs to be recognized. One such application, for example, which has been widely touted and implemented in various forms, is an automated telephone name dialing capability, in which a speaker requests a telecommunications network to complete a telephone call to a given person by speaking his or her name, rather than by dialing a telephone number.
Unfortunately, the pronunciation of proper names has been one of the most challenging problems in the development of language and speech applications (such as speech recognition). Whereas most common words in a given language (i.e., "natural language" or "dictionary" words) have a fairly limited set of possible phonologically distinct pronunciations--in fact, often only one or two--proper nouns may have a substantial number of "acceptable" (phonologically distinct) pronunciations. (As is well known in the art, phonologically distinct pronunciations are fundamentally different pronunciations, as opposed to, for example, phonetically distinct pronunciations, which include the normal slight variations that even a single person might produce with repeated utterances of the same word.) In addition, some of these acceptable pronunciations may be quite inconsistent with the pronunciation "rules" of the language being spoken (e.g., English), often because the name is of "foreign" origin (i.e., its language of origin differs from the language being spoken). Moreover, the "acceptability" of some of these various pronunciations may depend on the particular context in which the name is spoken, such as the given speaker population or the given environment. For example, the acceptable pronunciations of a person's name of foreign (e.g., non-English) origin may vary with the speaker population. At one end of the spectrum are close associates of the bearer, such as intimate friends, who are likely to be quite familiar with the "correct" pronunciation of the name (i.e., the one used by the bearer); at the other end are remote associates such as, for example, American telemarketers making unsolicited telephone calls to the person.
In other words, different people will often pronounce the same name in different ways, and a robust speech recognition system should be capable of recognizing any such "reasonable" pronunciation. Note that while some of these variations in pronunciation may be due to phenomena such as regional differences between speakers, most result from a combination of the speaker's familiarity with the national origin of the name and the letter-to-sound rules associated with a set of relevant language(s)--both the language of origin of the name and the language(s) familiar to the speaker. For example, a Chinese person will typically pronounce a Chinese person's name according to the Pinyin rules (familiar to those skilled in the art) or according to another accepted Romanization method, while an American is likely to apply American English rules despite the Chinese origin of the name. As such, the Chinese name Qiru would most likely be pronounced as [{character pullout}i-ru] by a Chinese friend, whereas an American unaware of the Romanization system used might pronounce the name as [kɑɪ-ru] or [ki-ru] instead.
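To make this mechanism concrete, the following minimal Python sketch applies several letter-to-sound rule sets to the same spelling and collects every pronunciation they yield. The rule sets, the ARPAbet-style phoneme symbols, and the functions `segment` and `pronunciations` are hypothetical illustrations, not part of the described system; real letter-to-sound rules are far larger and context-sensitive, and the Pinyin "q" is only approximated here by the English "CH" sound.

```python
from itertools import product

# Hypothetical toy rule sets: each maps a grapheme to a list of alternative
# phonemes (ARPAbet-style). Real rule sets are much larger and context-sensitive.
RULE_SETS = {
    "pinyin": {"q": ["CH"], "i": ["IY"], "r": ["R"], "u": ["UW"]},   # "CH" approximates Pinyin q
    "english": {"q": ["K"], "i": ["IY", "AY"], "r": ["R"], "u": ["UW"]},
}


def segment(name, rules):
    """Greedily split a name into the longest graphemes covered by the rules."""
    name = name.lower()
    longest = max(len(g) for g in rules)
    pieces, i = [], 0
    while i < len(name):
        for size in range(min(longest, len(name) - i), 0, -1):
            chunk = name[i:i + size]
            if chunk in rules:
                pieces.append(chunk)
                i += size
                break
        else:  # no grapheme of any length matched at position i
            raise ValueError(f"no rule covers {name[i]!r}")
    return pieces


def pronunciations(name, rule_set_names=None):
    """Union of all pronunciations produced by the selected rule sets."""
    results = set()
    for rs in rule_set_names or RULE_SETS:
        rules = RULE_SETS[rs]
        options = [rules[g] for g in segment(name, rules)]
        for combo in product(*options):  # every combination of alternatives
            results.add(" ".join(combo))
    return results


print(sorted(pronunciations("Qiru")))  # ['CH IY R UW', 'K AY R UW', 'K IY R UW']
```

The single Pinyin-style candidate and the two English-style candidates loosely mirror the contrast between the Chinese friend's reading and the American readings described above; a recognizer could accept any member of the resulting set rather than a single "best" form.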
In addition, factors other than familiarity with the ethnic origin of the name also affect pronunciation. In particular, users from different ethnic backgrounds often pronounce the "same" name differently. Moreover, foreign names are frequently Anglicized differently, even by people of the same ethnic background. For example, either [ʃ'we] or [ʃ'u] may be used for the Chinese name "Hsueh." (The native pronunciation is actually [ʃu'e].) In addition, old names that have been adopted by various cultures often end up being pronounced differently as well. For example, the name "Epstein," which originates from 14th-century Bavaria, became a popular Jewish and German name, resulting in the pronunciations ['epstin] and ['epstaɪn], respectively. And finally, certain mispronunciations (i.e., pronunciations for which there is no "legitimate" basis) may be so common in practice that they also need to be recognized. (See, for example, the discussion of the Chinese name "Quan" below.)
In the case of names of Chinese origin, an additional complication arises from the various Romanization systems. The name having the native Mandarin pronunciation [{character pullout}uen], for example, may be Romanized either as "Quan"--leading to the common mispronunciation [kwan]--or as "Chuan"--leading to the pronunciation [{character pullout}wan]. In addition, a dialectal variant of the same name, from Cantonese, is "Chen," having the native pronunciation [{character pullout}{character pullout}n]. Indeed, the name may be (not unreasonably) rendered by its bearer as [{character pullout}uen], [{character pullout}uɑn], [{character pullout}wɑn], [{character pullout}{character pullout}n], [{character pullout}{character pullout}n ], [kwan], and [kwæn], et alia.
Various approaches have been employed in the past to attempt to recognize speech containing proper names. Certain prior art name pronunciation systems for use in speech recognition, for example, employ a table lookup method based on annotated name databases. (See, e.g., U.S. Pat. No. 5,752,230, issued on May 12, 1998 to T. G. Alonso-Cedo, "Method and Apparatus for Identifying Names with a Speech Recognition Program.") However, such an approach cannot generate pronunciations of relatively rare names, since they are unlikely to be included in the database. And unfortunately, the majority of names actually encountered are, in fact, relatively rare, making such "dictionary"-based solutions infeasible. (Note that the distribution of names obeys Zipf's Law, familiar to those of ordinary skill in the art. In particular, the most frequent names cover a sizable percentage of the population, but the coverage decreases rapidly with rank. For example, based on the 1990 U.S. census, the most popular American name, "Smith," covers 1% of the data, while the 30th most popular name, "King," covers only 0.1%. Since rare names, taken together, are in fact very common, it is quite difficult to obtain adequate coverage with a dictionary-based approach that lists alternative pronunciations.)
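The two census figures quoted above can be turned into a rough power-law fit, share(r) = C·r^(-s). The Python sketch below estimates the exponent s from the "Smith" and "King" data points and compares the observed rank-30 share with what a classic Zipf exponent of s = 1 would predict. The function names are hypothetical, and extrapolating a two-point fit to the whole name distribution is a crude assumption: because the fitted s is below 1, the summed shares eventually exceed 100%, so `top_n_coverage` is only indicative for modest n.

```python
import math


def zipf_exponent(share_a, rank_a, share_b, rank_b):
    """Fit s in share(r) = C * r**(-s) through two (rank, share) points."""
    return math.log(share_a / share_b) / math.log(rank_b / rank_a)


def top_n_coverage(n, s, top_share):
    """Population share covered by the n most frequent names, assuming
    share(r) = top_share * r**(-s) holds at every rank (a crude extrapolation)."""
    return sum(top_share * r ** (-s) for r in range(1, n + 1))


# Figures quoted in the text: rank 1 ("Smith") ~1%, rank 30 ("King") ~0.1%.
s = zipf_exponent(0.01, 1, 0.001, 30)
print(round(s, 3))                  # 0.677
print(f"{0.01 * 30 ** -1.0:.3%}")  # 0.033%, the rank-30 share if s were exactly 1
# Share a hypothetical 1,000-name pronunciation dictionary would cover under this fit:
print(f"{top_n_coverage(1000, s, 0.01):.0%}")
```

A fitted exponent of roughly 0.68 means the shares fall off more slowly than a classic s = 1 Zipf curve (0.1% observed versus about 0.033% predicted at rank 30), which is consistent with the point that the long tail of rare names accounts for much of the population.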
The problem of proper name pronunciation has also been addressed in the context of text-to-speech applications, where the goal is to generate, rather than to recognize, speech. In these applications, however, it is typically adequate to produce merely one single most likely (or most accurate) pronunciation of a given name. In some cases, these systems have advantageously incorporated a subprocess for determining the language origin of the name, in order to choose a pronunciation which is more likely to be accurate. (See, e.g., U.S. Pat. No. 4,829,580 issued on May 9, 1989 to K. W. Church, "Text Analysis System with Letter Sequence Recognition and Speech Stress Assignment Arrangement," and U.S. Pat. No. 5,040,218 issued on Aug. 13, 1991 to A. J. Vitale et al., "Name Pronunciation by Synthesizer." U.S. Pat. No. 4,829,580 to K. W. Church, which is assigned to the assignee of the present invention, is hereby incorporated by reference as if fully set forth herein.) By their nature, however, such text-to-speech systems fail to produce multiple "plausible" pronunciations of the given name, which, as pointed out above, is a clear requirement for the implementation of a robust speech recognition system.
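As a rough illustration of determining a name's language of origin from its letter sequences (the idea named in the title of the Church patent, though not necessarily its method), the following Python sketch trains an add-one-smoothed character-trigram naive Bayes classifier. The tiny training lists, the `^`/`$` padding scheme, and the functions `trigrams`, `train`, and `rank_languages` are hypothetical. Note that `rank_languages` returns every candidate origin in order: a text-to-speech system would keep only the first, whereas a recognizer could keep several and derive pronunciations under each.

```python
import math
from collections import Counter

# Hypothetical toy training lists; a practical system would need large
# collections of names with known language of origin.
TRAINING = {
    "chinese": ["zhang", "zhou", "qiru", "hsueh", "quan", "xiao", "chen"],
    "german": ["epstein", "schmidt", "schneider", "weiss", "becker", "hoffmann"],
    "english": ["smith", "king", "johnson", "williams", "baker", "wright"],
}


def trigrams(name):
    """Letter trigrams of a name, padded with ^ (start) and $ (end)."""
    padded = "^^" + name.lower() + "$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(training):
    """Count trigrams per language and compute class priors."""
    counts = {lang: Counter(t for name in names for t in trigrams(name))
              for lang, names in training.items()}
    vocab_size = len(set().union(*counts.values())) + 1  # +1 for unseen trigrams
    n_names = sum(len(names) for names in training.values())
    priors = {lang: len(names) / n_names for lang, names in training.items()}
    return counts, vocab_size, priors


def rank_languages(name, model):
    """Languages ordered from most to least likely origin of `name`."""
    counts, vocab_size, priors = model

    def log_score(lang):
        c = counts[lang]
        total = sum(c.values())
        # Naive Bayes with add-one smoothing over the trigram vocabulary.
        return math.log(priors[lang]) + sum(
            math.log((c[t] + 1) / (total + vocab_size)) for t in trigrams(name))

    return sorted(counts, key=log_score, reverse=True)


model = train(TRAINING)
print(rank_languages("zhao", model)[0])  # chinese
```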
As such, the prior art approaches fail to adequately solve the speaker-independent speech recognition problem for applications in which personal names or the derivatives of personal names (such as geographical names) occur frequently. An alternative approach is required--one which can identify multiple, but nonetheless plausible, pronunciations of a given personal name, and which can furthermore adapt the set of such "acceptable" pronunciations to the particular speaker population of interest.