Intelligent automated assistants (or virtual assistants) provide an intuitive interface between users and electronic devices. These assistants can allow users to interact with devices or systems using natural language in spoken and/or text forms. For example, a user can access the services of an electronic device by providing a spoken user input in natural language form to a virtual assistant associated with the electronic device. The virtual assistant can perform natural language processing on the spoken user input to infer the user's intent and operationalize the user's intent into tasks. The tasks can then be performed by executing one or more functions of the electronic device, and a relevant output can be returned to the user in natural language form.
Conventional virtual assistants use pronunciation lexicons to recognize words contained in a spoken user input. These pronunciation lexicons typically include one or more phonetic pronunciations for a predetermined set of commonly used words. Thus, when a spoken user input is received, the virtual assistant can compare the input to the phonetic pronunciations in the pronunciation lexicon to identify the word(s) in the lexicon that most closely match the spoken user input.
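As a rough illustration of the lookup described above, the following minimal sketch matches an observed phoneme sequence against a small lexicon and returns the closest word. Everything in it is an assumption made for illustration: the toy `LEXICON` entries, the ARPAbet-style phoneme symbols, the function names, the choice of edit distance as the similarity measure, and the premise that an acoustic front end has already converted the spoken input into discrete phonemes.

```python
from typing import Dict, List, Sequence, Tuple

Phonemes = Tuple[str, ...]

# Hypothetical toy lexicon: each word maps to one or more phonetic
# pronunciations (ARPAbet-style symbols chosen for illustration only).
LEXICON: Dict[str, List[Phonemes]] = {
    "call": [("K", "AO", "L")],
    "text": [("T", "EH", "K", "S", "T")],
    "tomato": [
        ("T", "AH", "M", "EY", "T", "OW"),
        ("T", "AH", "M", "AA", "T", "OW"),
    ],
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete a phoneme
                            curr[j - 1] + 1,      # insert a phoneme
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def closest_word(observed: Sequence[str],
                 lexicon: Dict[str, List[Phonemes]] = LEXICON) -> Tuple[str, int]:
    """Return the lexicon word whose pronunciation is nearest to `observed`,
    together with its distance."""
    best_word = None
    best_dist = None
    for word, pronunciations in lexicon.items():
        for pron in pronunciations:
            dist = edit_distance(observed, pron)
            if best_dist is None or dist < best_dist:
                best_word, best_dist = word, dist
    if best_word is None:
        raise ValueError("lexicon is empty")
    return best_word, best_dist


if __name__ == "__main__":
    # A slightly mispronounced "call" (AA instead of AO).
    print(closest_word(("K", "AA", "L")))
```

Practical speech recognizers generally score candidate pronunciations against acoustic models rather than comparing discrete phoneme strings, so this sketch only conveys the idea of selecting the closest-matching lexicon entry.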
While conventional pronunciation lexicons can be used to effectively recognize many of the words encountered by a virtual assistant, it can be difficult to use them to identify named entities, such as the name of a person, restaurant, city, movie, or the like. This is because it can be impractical for a conventional pronunciation lexicon to include all possible named entities. In particular, conventional pronunciation lexicons typically require each pronunciation to be entered manually by a phonetician, and entering all possible named entities in this way can be prohibitively time-consuming. Moreover, even if all named entities were added to a conventional pronunciation lexicon, the large number of pronunciations may lead to confusion between similarly pronounced words. As a result, the virtual assistant may often incorrectly recognize words in the user's spoken input.
Thus, improved processes for generating, managing, and using a pronunciation lexicon are desired.