Intelligent automated assistants (or digital assistants) can provide a beneficial interface between human users and electronic devices. Such assistants can allow users to interact with devices or systems using natural language in spoken and/or text forms. For example, a user can provide a speech input to a digital assistant associated with an electronic device. The digital assistant can interpret the user's intent from the speech input and operationalize the user's intent into tasks. The tasks can then be performed by executing one or more services of the electronic device, and a relevant speech output can be returned to the user in natural language form.
Occasionally, speech outputs generated by digital assistants can contain heteronyms. A heteronym is one of two or more words that are spelled identically but have different pronunciations and meanings. For example, a user can provide a speech input to a digital assistant requesting the weather in Nice, France. The digital assistant can return a relevant speech output such as, “Here is the weather in Nice, France.” In this example, the speech output contains the heteronym “nice,” which has one pronunciation as a proper noun (the name of the French city) and a different pronunciation as an adjective. Conventionally, digital assistants can have difficulty disambiguating heteronyms, and thus speech outputs containing heteronyms can often be pronounced incorrectly. This can result in a poor user experience when interacting with the digital assistant.