Intelligent automated assistants (or digital assistants) can provide a beneficial interface between human users and electronic devices. Such assistants can allow users to interact with devices or systems using natural language in spoken and/or text forms. For example, a user can provide a speech input containing a user request to a digital assistant operating on an electronic device. The digital assistant can interpret the user's intent from the speech input and operationalize the user's intent into tasks. The tasks can then be performed by executing one or more services of the electronic device, and a relevant output responsive to the user request can be returned to the user.
To provide a speech output to the user, text-to-speech conversion can be performed. Before the text-to-speech conversion, the digital assistant can normalize text to transform non-standard words into contextually appropriate ordinary words or sequences of ordinary words for pronunciation. For example, depending on the context, a symbol “-” may be transformed to “to,” “dash,” “minus,” or “through” for pronunciation. To perform text normalization, pattern matching techniques may be used. However, the accuracy of text normalization using pattern matching techniques may be unsatisfactory, and such techniques may be error-prone, especially when the text to be normalized contains different types of non-standard words used in various contexts. Moreover, a typical rule-based process of generating a pronunciation of the normalized text may require several steps, including phoneme generation, syllabification, and stress placement. This process may be cumbersome, slow, and inaccurate.
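By way of illustration only, the pattern matching approach to normalizing the symbol “-” can be sketched as an ordered list of regular-expression rules. The sketch below is a minimal example under stated assumptions and does not represent any particular system: the rule set, the rule ordering, and the function name `normalize_dash` are hypothetical and were chosen only to show how such rules might work and where they may fail.

```python
import re

# Hypothetical, illustrative rules. They are applied in order, and each
# rule rewrites every hyphen that matches its surrounding context.
RULES = [
    # Hyphen between two digits, e.g. "3-5 pm": read as a range ("to").
    (re.compile(r"(?<=\d)\s*-\s*(?=\d)"), " to "),
    # Hyphen between a letter and a capital letter, e.g. "Mon-Fri":
    # read as a span ("through").
    (re.compile(r"(?<=[A-Za-z])-(?=[A-Z])"), " through "),
    # Hyphen at the start of a token and followed by a digit, e.g. "-5":
    # read as a sign ("minus").
    (re.compile(r"(?<!\S)-(?=\d)"), "minus "),
    # Any remaining hyphen: fall back to the literal name of the symbol.
    (re.compile(r"\s*-\s*"), " dash "),
]


def normalize_dash(text: str) -> str:
    """Replace each "-" with a spoken word chosen by surface patterns only."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(normalize_dash("3-5 pm"))         # 3 to 5 pm
    print(normalize_dash("Mon-Fri"))        # Mon through Fri
    print(normalize_dash("-5 degrees"))     # minus 5 degrees
    # The digit-digit rule cannot tell a range from a phone number:
    print(normalize_dash("call 555-1234"))  # call 555 to 1234
```

The last call shows the kind of error described above: the rule that correctly reads “3-5 pm” as a range also reads a telephone number as a range, because the surface pattern is the same in both contexts. Adding further rules to separate such cases tends to introduce new conflicts as the variety of non-standard words grows.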