Intelligent automated assistants (or digital assistants) can provide a beneficial interface between human users and electronic devices. Such assistants can allow users to interact with devices or systems using natural language in spoken and/or text forms. For example, a user can provide a speech input containing a user request to a digital assistant operating on an electronic device. The digital assistant can interpret the user's intent from the speech input and operationalize the user's intent into tasks. The tasks can then be performed by executing one or more services of the electronic device, and a relevant output responsive to the user request can be returned to the user.
Digital assistants interpret speech input by first converting the speech input into text using automatic speech recognition (ASR) techniques. During speech-to-text conversion, an automatic speech recognizer generates a spoken-form text representation of the speech input. The spoken-form text representation can be unformatted and difficult to read. For instance, “what is twenty percent of thirty two dollars and forty cents” is an exemplary spoken-form text representation of a corresponding speech input. Spoken-form text representations are subsequently converted into written-form text representations using a process known as inverse text normalization (ITN). During ITN, certain entities in the spoken-form text representation (e.g., cardinals, ordinals, dates, times, and addresses) are reformatted into written form according to the conventions of a particular locale. For example, “what is 20% of $32.40” is an exemplary written-form text representation converted from the spoken-form text representation “what is twenty percent of thirty two dollars and forty cents.” Unlike spoken-form text representations, written-form text representations are formatted in a way that is suitable both for presentation to users and for processing by downstream components, such as natural language processing components.
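To make the spoken-form to written-form mapping concrete, the following is a minimal rule-based sketch of ITN. It is illustrative only and is not the method described here. It assumes a US English locale and handles just three entity types: cardinals from 0 to 99, percentages, and dollar amounts with optional cents. The names `inverse_normalize`, `parse_number`, `UNITS`, and `TENS` are hypothetical. Production ITN systems rely on much richer, locale-specific grammars or learned models.

```python
# Hypothetical toy lexicons for US English cardinals 0-99.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}


def parse_number(words, i):
    """Parse a spoken cardinal (0-99) starting at words[i].

    Returns (value, next_index), or (None, i) if no number starts at i.
    """
    if i < len(words) and words[i] in TENS:
        value = TENS[words[i]]
        i += 1
        # Optional trailing digit word, e.g. "thirty two" -> 32.
        if i < len(words) and 1 <= UNITS.get(words[i], 0) <= 9:
            value += UNITS[words[i]]
            i += 1
        return value, i
    if i < len(words) and words[i] in UNITS:
        return UNITS[words[i]], i + 1
    return None, i


def inverse_normalize(spoken):
    """Rewrite cardinals, percentages, and dollar amounts into written form."""
    words = spoken.split()
    out = []
    i = 0
    while i < len(words):
        value, j = parse_number(words, i)
        if value is None:
            out.append(words[i])  # Not a number word: copy it through unchanged.
            i += 1
            continue
        # "<n> percent" -> "n%"
        if j < len(words) and words[j] == "percent":
            out.append(f"{value}%")
            i = j + 1
            continue
        # "<n> dollars [and <m> cents]" -> "$n" or "$n.mm"
        if j < len(words) and words[j] in ("dollar", "dollars"):
            k = j + 1
            cents = None
            if k < len(words) and words[k] == "and":
                c, k2 = parse_number(words, k + 1)
                if c is not None and k2 < len(words) and words[k2] in ("cent", "cents"):
                    cents, k = c, k2 + 1
            out.append(f"${value}.{cents:02d}" if cents is not None else f"${value}")
            i = k
            continue
        out.append(str(value))  # Bare cardinal.
        i = j
    return " ".join(out)


print(inverse_normalize("what is twenty percent of thirty two dollars and forty cents"))
```

Run on the example above, the sketch prints `what is 20% of $32.40`. It works as a single left-to-right pass. At each position it first tries to parse a number, and then it looks at the following word to decide which written-form pattern to emit.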