Speech recognition systems include various modules and components for receiving speech input from a user, determining what the user said, and determining what the user meant. In some implementations, a speech processing system includes an automatic speech recognition (“ASR”) module that receives audio input of a user utterance and generates one or more likely transcriptions of the utterance. Speech processing systems may also include a natural language understanding (“NLU”) module that receives input, such as a transcription of a user utterance, and determines the meaning of the input in a way that can be acted upon, such as by a computer application. For example, a user of a mobile phone may speak a command to initiate a phone call. Audio of the spoken command can be transcribed by the ASR module, and the NLU module can determine the user's intent (e.g., that the user wants to use the phone call feature) from the transcription, after which the phone call can be initiated.
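The ASR-to-NLU flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any particular system: `asr_stub`, `Transcription`, `INTENT_RULES`, and the intent names are hypothetical, the ASR step is replaced by a hard-coded n-best list, and simple keyword rules stand in for a real NLU model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Transcription:
    text: str
    score: float  # hypothetical ASR confidence in [0, 1]


def asr_stub(audio: bytes) -> list[Transcription]:
    """Hypothetical stand-in for an ASR module.

    A real ASR module would decode the audio; here the n-best list of
    likely transcriptions is hard-coded for illustration.
    """
    return [Transcription("call mom", 0.82), Transcription("call tom", 0.11)]


# Hypothetical keyword rules mapping phrases to actionable intents.
INTENT_RULES = {
    "initiate_call": ("call", "dial", "phone"),
    "cancel_route": ("cancel route",),
}


def nlu(transcriptions: list[Transcription]):
    """Return (intent, transcription) for the highest-scoring transcription
    that matches a rule, or (None, None) if nothing matches."""
    for t in sorted(transcriptions, key=lambda t: t.score, reverse=True):
        # Pad with spaces so keywords match whole words/phrases only.
        padded = f" {' '.join(t.text.lower().split())} "
        for intent, phrases in INTENT_RULES.items():
            if any(f" {p} " in padded for p in phrases):
                return intent, t
    return None, None


intent, best = nlu(asr_stub(b""))
print(intent, best.text)  # initiate_call call mom
```

Ranking the n-best list before matching reflects the idea that the NLU module acts on the most likely transcription first, falling back to lower-ranked hypotheses only when the top one yields no actionable meaning.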
Text-to-speech (“TTS”) systems convert text into sound using a process sometimes known as speech synthesis. In a common implementation, a TTS system may receive input, such as text and/or Speech Synthesis Markup Language (“SSML”) data, and provide an audio presentation of the input to a user. For example, a TTS system may be configured to “read” text to a user, such as the text of an email or a list of reminders.
Some systems combine both speech recognition and TTS. For example, global positioning system (“GPS”) navigation devices can receive a user's spoken input regarding a particular address, generate directions for travelling to the address, and present the directions aurally to the user. In many cases, users may continue to interact with such systems while receiving directions. After the GPS device provides the next direction or series of directions, the user may speak any of a number of predetermined commands (e.g., “cancel route,” “next turn”). Users may also interact with aurally presented content in non-spoken ways. For example, turn-by-turn directions can be displayed on a touch screen that allows users to select, by touch or via a keyboard, a particular route to bypass.
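Handling a fixed set of predetermined commands while directions are being presented might look roughly like the sketch below. It assumes the spoken input has already been transcribed; the `DirectionsSession` class, its return strings, and the whitespace/case normalization are hypothetical illustrations, and only the two commands named in the text are supported.

```python
from __future__ import annotations

# Predetermined commands, taken from the examples in the text.
COMMANDS = {"cancel route", "next turn"}


class DirectionsSession:
    """Hypothetical session that steps through directions on command."""

    def __init__(self, directions: list[str]) -> None:
        self.directions = directions
        self.index = 0
        self.active = bool(directions)

    def handle(self, utterance: str) -> str:
        # Normalize case and whitespace before exact matching.
        cmd = " ".join(utterance.lower().split())
        if cmd not in COMMANDS:
            return "unrecognized command"
        if not self.active:
            return "no active route"
        if cmd == "cancel route":
            self.active = False
            return "route cancelled"
        # Remaining case: "next turn".
        if self.index + 1 < len(self.directions):
            self.index += 1
            return self.directions[self.index]
        return "no more directions"


session = DirectionsSession(["Turn left on Main St", "Turn right on 2nd Ave"])
print(session.handle("Next  turn"))  # Turn right on 2nd Ave
```

Exact matching against a small closed set is what makes such commands "predetermined": anything outside the set is rejected rather than interpreted, which keeps recognition simple but limits what the user can say.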