Unless otherwise indicated herein, the materials described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Speech-based user interfaces (SUIs) allow computing devices to interact with users through speech. For example, an SUI may employ various speech processing technologies such as automatic speech recognition (ASR) to receive information or instructions spoken by a user. Further, the SUI may employ speech synthesis technologies such as text-to-speech (TTS) to provide information to the user in the form of computer-generated speech.
SUIs may facilitate various modes of human-machine interaction including a hands-free mode of operation, a multi-modal mode of operation, and an accessibility mode of operation, among other possibilities. As an example of hands-free operation, a navigation application in a computing device may provide driving directions to a driver through computer-generated speech. As an example of multi-modal operation, a test-taking application in the computing device may provide visual instructions to a test-taker along with speech prompts for time remaining in the test. As an example of accessibility operation, an operating system or a screen-reader application may recite or describe contents of a display of the device to a visually impaired user or any other user operating the device in the accessibility mode.
To facilitate speech synthesis, a computing device typically accesses a corpus of recorded speech from a speaker who has a particular voice (e.g., male, female, child, adult, high-pitch, or low-pitch). Alternatively, the corpus may include representations of the recorded speech (e.g., acoustic feature parameters).
Typically, a computing device provides TTS as a system service available to at least some applications in the computing device. By way of example, an application in the device may provide text to the TTS system. In turn, the TTS system may generate synthetic speech by concatenating one or more recorded speech sounds to recite the text. Alternatively, for instance, the TTS system may generate the synthetic speech for the text by modulating signals sent to a loudspeaker of the device according to stored acoustic feature parameters.
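By way of illustration only, the concatenative approach described above might be sketched as follows. The sketch assumes a hypothetical corpus that maps whole words to short lists of waveform samples; the names `Corpus`, `synthesize`, `gap_samples`, and `toy_corpus` are invented for this example, and a practical system would more likely store sub-word units and smooth the joins between them.

```python
from typing import Dict, List

# Hypothetical corpus: maps a speech unit (here, a whole word) to the
# waveform samples recorded for it. This layout is an assumption made
# for illustration, not a format defined in the text above.
Corpus = Dict[str, List[float]]


def synthesize(text: str, corpus: Corpus, gap_samples: int = 2) -> List[float]:
    """Recite `text` by concatenating the recorded unit for each word."""
    silence = [0.0] * gap_samples
    output: List[float] = []
    for index, word in enumerate(text.lower().split()):
        if word not in corpus:
            raise KeyError(f"no recorded unit for {word!r}")
        if index > 0:
            # Insert a short pause between consecutive units.
            output.extend(silence)
        output.extend(corpus[word])
    return output


# Toy data standing in for recorded speech.
toy_corpus: Corpus = {
    "turn": [0.1, 0.2],
    "left": [0.3, 0.4, 0.5],
}

samples = synthesize("Turn left", toy_corpus)
```

In this sketch, an application would pass its text to `synthesize`, and the returned samples would then be handed to the audio output of the device. A word with no recorded unit raises an error here, whereas a fuller system might fall back to smaller units.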