Computers have long been able to convert text passages to an audio output for a user. Typically, a user sitting at a computer requests the conversion of text to an audio output (e.g., text to speech). The computer then executes text-to-speech (TTS) software that converts the text to the audio output, which the computer plays through a speaker for the user to hear. The user may be a visually impaired individual who uses the TTS software to hear text displayed on the computer screen, a user accessing a computer system from an audio communication device (such as a telephone), or a user of a computer who prefers to hear speech output rather than read text on the computer's visual display.
In one conventional approach to TTS conversion, the user of a client computer or telephone may request the conversion of text to speech over a network connection to a remote computer (e.g., a server) that is executing the TTS software. For example, if the user is using a telephone, the user may request a stock report from the remote computer, which retrieves the text for the stock report from a database and converts the text to audio-based output. The remote computer then sends the audio-based output to the telephone, which outputs it through its speaker. In another example, if the user is using a client computer, the TTS software on the remote computer typically converts the body of text to an audio-based output, such as an output file having an audio file format. One commonly used audio output file format is the WAV audio file format, which stores sounds as waveforms, uses a “.wav” file extension, and is typically used by the Microsoft® Windows® operating system. The server then sends the audio file back to the client computer, which plays the audio file for the user, who hears it through the speaker of the client computer.
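By way of illustration only, the following sketch shows what an output file in the WAV format amounts to: a short header followed by waveform samples. It uses the `wave` module from the Python standard library. The function name `tone_wav_bytes`, the sine tone, and the chosen sample rate are assumptions made for this example; the tone merely stands in for synthesized speech and does not represent any particular TTS software.

```python
import io
import math
import struct
import wave


def tone_wav_bytes(duration_s=0.5, freq_hz=440.0, sample_rate=16000):
    """Return the bytes of a mono 16-bit PCM WAV file containing a sine tone.

    The tone is only a placeholder for speech samples (an assumption of
    this sketch); a real TTS engine would supply the samples instead.
    """
    n_frames = int(duration_s * sample_rate)
    frames = bytearray()
    for i in range(n_frames):
        # Scale to 30% of the 16-bit range to keep the tone quiet.
        value = 0.3 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        frames += struct.pack("<h", int(value))  # little-endian signed 16-bit

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 2 bytes per sample (16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buf.getvalue()
```

Writing the returned bytes to a file with a “.wav” extension would yield a file that a typical audio player can open.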
In another example of a conventional approach, the client computer and server computers can be connected through the World Wide Web (WWW), which provides communication over a network using the Internet Protocol (IP) and transmits requests over the network based on the Hypertext Transfer Protocol (HTTP). Users sitting at a client computer can thus make HTTP requests to a server located on the web, which provides information and services to the user at the client computer. Typically, the user invokes a web browser at the client computer and makes the request to the web browser, which in turn makes the HTTP request over the WWW to a server to fulfill the request. Thus, to hear textual information, the user can initiate an HTTP request over the WWW to a server that includes TTS software. The server receives the HTTP request and executes the TTS software to convert the textual information to an audio format, such as a WAV file. The TTS server returns the audio output file to the client computer, which then plays the audio output file through the client's speaker for the user.
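The HTTP request and response exchange described above may be sketched roughly as follows, using only the Python standard library. This is a minimal illustration under stated assumptions, not a description of any actual TTS server: the names `placeholder_synthesize`, `TTSHandler`, `request_speech`, and `demo`, the `/tts` path, and the use of a POST request carrying plain text are all hypothetical, and the "synthesis" step simply emits silence whose length depends on the length of the text.

```python
import io
import threading
import urllib.request
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer


def placeholder_synthesize(text, sample_rate=8000):
    """Hypothetical stand-in for TTS software: 50 ms of silence per character."""
    n_frames = (sample_rate // 20) * len(text)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


class TTSHandler(BaseHTTPRequestHandler):
    """Server side: receive text over HTTP, return a WAV file."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        audio = placeholder_synthesize(text)
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")  # commonly used media type
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def request_speech(url, text):
    """Client side: send the text and return the WAV bytes from the response."""
    req = urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
    # Bypass any system proxy so the local demo server is reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req) as resp:
        return resp.read()


def demo():
    """Start a local server on a free port, make one request, and shut down."""
    server = HTTPServer(("127.0.0.1", 0), TTSHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/tts"
        return request_speech(url, "stock report")
    finally:
        server.shutdown()
        server.server_close()
```

In this sketch the entire audio file is generated on the server before any of it is returned, so the client cannot begin playback until the whole response has arrived, which mirrors the conventional flow described above.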
One example of TTS software is the Festival Speech Synthesis System, which is a TTS application that can execute on a server to provide text-to-speech conversion. The Festival Speech Synthesis System is available from the Centre for Speech Technology Research (CSTR), University of Edinburgh, Edinburgh, United Kingdom.