It is known practice to convert information which is available in the form of text or numerical values into an audio signal for output in the form of speech. This is necessary, by way of example, if the data cannot be output visually, that is to say if no screen or display is available, but only a loudspeaker or a simple telephone. Speech output of numerical values is used by directory inquiries, for example, in order to notify the customer of the desired telephone number. To this end, the information which is to be output is first of all divided into information units which are as small as possible. Using a table or an algorithm, each information unit is converted into a syllable or into a sequence of syllables. The compiled succession of syllables is converted into an audio signal by an output unit. In the directory inquiries example, the speech signal produced is transmitted to the caller via the telephone network.
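By way of illustration only, the table-based conversion described above can be sketched as follows. The sketch assumes that the information units are single decimal digits and that the table holds English syllables. The names SYLLABLE_TABLE, split_into_units and compile_syllables, and the syllable spellings themselves, are hypothetical and are not taken from any of the known systems. The output unit which converts the succession of syllables into an audio signal is not modeled.

```python
from typing import List

# Hypothetical table: each information unit (here, one decimal digit)
# is mapped to a syllable or to a sequence of syllables.
SYLLABLE_TABLE = {
    "0": ["ze", "ro"],
    "1": ["one"],
    "2": ["two"],
    "3": ["three"],
    "4": ["four"],
    "5": ["five"],
    "6": ["six"],
    "7": ["sev", "en"],
    "8": ["eight"],
    "9": ["nine"],
}


def split_into_units(information: str) -> List[str]:
    """Divide the information into the smallest units (single digits).

    Characters that have no table entry, such as spaces or hyphens in a
    telephone number, are skipped in this sketch.
    """
    return [ch for ch in information if ch in SYLLABLE_TABLE]


def compile_syllables(information: str) -> List[str]:
    """Convert each unit via the table and compile the succession of syllables."""
    syllables: List[str] = []
    for unit in split_into_units(information):
        syllables.extend(SYLLABLE_TABLE[unit])
    return syllables


if __name__ == "__main__":
    # A made-up telephone number, used only to exercise the sketch.
    print(compile_syllables("089 7"))
```

In such an arrangement the table alone determines the vocabulary, which is why a service of this kind remains simple as long as only single digits have to be output.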
Another known example of the conversion of text into speech is the “e-mail to speech” function with which services for distributing electronic mail, “e-mail systems”, are often equipped. In this case, the user can have an e-mail “read” from any telephone line by calling a specific line for the e-mail server. Following authentication, a service provided in the e-mail system converts the text of the e-mail into speech and “reads” the content of the e-mail aloud to the user. If the conversion involves analyzing the syntax, which is also referred to as “parsing”, then the service for creating the succession of syllables is also called a “parser”. The parser used in the e-mail system is more complex than the service which is used for directory inquiries, because it converts not just single digits but rather the full vocabulary of a natural language, and usually produces an intonation in addition. In this case, the parsers often draw a distinction between different natural languages, that is to say have a different “pronunciation” for German, English, French etc. For output, they may furthermore often use either male or female speech patterns (speech samples) or else speech patterns recorded beforehand by the user himself. For the purpose of speech output, frequently also referred to as speech “synthesis”, use is normally made of units which are equipped with specially programmed DSPs (DSP=Digital Signal Processor) or other components produced specifically for this purpose.
U.S. Pat. No. 6,263,051 B1 “System And Method For Speech Service Bureau” discloses a central service in a communication network for handling telephone calls automatically. In this case, the central service receives, via a data interface, for example an XML interface, all of the relevant data for a telephone call which is to be held. This central service then uses a communication installation to set up a voice connection to a telephone subscriber, and uses a speech output device (“TTS engine”, TTS=Text-to-Speech) to handle interactive communication with the telephone subscriber. Depending on the telephone number dialed, the central service is connected either to landline telephones or mobile telephones (and hence to natural communicating parties) or else to voice mail systems, telephone answering machines or similar machines which can be connected to a telephone line. In the known method, speech output is always effected in the form of an audio signal via a telephone line. The central service may be supplied with the data required for the next call via various paths, and therefore via different data interfaces. In this context, the known arrangement represents a “virtual call center”, so to speak, which calls a customer for a particular use, for example for telephone banking, provides him with information by means of speech output and stores the keystrokes which he makes on the telephone.
The printed document WO 01/57851 (Freeland et al., “Speech System”) shows an arrangement for outputting a text message in the form of speech. The arrangement shown in this case allows speech output using a voice which is an approximation of the voice of a selected natural person. In this context, the voice profiles of known personalities are preferably provided for selection. For the purpose of speech output, a communication network is used to transmit a text message to a centrally arranged service, which converts the text message into synthetic speech and then outputs the speech at another, previously specified terminal in the communication network.
A drawback which has been found in the usual systems for speech output of text-based information is that each unit or each service needs to reserve its own devices, that is to say its own software and hardware, as special applications for synthesizing and outputting the speech data.